Hi,
I have a requirement where around 2,000 file names are stored in an MSSQL table. I want to read those rows in batches of 500 and then process the corresponding files. After one batch completes, the job should pick up the next 500 files and follow the same process until the last file.
How can I achieve this?
Thank you in advance,
Sonali
Hi,
We can create the job on a parent/child basis:
Parent job: stores the context variable value. The trick is to initially create a file that holds the context variable value, starting at "0".
Child job: reads that value, increments it, and filters the data based on it. In your case, the first run will fetch the first 500 records and the next run will take the following 500.
At the end of the child job we store seq1 (2 in this example) back to the file, where it serves as the context variable value for the next run.
In my example I used a sequence of 2 for demonstration.
Output will be:
[Screenshot: output of the first run]
[Screenshot: output of the second run]
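If it helps, here is a minimal plain-Java sketch of the same pattern outside Talend. The state file batch_state.txt, the table file_list, the columns file_id/file_name, and the connection details are all hypothetical placeholders for your own setup:

```java
import java.nio.file.*;
import java.sql.*;

public class BatchFetch {
    public static void main(String[] args) throws Exception {
        // state file created up front with the value "0", as in the parent job
        Path stateFile = Paths.get("batch_state.txt");
        int batch = Integer.parseInt(Files.readString(stateFile).trim());
        int batchSize = 500;

        try (Connection con = DriverManager.getConnection(
                 "jdbc:sqlserver://localhost;databaseName=mydb", "user", "pass");
             PreparedStatement ps = con.prepareStatement(
                 // OFFSET/FETCH needs a stable ORDER BY to page deterministically
                 "SELECT file_name FROM file_list ORDER BY file_id "
                 + "OFFSET ? ROWS FETCH NEXT ? ROWS ONLY")) {
            ps.setInt(1, batch * batchSize);
            ps.setInt(2, batchSize);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    processFile(rs.getString("file_name")); // your per-file processing
                }
            }
        }
        // persist the incremented batch number for the next run
        Files.writeString(stateFile, String.valueOf(batch + 1));
    }

    static void processFile(String name) {
        System.out.println("processing " + name);
    }
}
```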
Please let me know if you need more clarification.
Regards,
Akash
Hi,
You cannot split the data into batches directly using standard Talend components. You would have to employ custom components in between to process them as multiple batches.
The ideal approach is to do a single read from the MSSQL database to fetch all 2,000 records in one go and add a tFlowToIterate component to process the files one after another. That ensures all your files are processed while you make a single database fetch instead of multiple calls.
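For reference, outside Talend the same fetch-once-then-iterate pattern looks roughly like the sketch below; the table file_list, column file_name, and connection string are hypothetical:

```java
import java.sql.*;
import java.util.ArrayList;
import java.util.List;

public class FetchOnceIterate {
    public static void main(String[] args) throws Exception {
        List<String> files = new ArrayList<>();
        // single database call: fetch all ~2000 file names in one go
        try (Connection con = DriverManager.getConnection(
                 "jdbc:sqlserver://localhost;databaseName=mydb", "user", "pass");
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery("SELECT file_name FROM file_list")) {
            while (rs.next()) files.add(rs.getString(1));
        }
        // iterate over the names one after another, like tFlowToIterate
        for (String name : files) {
            System.out.println("processing " + name); // your per-file processing here
        }
    }
}
```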
But if you would like to process 500 records in parallel, it becomes a complex use case for a standard job, where you will have to employ tParallelize and other custom code components.
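A rough sketch of that parallel variant, assuming the 2,000 names are already in memory (dummy names are generated here so the example runs standalone): it splits them into batches of 500 and runs each batch on its own worker thread:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

public class ParallelBatches {
    public static void main(String[] args) throws Exception {
        // stand-in for the 2000 file names fetched from MSSQL
        List<String> files = new ArrayList<>();
        for (int i = 1; i <= 2000; i++) files.add("file_" + i + ".csv");

        int batchSize = 500;
        ExecutorService pool = Executors.newFixedThreadPool(4); // 2000 / 500 = 4 batches
        List<Future<?>> pending = new ArrayList<>();
        for (int i = 0; i < files.size(); i += batchSize) {
            List<String> batch = files.subList(i, Math.min(i + batchSize, files.size()));
            pending.add(pool.submit(() ->
                batch.forEach(name -> System.out.println(
                    Thread.currentThread().getName() + " processing " + name))));
        }
        for (Future<?> f : pending) f.get(); // wait for all batches to finish
        pool.shutdown();
    }
}
```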
A better idea would be to merge all the files (assuming the schemas are the same) and process them using Big Data Spark jobs, which will be quite fast.
I assume your use case is simple, so try option 1 first; if it takes too long, go for the Big Data job option.
Please mark the topic as resolved if the answer has helped you. Kudos are also welcome 🙂
Warm Regards,
Nikhil Thampi
It worked for me.
Thank you so much, Akash.
Hi Sonali,
I was wondering the same. Can you please explain how you did this?
Thanks in advance.
Regards
Fazil