Hi,
I have a requirement where around 2,000 file names are stored in an MSSQL table. I want to read those rows in batches of 500 and then process the corresponding files. After one batch completes, the job should pick up the next 500 files and follow the same process until the last file.
How can I achieve this?
Thank you in advance,
Sonali
Hi,
We can create the job on a parent/child basis:
Parent job: stores the context variable value. The trick is to initially create a file that holds the context variable value, starting at "0".
Child job: reads that value, increments it, and filters the data based on it. In your case, the first run will fetch the first 500 records and the next run will take the following 500.
At the end of the child job we store seq1 (2 in this example) back to the file, where it serves as the context variable value for the next run.
In my example I used a sequence of 2 for demonstration.
Output will be:
[Screenshot: output of the first run]
[Screenshot: output of the second run]
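If it helps, here is a minimal plain-Java sketch of the same pattern outside Talend. The state file batch_state.txt, the table file_list, the columns file_id/file_name, and the connection details are all hypothetical placeholders for your own setup:

```java
import java.nio.file.*;
import java.sql.*;

public class BatchFetch {
    public static void main(String[] args) throws Exception {
        // state file created up front with the value "0", as in the parent job
        Path stateFile = Paths.get("batch_state.txt");
        int batch = Integer.parseInt(Files.readString(stateFile).trim());
        int batchSize = 500;

        try (Connection con = DriverManager.getConnection(
                 "jdbc:sqlserver://localhost;databaseName=mydb", "user", "pass");
             PreparedStatement ps = con.prepareStatement(
                 // OFFSET/FETCH needs a stable ORDER BY to page deterministically
                 "SELECT file_name FROM file_list ORDER BY file_id "
                 + "OFFSET ? ROWS FETCH NEXT ? ROWS ONLY")) {
            ps.setInt(1, batch * batchSize);
            ps.setInt(2, batchSize);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    processFile(rs.getString("file_name")); // your per-file processing
                }
            }
        }
        // persist the incremented batch number for the next run
        Files.writeString(stateFile, String.valueOf(batch + 1));
    }

    static void processFile(String name) {
        System.out.println("processing " + name);
    }
}
```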
Please let me know if you need more clarification.
Regards,
Akash
Hi,
You cannot split the data into batches directly using standard Talend components. You would have to employ custom components in between to process them as multiple batches.
The ideal approach is to do a single read from the MSSQL database to fetch all 2,000 records in one go and add a tFlowToIterate component to process the files one after another. That ensures all your files are processed while you make a single database fetch instead of multiple calls.
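For reference, outside Talend the same fetch-once-then-iterate pattern looks roughly like the sketch below; the table file_list, column file_name, and connection string are hypothetical:

```java
import java.sql.*;
import java.util.ArrayList;
import java.util.List;

public class FetchOnceIterate {
    public static void main(String[] args) throws Exception {
        List<String> files = new ArrayList<>();
        // single database call: fetch all ~2000 file names in one go
        try (Connection con = DriverManager.getConnection(
                 "jdbc:sqlserver://localhost;databaseName=mydb", "user", "pass");
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery("SELECT file_name FROM file_list")) {
            while (rs.next()) files.add(rs.getString(1));
        }
        // iterate over the names one after another, like tFlowToIterate
        for (String name : files) {
            System.out.println("processing " + name); // your per-file processing here
        }
    }
}
```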
But if you would like to process 500 records in parallel, it becomes a complex use case for a standard job, where you will have to employ tParallelize and other custom code components.
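A rough sketch of that parallel variant, assuming the 2,000 names are already in memory (dummy names are generated here so the example runs standalone): it splits them into batches of 500 and runs each batch on its own worker thread:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

public class ParallelBatches {
    public static void main(String[] args) throws Exception {
        // stand-in for the 2000 file names fetched from MSSQL
        List<String> files = new ArrayList<>();
        for (int i = 1; i <= 2000; i++) files.add("file_" + i + ".csv");

        int batchSize = 500;
        ExecutorService pool = Executors.newFixedThreadPool(4); // 2000 / 500 = 4 batches
        List<Future<?>> pending = new ArrayList<>();
        for (int i = 0; i < files.size(); i += batchSize) {
            List<String> batch = files.subList(i, Math.min(i + batchSize, files.size()));
            pending.add(pool.submit(() ->
                batch.forEach(name -> System.out.println(
                    Thread.currentThread().getName() + " processing " + name))));
        }
        for (Future<?> f : pending) f.get(); // wait for all batches to finish
        pool.shutdown();
    }
}
```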
A better idea would be to merge all the files (assuming the schemas are the same) and process them using Big Data Spark jobs, which will be quite fast.
I assume your use case is simple, so try option 1 first; if it takes too long, go for the Big Data job option.
Please mark the topic as resolved if the answer has helped you. Kudos are also welcome 🙂
Warm Regards,
Nikhil Thampi
It worked for me.
Thank you so much, Akash.
Hi Sonali,
I was wondering the same. Can you please explain how you did this?
Thanks in advance.
Regards
Fazil