Anonymous
Not applicable

How to execute on every 100 rows of data?

Hey,
I am pretty sure Talend should be able to do this relatively easily, but I am not sure of the best way to go about it.
I have 100,000 rows of data, but the API I am calling can only take 100 rows per call.
I would like to execute an API call on 100 rows at a time until I have looped through the full 100,000-row dataset.
Any advice or recommended components for going about this would be much appreciated.
Thanks,
Brian
28 Replies
Anonymous
Not applicable
Author

Thanks @Shong, will have a look now.
@sanvaibhav, thanks for taking the time to do this, below is a section from the job I have.

(screenshot of the job)
Anonymous
Not applicable
Author

@Shong, I have been looking into your method in more detail. A potential problem is the dataset changing during runtime; I would then need to add another layer of logic to prevent missing or duplicated records. It seems like overkill when I can get the dataset in one statement and should be able to process it afterwards. What are your thoughts? Is there no way I can pull down the full dataset and do this?
Anonymous
Not applicable
Author

Hi AshWhitear
If you are not sure how many records the table contains, you can select the total number of records with a DB input component and store that value in a context or global variable; this variable is then used in the 'To' field of the tLoop component.
Best regards
Shong
Anonymous
Not applicable
Author

Hi @Shong, thanks for this, I think it's almost working now. Just a quick question about what you've said above: how would I store that much data inside a context or global variable? What datatype would I use, and how would I get all the data into that one variable?
I've tried using tHashInput and tHashOutput, but upon clearing, the object seems to be removed and I get an error saying the hash isn't initialised after the first loop. If I could store this data in a context or global variable, it should work perfectly!
Thanks, Ash.
Anonymous
Not applicable
Author

Hi
 how would I store that much data inside a context var or global var? What datatype would I use and how would I get all the data into that one var?

This variable stores the total number of records, not the data. Define a context variable with int/Integer type. For example:
tMssqlInput--main--tJavaRow
In tMssqlInput, select the total number of records:
"select count(*) as nb_line from tableName"
In tJavaRow:
context.to = input_row.nb_line;
Best regards
Shong
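The overall pattern described in this thread (fetch the rows, then send them to the API in fixed-size batches) can be sketched in plain Java. This is a hypothetical illustration of the batching arithmetic only, not Talend component code; `callApi` is a stand-in for whatever the real API call looks like in the job:

```java
import java.util.ArrayList;
import java.util.List;

public class BatchSender {
    // Hypothetical stand-in for the real API call (e.g. the REST step
    // of the Talend job). Here it just pretends to accept the batch.
    static int callApi(List<String> batch) {
        return batch.size();
    }

    // Walk the full dataset in fixed-size windows and fire one API
    // call per window; the last window may be smaller than batchSize.
    static int sendInBatches(List<String> rows, int batchSize) {
        int calls = 0;
        for (int start = 0; start < rows.size(); start += batchSize) {
            int end = Math.min(start + batchSize, rows.size());
            callApi(rows.subList(start, end));
            calls++;
        }
        return calls;
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) rows.add("row" + i);
        System.out.println(sendInBatches(rows, 100)); // 1000 calls
    }
}
```

In a Talend job the same effect comes from a tLoop driving each chunk through the output component; the sketch only shows the windowing logic the loop has to reproduce.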
Anonymous
Not applicable
Author

I am also facing the same issue. Did you resolve it? If yes, please let me know the solution.

Anonymous
Not applicable
Author

I am also facing the same issue. Did you find a solution? If yes, please let me know. Same design here.
Anonymous
Not applicable
Author

Hi All,

I am also facing the same issue. Did you find a solution? If yes, please let me know.
en666
Contributor II

You do not get the described behaviour natively in Talend, but it can be achieved with a few components. Just wanted to share my solution:

Use case: I need to send 122,726 rows to a Power BI table via an API that is limited to 10,000 rows per call, resulting in 13 calls; note that the last call will upload only 2,726 rows.

The trick is just to create the correct exit condition for a tLoop, which we then use to split the 122K-row flow and fire an API call every 10,000 rows.

Some context variables are used to support the process, including a variable for the 10,000-row split, so this implementation works with an arbitrary chunk size.

The first tJava just sets the valid condition to allow the first tLoop iteration.

From then on, the tLoop checks a context variable that carries the number of rows processed during an iteration. The idea is that we expect this number to be 10,000 every time; if at some point the number of processed rows is lower than 10,000, it is time for the tLoop to exit.

Every iteration needs to reset the following: the row counter, and the starting and ending points for the rows to be considered during the iteration.

Extract the 10,000 rows we are interested in during the iteration.

Increment the row counter by 1 on every row, until it reaches 10,000. If it does not reach 10,000, the rows are exhausted and the tLoop will exit.

Done! 13 iterations: 12 with 10,000 rows and 1 with 2,726.
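The per-iteration bookkeeping described above can be sketched in plain Java (an illustrative sketch under the post's assumptions, not actual component code): each pass recomputes the starting and ending points, counts the rows in that window, and a batch shorter than the chunk size signals the final iteration:

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkedUpload {
    // Mirrors the tLoop bookkeeping from the post: seed the counter so
    // the first pass runs, then loop while each pass fills a full chunk.
    static List<Integer> batchSizes(int totalRows, int chunk) {
        List<Integer> sizes = new ArrayList<>();
        int iteration = 0;
        int counted = chunk; // valid condition for the first iteration
        while (counted == chunk) {
            int start = iteration * chunk;                // starting point
            int end = Math.min(start + chunk, totalRows); // ending point
            counted = end - start;                        // row counter
            if (counted > 0) sizes.add(counted);          // "send" the batch
            iteration++;
        }
        return sizes;
    }

    public static void main(String[] args) {
        List<Integer> sizes = batchSizes(122_726, 10_000);
        System.out.println(sizes.size());  // 13 batches
        System.out.println(sizes.get(12)); // last batch: 2726
    }
}
```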

 

Notes:

  1. This subjob can't be run in parallel as it is; you'll need to tweak it a little on your own.
  2. The 122,726 rows come from an Excel file, so they are tHash-ed for a performance improvement during the numerous readings.
  3. In the tLoop, notice that Integer.compare() is used in place of ==, to avoid the Java caveat around comparing non-primitive (boxed) integers.
  4. Inside the tLoop the variable is 'n' rather than the default 'i', because 'i' risks conflicting with the 'i' variables of Talend components, the tExcelFileInput for example.
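The caveat in note 3 is easy to demonstrate: Java caches boxed Integer values only in the range -128 to 127, so `==` compares object references and can appear to work in small tests, then fail for larger counters such as 10,000:

```java
public class IntegerCompareCaveat {
    public static void main(String[] args) {
        // Small Integer values are served from a cache, so == happens
        // to compare the same object; larger values are distinct objects.
        Integer small1 = 100, small2 = 100;
        Integer big1 = 10_000, big2 = 10_000;
        System.out.println(small1 == small2);                 // true (cached)
        System.out.println(big1 == big2);                     // false!
        System.out.println(Integer.compare(big1, big2) == 0); // true
        System.out.println(big1.equals(big2));                // true
    }
}
```

This is why the tLoop condition uses Integer.compare() (or .equals()) rather than == on boxed values.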