anna_t1
Contributor III

Multithreading with tFlowToIterate in TOS to process a set of rows

Hello,

I have a child job which is responsible for making API calls. I have a maximum number of requests per minute, and that is being taken care of in my tJava component. It seems to be working fine with just 1 thread.

But what if I want to distribute the MAX rows across 4 threads, with the limit of MAX requests/min applying to the total (and not per thread)? Does this even make sense, since I'm writing everything to a single CSV file (a requirement for bulk loading into my Snowflake DB), or do I have to synchronize something after the threads have executed?
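On the single-CSV part of the question, here is a minimal Java sketch of one way to let several threads append to the same output safely. It assumes the rows are written from custom Java code (for example a routine called from tJava) rather than through a Talend output component. `SharedCsvWriter` and `demo` are hypothetical names, the `;` separator is an assumption, and fields are not escaped.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

/** Hypothetical helper: one instance shared by every worker thread. */
final class SharedCsvWriter {
    private final BufferedWriter out;

    SharedCsvWriter(Writer target) {
        this.out = new BufferedWriter(target);
    }

    /** Writes one complete row; synchronized so rows never interleave. */
    synchronized void writeRow(String... fields) {
        try {
            out.write(String.join(";", fields)); // assumption: ';' separator, no escaping
            out.newLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    synchronized void close() {
        try {
            out.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Demo: several threads write to one in-memory sink; returns the CSV text. */
    static String demo(int threads, int rowsPerThread) {
        StringWriter sink = new StringWriter();
        SharedCsvWriter csv = new SharedCsvWriter(sink);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                for (int r = 0; r < rowsPerThread; r++) {
                    csv.writeRow("thread" + id, "row" + r);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join(); // wait for every writer before closing the file
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        csv.close();
        return sink.toString();
    }
}
```

With this shape nothing extra has to be synchronized after the threads finish, other than waiting for all of them before closing the file. The order of rows in the file is not deterministic.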

 

I've searched and spent many hours on this, but I can't seem to find a solution that lets n threads share the work (max/n requests each) while keeping the total at max requests per minute.
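One way to express "a total of MAX requests per minute, regardless of the number of threads" is a single limiter object that every thread calls before each request. The following is a minimal sliding-window sketch in plain Java, not a Talend feature. `SharedRateLimiter`, `acquire` and `timeCalls` are hypothetical names, and how the instance is shared between threads in a job (for example through a static field in a routine) is an assumption left out of the sketch.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: allows at most maxPerWindow calls per window, across all threads. */
final class SharedRateLimiter {
    private final int maxPerWindow;
    private final long windowNanos;
    private final Deque<Long> stamps = new ArrayDeque<>(); // start times of recent calls

    SharedRateLimiter(int maxPerWindow, long windowMillis) {
        this.maxPerWindow = maxPerWindow;
        this.windowNanos = TimeUnit.MILLISECONDS.toNanos(windowMillis);
    }

    /** Blocks the calling thread until one more request fits in the window. */
    synchronized void acquire() throws InterruptedException {
        while (true) {
            long now = System.nanoTime();
            // Forget calls that have already left the window.
            while (!stamps.isEmpty() && now - stamps.peekFirst() >= windowNanos) {
                stamps.pollFirst();
            }
            if (stamps.size() < maxPerWindow) {
                stamps.addLast(now);
                return;
            }
            // Window is full: sleep until the oldest call expires, then re-check.
            // The lock is held while sleeping; other threads would have to wait anyway.
            TimeUnit.NANOSECONDS.sleep(windowNanos - (now - stamps.peekFirst()));
        }
    }

    /** Demo: runs `calls` rate-limited tasks on `threads` threads, returns elapsed ms. */
    static long timeCalls(int threads, int calls, SharedRateLimiter limiter) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (int i = 0; i < calls; i++) {
                pending.add(pool.submit(() -> {
                    limiter.acquire(); // the API call would follow here
                    return null;
                }));
            }
            for (Future<?> f : pending) {
                f.get();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
        return (System.nanoTime() - start) / 1_000_000L;
    }
}
```

For a real limit this would be constructed as, say, `new SharedRateLimiter(max, 60_000)`. The limiter does not care which thread makes a call, so the per-thread share does not have to be exactly max/n for the total to stay within the limit.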

 

Thank you.


5 Replies
anna_t1
Contributor III
Author

I also read that parallel and/or multithreaded execution is not handled the same way in TOS (which is my case, TOS 7.2 to be exact) as in the Enterprise version. Is this still true?

 

Anonymous
Not applicable

Hello,

The "Use parallel execution" option is usually found on the t<DB>Output components, where it speeds up data processing by handling multiple data flows simultaneously. Note that this feature depends on the ability of the database or application to handle multiple inserts in parallel, as well as on the number of CPUs assigned. There is also the tParallelize component, which allows you to synchronize the execution of a subjob with the execution of other subjobs in your main Job. This component is available in the Talend subscription products, not in the open source edition.

Best regards

Sabrina

anna_t1
Contributor III
Author

I appreciate your answer, but I'm using TOS 7.2, so the tParallelize component is not an option for me. Also, I don't need bulk loading in this case (I already handle that elsewhere).

What I'm trying to understand is the "Enable parallel execution" / "Number of threads" setting on the tFlowToIterate component, which reads row by row. I don't understand whether it processes x rows across n threads, since I got the impression that it was replicating each row n times. Can you please explain how parallel execution works in tFlowToIterate and what it is meant for?

My idea was to read 100 rows and distribute the processing across n threads, but it doesn't seem to work that way with that option.

Anonymous
Not applicable

Hello,

Maybe you can use an iterate link with parallel execution: with this link, it's possible to specify the number of threads to launch through a context variable.

For example, tFileInputDelimited (contains the list of files to compute) --> tFlowToIterate --> output

When you click the iterate link between tFlowToIterate and the output, you can select "Enable parallel execution".
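As a plain-Java analogy only (this is not Talend's generated code, and whether the iterate link behaves exactly like this is an assumption), a bounded worker pool hands each row to one of n threads, so every row is processed once rather than n times. `ParallelIterateAnalogy` and `process` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelIterateAnalogy {

    /** Processes each row on a pool of `threads` workers; returns how often each row ran. */
    static Map<String, Integer> process(List<String> rows, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        Map<String, Integer> timesProcessed = new ConcurrentHashMap<>();
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (String row : rows) {
                // One task per row: whichever worker is free picks it up.
                pending.add(pool.submit(() -> {
                    timesProcessed.merge(row, 1, Integer::sum);
                }));
            }
            for (Future<?> f : pending) {
                f.get(); // wait until every row has been handled
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
        return timesProcessed;
    }
}
```

In this model the number of threads only limits how many rows are in flight at the same time. It does not fix which thread receives which row.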

Let us know if it is OK with you.

Best regards

Sabrina

anna_t1
Contributor III
Author

I do exactly that in TOS. My problem is: is there a way to guarantee or control how the rows are distributed across threads? Let's say I have 10 rows to process per minute, with 5 threads. Each thread processes 2 rows, and the whole batch can't take less than 1 minute; if it finishes earlier, it has to wait. Is it possible to distribute and synchronize like this in TOS, where there's no tParallelize available?
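Outside of Talend components, the batching described here can be sketched in plain Java: take the rows in groups of at most `perWindow`, run each group on a fixed pool, and sleep for whatever is left of the window before starting the next group. This is a sketch under assumptions, not a TOS feature. `MinuteBatches` and `run` are hypothetical names, and the window length is a parameter so the example can be tried with short delays.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

final class MinuteBatches {

    /** Processes rows in batches of perWindow; each batch occupies at least windowMillis. */
    static <T> void run(List<T> rows, int threads, int perWindow,
                        long windowMillis, Consumer<T> worker) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            for (int from = 0; from < rows.size(); from += perWindow) {
                long start = System.nanoTime();
                int to = Math.min(from + perWindow, rows.size());
                List<Callable<Void>> tasks = new ArrayList<>();
                for (T row : rows.subList(from, to)) {
                    tasks.add(() -> {
                        worker.accept(row); // e.g. one API call per row
                        return null;
                    });
                }
                // invokeAll blocks until the whole batch is done.
                for (Future<Void> f : pool.invokeAll(tasks)) {
                    f.get(); // surfaces any failure from a worker
                }
                long elapsedMs = (System.nanoTime() - start) / 1_000_000L;
                boolean moreBatches = to < rows.size();
                if (moreBatches && elapsedMs < windowMillis) {
                    Thread.sleep(windowMillis - elapsedMs); // wait out the rest of the window
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("row failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

For the numbers in this post the call would look like `MinuteBatches.run(rows, 5, 10, 60_000, row -> callApi(row))`, where `callApi` is a placeholder. A fixed pool does not guarantee that each of the 5 threads handles exactly 2 rows, since a free thread simply takes the next task, but the total per window is bounded.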

Furthermore: is there a way to print the number of the thread that is running on the iterate link?
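In plain Java, the thread that is executing a piece of code can be identified with `Thread.currentThread()`. A minimal sketch follows, assuming the code runs inside something like a tJava on the parallel branch. `ThreadLabel` and `describe` are hypothetical names, and the thread names that Talend assigns are not shown here because they are not confirmed in this thread.

```java
final class ThreadLabel {

    /** Builds a log line that includes the name of the thread executing it. */
    static String describe(Object row) {
        return "[" + Thread.currentThread().getName() + "] row=" + row;
    }

    public static void main(String[] args) throws InterruptedException {
        // Demo with an explicitly named thread; "worker-1" is just an example name.
        Thread worker = new Thread(() -> System.out.println(describe("demo")), "worker-1");
        worker.start();
        worker.join();
    }
}
```

The same one-liner, `System.out.println(Thread.currentThread().getName())`, can be dropped into any Java code that runs per iteration.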