Anonymous

Enable Parallel Execution Is Greyed Out

Hi,
I am using Talend Enterprise Data Integration Version: 5.5.1 Build id: r118616 and want to process a delimited file in parallel. However, the Enable parallel execution option for the tFileInputDelimited component is greyed out. Is there anything I need to do to enable this option?
Regards.
Allan
Anonymous

Sorry for the delayed reply; I was in transit from the US to Singapore. I would really love to get rbaldwin's input on this. To give you a better idea of what we are working on: we are building a DWH, and some of the files are so large (20-30 GB text files on a daily basis) that reading and processing them in a single thread is not an option.
Anonymous

Hi,
According to the KB article "TalendHelpCenter: How to automatically enable parallelization of data flows for better performance", the "Set Parallelization" feature explained in that section is available only if you have subscribed to one of the Talend Platform solutions or Big Data solutions, V5.3.1 or later.

Is this feature available in your Talend Enterprise Data Integration Version: 5.5.1 Build id: r118616?
Could you please share a screenshot of your current job design?
Best regards
Sabrina
Anonymous

Hi Sabrina,
Please see below.

(Screenshot: www.talendforge.org/forum/img/members/243775/jobflow.png)
Unfortunately, the parallelization option is not available in my version, so I guess the next question is which specific product we should buy to get this option. I would also appreciate it if you could get someone to send a price matrix for what you are offering. I have contacted Drew James from your UK office but have not received any answers.
Thanks and warm regards.
Allan
Anonymous

Hi,
For the enterprise subscription price matrix, please send an email to the Talend sales team.
We do not have any pricing reference on our side.
What is your job rate (rows/s)? Does the job take a long time to run? Have you tried breaking it into several subjobs and running them with multi-threading?
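For reference, the job rate asked about here is simply rows processed divided by elapsed wall-clock time. A minimal Python sketch of that measurement (not Talend code), in which `measure_rows_per_second` and `process_row` are hypothetical names, the latter standing in for a job's per-row work:

```python
import time
from typing import Callable, Iterable, Tuple


def measure_rows_per_second(
    rows: Iterable[str], process_row: Callable[[str], object]
) -> Tuple[int, float]:
    """Run process_row over every row and return (row_count, rows_per_second)."""
    start = time.perf_counter()
    count = 0
    for row in rows:
        process_row(row)
        count += 1
    elapsed = time.perf_counter() - start
    # Guard against a zero elapsed time on very small inputs.
    rate = count / elapsed if elapsed > 0 else float("inf")
    return count, rate


if __name__ == "__main__":
    # Hypothetical sample data: 100,000 semicolon-delimited rows.
    sample = [f"{i};name_{i};{i * 10}" for i in range(100_000)]
    n, rate = measure_rows_per_second(sample, lambda line: line.split(";"))
    print(f"{n} rows at {rate:,.0f} rows/s")
```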
Best regards
Sabrina
Anonymous

Hi Sabrina,
We are currently at the stage of selecting the tools we are going to use and building PoCs for each of them. As such, the ETL routines are quite simple at the moment and the throughput is acceptable. However, running several 15 GB files through a series of transformations (e.g. data cleanup, lookups, joins) is going to be very different.
To give everyone a better idea of what we are doing: we have a requirement to perform 250,000 inserts into a database per second. While that requirement does not currently involve the kind of complex transformations that would call for an ETL tool, this could definitely change in the future. As such, we are looking for an ETL tool that provides the following:
1. Ability to read a single file in parallel (optional). A better explanation of this can be found at (doc.cloveretl.com/documentation/UserGuide/index.jsp?topic=/com.cloveretl.gui.docs/docs/parallelreader.html)
2. Ability to partition data and process the partitions in parallel (must have). "Proper" MPP support would be good, but SMP would be fine as well.
3. In-memory lookups and in-memory aggregation (must have).
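Requirements 2 and 3 together amount to hash partitioning: every row with the same key goes to the same partition, so each worker can do its lookups and aggregation entirely in memory, and the per-partition results never overlap. The Python sketch below only illustrates that general idea; it is not a Talend component, and the `key;amount` row layout, the `lookup` dictionary, and all function names are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
import zlib


def partition_of(key: str, num_partitions: int) -> int:
    # crc32 is stable across runs, unlike Python's randomized built-in hash() for str.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_rows(rows, num_partitions):
    """Split 'key;amount' rows into lists so that equal keys share a partition."""
    parts = [[] for _ in range(num_partitions)]
    for row in rows:
        key, amount = row.split(";")
        parts[partition_of(key, num_partitions)].append((key, int(amount)))
    return parts


def aggregate_partition(part, lookup):
    """In-memory aggregation (sum per key) plus in-memory lookup (key -> region)."""
    totals = defaultdict(int)
    for key, amount in part:
        totals[key] += amount
    return {key: (lookup.get(key, "UNKNOWN"), total) for key, total in totals.items()}


def parallel_aggregate(rows, lookup, num_partitions=4):
    parts = partition_rows(rows, num_partitions)
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = pool.map(lambda part: aggregate_partition(part, lookup), parts)
        result = {}
        for partial in partials:
            # Keys never repeat across partitions, so a plain update is a correct merge.
            result.update(partial)
    return result


if __name__ == "__main__":
    rows = ["c1;10", "c2;5", "c1;7", "c3;1"]
    lookup = {"c1": "EU", "c2": "US"}
    print(parallel_aggregate(rows, lookup, num_partitions=3))
```

Threads keep the sketch short and self-contained; in CPython, CPU-bound partitions would more typically be handed to `ProcessPoolExecutor` to get real parallelism, at the cost of serializing each partition to the worker processes.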
While jholman's suggestion of splitting the files in parallel and then processing the pieces in parallel will definitely work, please note that splitting a file into several pieces takes time as well. I would also not want to keep manually designing parallel handling for each file that we are going to process.
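On the cost of splitting: one general way to avoid a separate split step is to give each worker a byte range of the same file. Each worker seeks to its range, skips forward to the next line break, and reads whole lines until it passes the end of its range, so every line is read exactly once. This is a Python sketch of that general technique, not of Talend's or CloverETL's implementation; it assumes newline-terminated UTF-8 records with no newlines embedded inside quoted fields, and `read_range` and `parallel_read` are hypothetical names.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def read_range(path, start, end):
    """Return the lines whose first byte lies in [start, end)."""
    lines = []
    with open(path, "rb") as f:  # binary mode so tell() gives exact byte offsets
        if start > 0:
            # Step back one byte so a line starting exactly at `start` is kept;
            # otherwise this skips the tail of a line owned by the previous range.
            f.seek(start - 1)
            f.readline()
        while f.tell() < end:
            line = f.readline()
            if not line:
                break
            lines.append(line.rstrip(b"\r\n").decode("utf-8"))
    return lines


def parallel_read(path, workers=4):
    """Read one file with `workers` threads, each owning a contiguous byte range."""
    size = os.path.getsize(path)
    step = max(1, -(-size // workers))  # ceiling division
    ranges = [(s, min(s + step, size)) for s in range(0, size, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        chunks = pool.map(lambda r: read_range(path, *r), ranges)
        return [line for chunk in chunks for line in chunk]
```

Because `Executor.map` returns results in input order, the concatenated lines come back in file order, and no intermediate split files are written.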
The problem is that when I search, I find Talend doing exactly what I am looking for, but for some reason I cannot replicate it in my version. Take the article help.talend.com/pages/viewpage.action?pageId=3986800#Raa92445 and search for "Iterate connection settings". As the screenshot shows, there is an "Enable parallel execution" option, and you can define the number of parallel executions you want. The second screenshot shows that each execution processed around 70k rows. But when I look at my tLoop, I don't even see the "Enable parallel execution" checkbox.
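Conceptually, the "Enable parallel execution" setting described in that help page lets a chosen number of iterations run at the same time. The Python sketch below shows that idea with a bounded worker pool and records the peak number of iterations running at once; it is not how Talend implements the option, and `run_iterations`, `process_chunk`, and `ConcurrencyTracker` are hypothetical names.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class ConcurrencyTracker:
    """Records the highest number of iterations that ran at the same time."""

    def __init__(self):
        self._lock = threading.Lock()
        self.active = 0
        self.peak = 0

    def __enter__(self):
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)

    def __exit__(self, *exc):
        with self._lock:
            self.active -= 1
        return False  # do not swallow exceptions from the iteration body


def run_iterations(items, body, parallel_executions=4):
    """Call body(item) for each item, at most `parallel_executions` at once."""
    tracker = ConcurrencyTracker()

    def guarded(item):
        with tracker:
            return body(item)

    with ThreadPoolExecutor(max_workers=parallel_executions) as pool:
        results = list(pool.map(guarded, items))  # results keep input order
    return results, tracker.peak


def process_chunk(chunk_id):
    time.sleep(0.01)  # hypothetical stand-in for the real per-iteration work
    return chunk_id


if __name__ == "__main__":
    results, peak = run_iterations(range(10), process_chunk, parallel_executions=3)
    print(results, "peak concurrency:", peak)
```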
So my question is: does Talend support the features listed above? If so, could you provide a link to the relevant documentation?
Thanks, and I would really appreciate clarification in this regard.
Allan
Anonymous

Here are some additional materials. Please note that I am not leaning toward, or committed to, any specific product at the moment, but hopefully the links below clarify some of what I require.
https://cloveretl.wordpress.com/2009/10/26/parallelreader-versus-competitors/
https://cloveretl.wordpress.com/2009/11/11/parallelreader-versus-competitors-part-2/
Thanks and warm regards.
Allan