ankit7359
Creator II

Partitioning in Talend?

Hi,

I was going through the Talend documentation and came across "Set Parallelization".

I read the documentation on "Set Parallelization", but I didn't really understand much of it.

Can anyone please help me with this?

Also, when I click on "Set Parallelization", I get the message "No Need for this Job". May I know why?

I also see that there are four components related to this: tCollector, tPartitioner, tDepartitioner, and tRecollector. Are they the steps for implementing partitioning?

In the row settings, can anyone explain the basic and advanced settings, the breakpoint, and parallelization?

How do I use the Parallelization tab in the row settings?

How do I configure them?

There is also a demo scenario that I have tried to implement:

The scenario is to load the input to the output along with the count of the records.

I ran into two things while doing this. First, when returning the count of the records, I see a warning on tAggregateRow which says: "the partitioning keys should be same with its partitioning connection".

Second, once I remove the count function and run the job, it executes successfully, but I don't see the output.

Can anyone please help?

Thanks in advance,

Ankit

Attached screenshots: tfixedflowinput; Job view; row1 schema settings; row1 advanced settings; row1 parallelization settings; taggregaterow settings; Problems Tab in Preferences; End_result in Tlogrow; Breakpoint settings

4 Replies
Anonymous
Not applicable

These components (i.e. tPartitioner, etc.) let you break up a large record set into chunks so you can process it in parallel. Basically, the components handle the "bookkeeping" involved in splitting up the records; that's why the workflow is somewhat complex.
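
To illustrate the split / process / merge idea, here is a minimal sketch in plain Python. It is not Talend code and not how the components are implemented; the function names `partition`, `process`, and `departition` and the sample rows are made up for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n):
    """Split rows into n chunks, round-robin (the 'split' step)."""
    return [rows[i::n] for i in range(n)]


def process(chunk):
    """Stand-in for the per-row work done in each parallel branch."""
    return [value * 2 for value in chunk]


def departition(chunks):
    """Merge the processed chunks back into a single flow."""
    return [row for chunk in chunks for row in chunk]


rows = list(range(10))
chunks = partition(rows, 3)

# Each chunk is handled by its own worker thread.
with ThreadPoolExecutor(max_workers=3) as pool:
    processed = list(pool.map(process, chunks))

result = departition(processed)
print(result)
```

Note that the merged result contains every row exactly once, but not in the original order. That kind of bookkeeping is part of what makes the workflow more complex than a single-threaded flow.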

 

What problem are you trying to solve with these components?

 

The 2 errors you're getting are type conversion errors; you can probably fix these by updating your schema.

ankit7359
Creator II
Author

Hi @DVSCHWAB,

Please note that the warnings in the Problems tab are for different jobs; the name of this job is files30.

I basically want a clear explanation of this.

Please help.

Thanks in advance,

Ankit

Anonymous
Not applicable

I am not convinced by parallelization within a flow. It causes a lot of complexity that you will never be aware of, and there are a lot of use cases in which this kind of parallelization can never work. Your example is one of them.

The aggregation cannot be done in parallel without using the key column as the partition key, but where would you set this up?
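
To make the point about the key column concrete, here is a small sketch in plain Python (again not Talend code; `round_robin`, `by_key`, and the sample `city` rows are hypothetical). It counts rows per key inside each partition, once with round-robin partitioning and once with partitioning on the group-by key.

```python
import zlib
from collections import Counter


def round_robin(rows, n):
    """Distribute rows over n partitions without looking at their content."""
    return [rows[i::n] for i in range(n)]


def by_key(rows, n, key):
    """Send all rows with the same key value to the same partition."""
    parts = [[] for _ in range(n)]
    for row in rows:
        index = zlib.crc32(row[key].encode("utf-8")) % n
        parts[index].append(row)
    return parts


def count_per_partition(parts, key):
    """Run the 'aggregation' independently inside each partition."""
    return [Counter(row[key] for row in part) for part in parts]


cities = ["Paris", "Paris", "Rome", "Paris", "Rome", "Oslo"]
rows = [{"city": c} for c in cities]

rr_counts = count_per_partition(round_robin(rows, 2), "city")
keyed_counts = count_per_partition(by_key(rows, 2, "city"), "city")

print("round-robin:", rr_counts)   # "Paris" shows up in both partitions
print("by key:     ", keyed_counts)  # each city lives in exactly one partition
```

With round-robin partitioning, the same key can land in several partitions, so each partition only produces a partial count that still has to be merged. When the partition key equals the group-by key, every partition holds complete groups and its counts are already final.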

I always recommend doing almost nothing in parallel in jobs, except calling other jobs via the parallel setting of the iterate flow.
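
As a rough analogy for that coarser kind of parallelism, the sketch below runs independent units of work side by side in plain Python. It is only an illustration: `run_child_job` and the file names are hypothetical stand-ins, not anything Talend generates.

```python
from concurrent.futures import ThreadPoolExecutor


def run_child_job(file_name):
    """Hypothetical stand-in for one child job handling one input file."""
    return f"{file_name}: done"


inputs = ["a.csv", "b.csv", "c.csv"]

# Each input is handled by a separate, self-contained call; the calls share no state.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(run_child_job, inputs))

print(results)
```

Because each call is self-contained, there are no partitioning keys to keep consistent and no partial results to merge afterwards.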

ankit7359
Creator II
Author

Hi @lli,

Are you saying that parallelization never works for any job, or only for certain jobs?

If it works for certain jobs, how should I design my job so that it can be parallelized?

Is there any prerequisite for enabling "Set Parallelization" at the job level?

Please help.

Thanks in advance,

Ankit