ankit7359
Creator II

Partitioning in Talend?

Hi,

I was going through the Talend documentation and came across "Set Parallelization".

I read the documentation on "Set Parallelization" but didn't really understand much of it.

Can anyone please help me with this?

Also, when I click "Set Parallelization" I get the message "No Need for this Job". May I know why?

I also see that there are four related components: tCollector, tPartitioner, tDepartitioner and tRecollector. Do they correspond to the steps for implementing partitioning?

In the row settings, can anyone explain the Basic and Advanced settings, Breakpoint, and Parallelization?

How do I use the Parallelization tab in the row settings?

How do I configure them?

There is also a demo scenario that I have tried to implement:

The scenario is to load the input, together with the count of the records, to the output.

I ran into two things with this scenario. When returning the count of the records, I see a warning on top of tAggregateRow which says: "the partitioning keys should be same with its partitioning connection".

Once I remove the count function and run the job, it executes successfully, but I don't see the output.

Can anyone please help?

Thanks in advance,

Ankit

[Screenshots: tFixedFlowInput; Job view; row1 schema settings; row1 advanced settings; row1 parallelization settings; tAggregateRow settings; Problems tab in Preferences; End result in tLogRow; Breakpoint settings]

4 Replies
Anonymous
Not applicable

These components (i.e. tPartitioner, etc.) let you break up a large record set into chunks so you can process it in parallel. Basically, the components handle the "bookkeeping" involved in splitting up the records; that's why the workflow is somewhat complex.
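
To make the split-and-merge idea concrete, here is a rough sketch in plain Python. It is not Talend code and not what the Studio generates; the function names, the round-robin split and the uppercase transform are all assumptions chosen for illustration. It only shows the general pattern: deal records into chunks, process the chunks in parallel, then merge the results.

```python
from concurrent.futures import ThreadPoolExecutor


def partition_round_robin(records, n):
    """Deal records into n chunks, one record at a time (the 'partition' step)."""
    chunks = [[] for _ in range(n)]
    for i, rec in enumerate(records):
        chunks[i % n].append(rec)
    return chunks


def process_chunk(chunk):
    """Hypothetical per-record work; each chunk is handled independently."""
    return [rec.upper() for rec in chunk]


def run_parallel(records, n=3):
    chunks = partition_round_robin(records, n)
    # Each chunk is processed by its own worker thread.
    with ThreadPoolExecutor(max_workers=n) as pool:
        results = list(pool.map(process_chunk, chunks))
    # Merge the processed chunks back into a single flow (the 'collect' step).
    return [rec for part in results for rec in part]


records = ["a", "b", "c", "d", "e"]
chunks = partition_round_robin(records, 3)
merged = run_parallel(records, 3)
```

Note that the merged flow is no longer in the original input order, which is one of the bookkeeping details a parallel flow has to account for.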

 

What problem are you trying to solve with these components?

 

The 2 errors you're getting are type conversion errors; you can probably fix these by updating your schema.

ankit7359
Creator II
Author

Hi @DVSCHWAB,

Please note that the warnings in the Problems tab are for different jobs; the name of this job is files30.

What I basically want is a clear explanation of this.

Please help.

Thanks in advance,

Ankit

Anonymous
Not applicable

I am not convinced by parallelization within a flow. It causes a lot of complexity that you will never be aware of, and there are many use cases in which this kind of parallelization can never work. Your example is one of them.

The aggregation cannot be done in parallel unless the key column is used as the partition key, but where would you set this up?
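
A small sketch of why the partition key matters for an aggregation, in plain Python rather than Talend; the `customer` column and the helper names are hypothetical. If rows are dealt out round-robin, each partition only sees some of the rows for a key, so a per-partition count produces several partial rows per key. If rows are routed by a hash of the key, every row for a key lands in the same partition and the per-partition counts are already final.

```python
from collections import Counter


def partition_round_robin(rows, n):
    """Deal rows into n partitions regardless of their content."""
    chunks = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        chunks[i % n].append(row)
    return chunks


def partition_by_key(rows, n, key):
    """Route each row by a hash of its key so equal keys share a partition."""
    chunks = [[] for _ in range(n)]
    for row in rows:
        chunks[hash(row[key]) % n].append(row)
    return chunks


def count_per_partition(chunks, key):
    """Count rows per key inside each partition, then concatenate the outputs."""
    out = []
    for chunk in chunks:
        counts = Counter(row[key] for row in chunk)
        out.extend(counts.items())
    return out


rows = [{"customer": c} for c in ["x", "x", "y", "y", "x"]]

# Round-robin: the same key shows up in both partitions, so counts are partial.
round_robin_counts = count_per_partition(partition_round_robin(rows, 2), "customer")

# Keyed: each key is counted exactly once, with its full total.
keyed_counts = count_per_partition(partition_by_key(rows, 2, "customer"), "customer")
```

With round-robin partitioning the partial counts would need a second aggregation after the merge; with keyed partitioning they do not.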

I always recommend doing almost nothing in parallel within a job, except calling other jobs via the parallel setting of the iterate flow.

ankit7359
Creator II
Author

Hi @lli,

Are you saying that parallelization never works for any job, or only for certain jobs?

If it works for certain jobs, how should I design my job so that I can use parallelization?

Is there any prerequisite for enabling "Set Parallelization" at the job level?

Please help.

Thanks in advance,

Ankit