Hi,
I was going through the Talend documentation and came across "Set Parallelization".
I read the documentation on it, but I really didn't understand much.
Can anyone please help me with this?
Also, when I click on "Set Parallelization", I get the message "No Need for this Job". May I know why?
I also see that there are four related components: tCollector, tPartitioner, tDepartitioner and tRecollector. Are they the steps for implementing partitioning?
In the row settings, can anyone explain the basic settings, advanced settings, breakpoint and parallelization?
How do I use the Parallelization tab in the row settings, and how do I configure it?
There is also a demo scenario that I have tried to implement:
the scenario loads the input and writes the record count to the output.
I encountered two things while doing this. When returning the count of the records, I get a warning on top of tAggregateRow which says that the partitioning keys should be the same as those of its partitioning connection.
When I remove the count function and run the job, it executes successfully, but I don't see any output.
Can anyone please help?
Thanks in advance,
Ankit
[Attached screenshots: tFixedFlowInput; Job view; row1 schema settings; row1 advanced settings; row1 parallelization settings; tAggregateRow settings; Problems tab in Preferences; end result in tLogRow; Breakpoint settings]
These components (i.e. tPartitioner, etc.) let you break up a large record set into chunks so you can process it in parallel. Basically, the components handle the "bookkeeping" involved in splitting up the records; that's why the workflow is somewhat complex.
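To make that bookkeeping a bit more concrete, here is a rough sketch in plain Python. It is not Talend's generated code; the function names and the upper-casing step are made up for illustration. It splits the records into chunks, processes each chunk in its own thread, then merges the chunks back into one flow in the original order.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(records, n):
    """Split records round-robin into n chunks, remembering each record's position."""
    chunks = [[] for _ in range(n)]
    for index, record in enumerate(records):
        chunks[index % n].append((index, record))
    return chunks


def process_chunk(chunk):
    """Per-record work done inside one parallel thread (hypothetical: upper-casing)."""
    return [(index, record.upper()) for index, record in chunk]


def run_parallel(records, n=3):
    chunks = partition(records, n)
    with ThreadPoolExecutor(max_workers=n) as pool:
        processed = list(pool.map(process_chunk, chunks))
    # Merge the chunks back into a single flow and restore the input order.
    merged = [item for chunk in processed for item in chunk]
    merged.sort(key=lambda pair: pair[0])
    return [record for _, record in merged]


print(run_parallel(["a", "b", "c", "d", "e"]))
```

Most of the code is about splitting, tracking positions and merging, not about the actual per-record work, which is roughly why the component chain looks complicated.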
What problem are you trying to solve with these components?
The 2 errors you're getting are type conversion errors; you can probably fix these by updating your schema.
Hi @DVSCHWAB,
Please note that the warnings in the Problems tab are for different jobs; the name of this job is files30.
What I basically want is a clear explanation of this.
Please help.
Thanks in advance,
Ankit
I am not convinced by parallelization within a flow. It causes a lot of hidden complexity that you may never be aware of, and there are many use cases in which this kind of parallelization can never work. Your example is one of them.
The aggregation cannot be done in parallel unless the data is partitioned by the key column, but where would you set that up?
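A small sketch in plain Python (not Talend code; the `city` column and the sample rows are invented for illustration) shows why the partition key matters for a count. With round-robin partitioning the same key ends up in several partitions, so each partition only sees a partial count. When partitioning by the group-by key, every key lives in exactly one partition and the per-partition counts are already final.

```python
from collections import Counter
import zlib


def partition_round_robin(rows, n):
    """Distribute rows evenly, ignoring their content."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts


def partition_by_key(rows, n):
    """Send all rows with the same key to the same partition (stable hash)."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[zlib.crc32(row["city"].encode()) % n].append(row)
    return parts


def count_per_partition(parts):
    """Run the aggregation (count per city) independently in each partition."""
    return [Counter(row["city"] for row in part) for part in parts]


rows = [{"city": c} for c in ["Paris", "Paris", "Rome", "Rome", "Paris", "Oslo"]]

round_robin = count_per_partition(partition_round_robin(rows, 2))
by_key = count_per_partition(partition_by_key(rows, 2))

print("round robin:", round_robin)  # "Paris" and "Rome" appear in both partitions
print("by key:     ", by_key)       # each city appears in only one partition
```

In the round-robin case the partial counts would need an extra merge step after the parallel part, which is the kind of hidden complexity mentioned above.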
I always recommend doing almost nothing in parallel within a job, except calling other jobs via the parallel setting of the iterate flow.
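The idea behind that recommendation, sketched in plain Python under the assumption that each child job is self-contained (the `child_job` function and the file names are hypothetical stand-ins, not Talend APIs): each iteration runs a whole independent unit of work, so nothing has to be partitioned or merged inside a flow.

```python
from concurrent.futures import ThreadPoolExecutor


def child_job(file_name):
    """Stand-in for a self-contained child job; it shares no state with other runs."""
    return f"{file_name}: done"


def run_iterations(file_names, max_parallel=2):
    """Run one child job per input, at most max_parallel at a time."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(child_job, file_names))


print(run_iterations(["a.csv", "b.csv", "c.csv"]))
```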
Hi @lli,
Are you saying that parallelization never works for any job, or only for certain jobs?
If it works for certain jobs, how should I design my job so that I can use parallelization?
Is there any prerequisite for enabling "Set Parallelization" at the job level?
Please help.
Thanks in advance,
Ankit