Anonymous
Not applicable

Talend Architecture and Distributed Data Processing

Hi all,
I'm relatively new to Talend (just playing with 2.0) but have experience with Sunopsis, PowerCenter and other ETL-like tools.
Having read the documentation and watched the forums, I'm still confused about the distributed nature of the Talend architecture. I can see how to design a job within Talend Studio and either execute it interactively from the UI or export it for distribution on a collection/grid of machines, but I can't see any support for automatically distributing the data amongst machines.
On a grid of, say, 10 machines with 50M records, I would obviously like to split the source data into 10 batches of 5M each, distribute them amongst the 10 machines and recombine the results into a target database/file. From what I can tell, that splitting is manual.
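To make the manual splitting concrete, here is a minimal Python sketch of what I mean. It is not Talend functionality: the `split_ranges` function and its parameters are hypothetical names of my own, and it only computes which row range each machine would handle.

```python
def split_ranges(total_records, workers):
    """Return half-open (start, end) row ranges, one per worker.

    Hypothetical helper for illustration only; not part of Talend.
    """
    if workers <= 0:
        raise ValueError("workers must be positive")
    # Spread any remainder over the first few workers so sizes differ by at most 1.
    base, extra = divmod(total_records, workers)
    ranges = []
    start = 0
    for i in range(workers):
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


# 50M records over 10 machines: each range spans 5M rows.
batches = split_ranges(50_000_000, 10)
```

Each machine would then run the same exported job against its own range, for example by using the range bounds as a filter on the source query, and the outputs would be merged in a final step.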
Is there any existing, or planned, support for automatically distributing a job over N machines? Similarly, is there any planned support for splitting a job over N processors/cores without first writing data to a staging area/file?
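For the multi-core case, this is roughly the behaviour I have in mind, sketched in Python rather than in anything Talend generates. The names `transform`, `process_chunk` and `run_parallel` are hypothetical, and the doubling step just stands in for a real per-row transformation. The chunks stay in memory, so nothing is written to a staging file.

```python
from concurrent.futures import ThreadPoolExecutor


def transform(record):
    # Placeholder for a real per-row transformation (assumption).
    return record * 2


def process_chunk(chunk):
    return [transform(r) for r in chunk]


def run_parallel(records, workers):
    """Split records into in-memory chunks, process them concurrently,
    and recombine the results in the original order."""
    if not records:
        return []
    size = -(-len(records) // workers)  # ceiling division
    chunks = [records[i:i + size] for i in range(0, len(records), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in the same order as the input chunks.
        results = list(pool.map(process_chunk, chunks))
    return [row for chunk in results for row in chunk]
```

In CPython, threads only help when the work is I/O-bound; for CPU-bound transformations one would likely swap in `ProcessPoolExecutor`, which offers the same `map` interface.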
Don't take the above as a criticism; I think Talend is great. I'm just checking out what it can do and where it fits in my toolbox. If Talend doesn't currently support the above, has anyone found a suitable workaround or solution that worked for them?
cheers,
DIGuy
2 Replies
Anonymous
Not applicable
Author

1  What is the difference between list and list(object) operations in tAggregateRow?
Anonymous
Not applicable
Author

What is the difference between byte and byte[] variable types?