How to parse an input round robin into multiple outputs

Anonymous · ‎2017-06-07

I am trying to load an input file into Redshift, and I want to split the file round robin before loading it so the load can make use of the multiple slices in my cluster. How do I split an input into n outputs in a round robin fashion using Talend?

Ex:

Input:

id name
1 Jon
2 Anne
3 Cole
4 Zack
5 Ellen

Output:

Main1
1 Jon
4 Zack

Main2
2 Anne
5 Ellen

Main3
3 Cole

cterenzi · ‎2017-06-07

You can create three tMap outputs, each with its own filter condition:
rowX.id % 3 == 0
rowX.id % 3 == 1
rowX.id % 3 == 2
Then send each output to a separate file.

Anonymous · ‎2017-06-07

Thank you for the reply. I thought about doing that, but I actually need 6 outputs (I put down 3 in my question to simplify the problem). With this method, when the id is divisible by 6, rowX.id % 3 == 0, rowX.id % 2 == 0, and rowX.id % 6 == 0 are all true, so I can't think of a simple filter that splits it 6 ways.

cterenzi · ‎2017-06-07

You can create six outputs and change the expression to mod 6.
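To illustrate the mod 6 suggestion, here is a minimal sketch in plain Java (not Talend job code). The class and method names (RoundRobinSplit, bucketFor, split) are made up for this example, and it assumes the ids are whole numbers. Each of the six tMap filters would correspond to one remainder, rowX.id % 6 == 0 through rowX.id % 6 == 5.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, only to show how mod-n filters partition the ids.
class RoundRobinSplit {
    // Index of the output a row goes to: the remainder of id divided by the
    // number of outputs. floorMod keeps the result in [0, outputs) even if
    // an id happens to be negative.
    static int bucketFor(long id, int outputs) {
        return (int) Math.floorMod(id, (long) outputs);
    }

    // Distributes the ids over `outputs` lists, one list per remainder.
    static List<List<Long>> split(List<Long> ids, int outputs) {
        List<List<Long>> result = new ArrayList<>();
        for (int i = 0; i < outputs; i++) {
            result.add(new ArrayList<>());
        }
        for (long id : ids) {
            result.get(bucketFor(id, outputs)).add(id);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Long> ids = new ArrayList<>();
        for (long id = 1; id <= 12; id++) {
            ids.add(id);
        }
        List<List<Long>> out = split(ids, 6);
        for (int i = 0; i < out.size(); i++) {
            System.out.println("id % 6 == " + i + " -> " + out.get(i));
        }
    }
}
```

Because all six filters use the same modulus, an id divisible by 6 only matches the == 0 filter, so every row lands in exactly one output.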

Alternatively, I think you can set a row limit on tFileOutputDelimited, and it will split the output into files of that many rows. To get a consistent number of files, you'd need to get a record count and divide it by the number of files you want, rounding up. I can't test right now, but I'd assume it splits in the order the rows arrive in the data flow, so that wouldn't get you a round robin of ids unless you added the modulo expression as a new column and then sorted by that (and secondarily by the id).
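The arithmetic behind the row-limit approach can be sketched in plain Java as follows. This is only an illustration of the idea, not how Talend implements file splitting; the names ChunkedRoundRobin, rowsPerFile, sortByBucket, and chunk are hypothetical, and the ids are assumed to be integers.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper: sort by (id % files, id), then cut into fixed-size chunks.
class ChunkedRoundRobin {
    // Rows per file so that `total` rows fit in at most `files` files
    // (integer division rounded up).
    static int rowsPerFile(int total, int files) {
        return (total + files - 1) / files;
    }

    // Orders rows by remainder first and by id second, so rows that share a
    // remainder become contiguous.
    static List<Integer> sortByBucket(List<Integer> ids, int files) {
        List<Integer> sorted = new ArrayList<>(ids);
        sorted.sort(Comparator
            .comparingInt((Integer id) -> Math.floorMod(id, files))
            .thenComparingInt(Integer::intValue));
        return sorted;
    }

    // Cuts the ordered rows into consecutive chunks of `size` rows,
    // standing in for a per-file row limit.
    static List<List<Integer>> chunk(List<Integer> rows, int size) {
        List<List<Integer>> chunks = new ArrayList<>();
        for (int start = 0; start < rows.size(); start += size) {
            int end = Math.min(start + size, rows.size());
            chunks.add(new ArrayList<>(rows.subList(start, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<>();
        for (int id = 1; id <= 12; id++) {
            ids.add(id);
        }
        int size = rowsPerFile(ids.size(), 6);
        System.out.println(chunk(sortByBucket(ids, 6), size));
    }
}
```

One caveat with this approach: the fixed-size chunks line up with the remainder groups only when every group has the same number of rows. With the five-row example from the question and three files, the chunks would be (3, 1), (4, 2), (5), which mixes remainders, so the separate tMap outputs are the safer choice when the row count is not a multiple of the file count.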

How to parse an input round robin into multiple outputs

Talend Data Integration

v6.x