Record Count at ID level

Anonymous · ‎2019-01-23

Hi

I'm trying to achieve the equivalent of a SQL COUNT(*) with a GROUP BY on 2 fields within a Talend flow. At the moment we read a table and write a sorted, de-duplicated output to a MySQL table, then a second subjob reads the output of the first step and counts the number of rows per ID.

Any idea whether it would be possible to run this in a single subjob flow, or should we keep these jobs separate and run the count in raw SQL?

Thanks

Dave

David_Beaty · ‎2019-01-23

Hi,

Use a tReplicate to split off the data you want the GROUP BY on, so the aggregation gets its own feed. Then aggregate that feed with either tSortRow followed by tAggregateSortedRow, or a single tAggregateRow, depending on data volumes.
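
To illustrate why data volume matters for that choice, here is a rough plain-Java sketch of the two grouping strategies. It is not Talend-generated code: the class name, the method names, and the two-column row layout (id, category) are assumptions made up for the example. The first method groups by hashing, which is roughly the idea behind tAggregateRow; the second counts over sorted input, which is roughly the idea behind tSortRow feeding tAggregateSortedRow.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GroupCountSketch {

    // Hash-based grouping (conceptually like tAggregateRow):
    // one map entry per distinct (id, category) key is kept until the input ends.
    public static Map<List<String>, Long> countByHash(List<String[]> rows) {
        Map<List<String>, Long> counts = new LinkedHashMap<>();
        for (String[] r : rows) {
            counts.merge(Arrays.asList(r[0], r[1]), 1L, Long::sum);
        }
        return counts;
    }

    // Sorted grouping (conceptually like tSortRow + tAggregateSortedRow):
    // once rows are sorted on the key, the counting loop only tracks
    // the current key and its running count.
    public static Map<List<String>, Long> countSorted(List<String[]> rows) {
        List<String[]> sorted = new ArrayList<>(rows);
        sorted.sort(Comparator.comparing((String[] r) -> r[0])
                              .thenComparing(r -> r[1]));

        Map<List<String>, Long> counts = new LinkedHashMap<>();
        List<String> currentKey = null;
        long running = 0;
        for (String[] r : sorted) {
            List<String> key = Arrays.asList(r[0], r[1]);
            if (!key.equals(currentKey)) {
                // Key changed: emit the finished group and start a new one.
                if (currentKey != null) {
                    counts.put(currentKey, running);
                }
                currentKey = key;
                running = 0;
            }
            running++;
        }
        if (currentKey != null) {
            counts.put(currentKey, running); // emit the last group
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample rows: {id, category}
        List<String[]> rows = Arrays.asList(
            new String[] {"A", "x"},
            new String[] {"B", "y"},
            new String[] {"A", "x"},
            new String[] {"A", "z"});
        System.out.println(countByHash(rows));
        System.out.println(countSorted(rows));
    }
}
```

Both methods return the same counts per key. The difference is in what the counting step has to hold in memory, which is why the sorted route tends to be suggested for larger volumes. In the sketch the sort itself is still done in memory, so it only shows the shape of the approach.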

Talend Data Integration

v7.x