Solved: Re: Collating row counts in multiple subjobs. - Qlik Community

miteshkhatri80 · 2018-03-15

I have a job similar to the one below, with two inputs and a filter on each input. I want to write the number of rows on each filter flow to an output file that looks something like this:

row2|0 rows
row3|100 rows
row5|5 rows
row6|5 rows

The file layout doesn't matter too much; I can tweak that as necessary. I just need the values.
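Once all four counts are available in one place, producing this layout and refreshing it on every run is straightforward. The plain-Java sketch below assumes the counts have already been collected into a map; `RowCountReport`, `formatCounts` and `writeReport` are hypothetical names, not Talend components.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RowCountReport {

    // Formats each flow name and its row count as "name|N rows", in map iteration order.
    static List<String> formatCounts(Map<String, Integer> counts) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            lines.add(e.getKey() + "|" + e.getValue() + " rows");
        }
        return lines;
    }

    // Writes the whole report at once. Files.write truncates an existing file by default,
    // so each run replaces the previous stats instead of appending to them.
    static void writeReport(Path file, Map<String, Integer> counts) throws IOException {
        Files.write(file, formatCounts(counts), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // LinkedHashMap keeps the flows in the order they were added.
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("row2", 0);
        counts.put("row3", 100);
        counts.put("row5", 5);
        counts.put("row6", 5);
        Path out = Files.createTempFile("row_counts", ".txt");
        writeReport(out, counts);
        Files.readAllLines(out, StandardCharsets.UTF_8).forEach(System.out::println);
    }
}
```

The key design choice is writing once, after every subjob has finished, so there is never a second writer that could clobber the first.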

If I add a separate tFlowMeterCatcher that writes to a CSV file, only the second set of values ends up in the file. I believe the first set IS written, but is then overwritten regardless of the "append to file" setting. In any case, I do not want to append to the file, because I want the stats to be refreshed entirely each time I run the job.
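What is described here matches what you would see if each subjob's output opened the same file in truncate mode; the thread does not confirm how tFlowMeterCatcher's output is actually written, so treat that as an assumption. The plain-Java sketch below contrasts that behaviour with one possible workaround the thread does not try: clear the file once at the start of the run, then append from each subjob, so every run still starts fresh. In Talend terms this would presumably mean deleting the file before the subjobs run and enabling "append to file"; that mapping is untested here. `writeStats` and the other method names are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Collections;
import java.util.List;

public class StatsFileModes {

    // Hypothetical helper: writes one subjob's stats line, either appending or overwriting.
    static void writeStats(Path file, String line, boolean append) throws IOException {
        List<String> lines = Collections.singletonList(line);
        if (append) {
            Files.write(file, lines, StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } else {
            // Default options are CREATE, TRUNCATE_EXISTING, WRITE: earlier content is lost.
            Files.write(file, lines, StandardCharsets.UTF_8);
        }
    }

    // Overwrite mode: the second subjob wipes out the first subjob's stats.
    static List<String> runOverwriting(Path file) throws IOException {
        writeStats(file, "row2|0 rows", false); // first subjob
        writeStats(file, "row5|5 rows", false); // second subjob truncates and rewrites
        return Files.readAllLines(file, StandardCharsets.UTF_8);
    }

    // Reset once per run, then append: both subjobs' stats survive, older runs do not.
    static List<String> runResetThenAppend(Path file) throws IOException {
        Files.deleteIfExists(file);
        writeStats(file, "row2|0 rows", true);
        writeStats(file, "row5|5 rows", true);
        return Files.readAllLines(file, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("flowmeter", ".csv");
        System.out.println(runOverwriting(file));     // [row5|5 rows]
        System.out.println(runResetThenAppend(file)); // [row2|0 rows, row5|5 rows]
        System.out.println(runResetThenAppend(file)); // same again: the previous run was cleared
    }
}
```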

I have also tried to output the results via the Stats & Logs option in the Job tab, but this also only outputs the results from the last subjob, having overwritten the results from the first one.

How can I make all four stats appear in the same file? Is it possible to have more than one tFlowMeterCatcher, and specify which tFlowMeter(s) it refers to?

miteshkhatri80 · 2018-03-16

I have managed to resolve it, but I am surprised at the complexity of my solution, and I would love to know if there is a better way.

Since tFlowMeterCatcher overwrites the output of the first subjob with that of the second, my aim was to merge the two subjobs into one. The only way I found to do this was to add a tMap immediately after each tRowGenerator and create a new field that identifies the source. This is just a simple string field containing "Source1" or "Source2" (or something more meaningful).
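In plain Java terms, the tMap step adds a constant column to every row and tUnite concatenates the tagged flows. The sketch below models only that data shape; `TaggedRow`, `tag` and `unite` are hypothetical names, and the integer payload stands in for whatever columns tRowGenerator actually produces.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TagSources {

    // A row carrying the extra "source" column added by the tMap after each tRowGenerator.
    static final class TaggedRow {
        final String source;
        final int value;

        TaggedRow(String source, int value) {
            this.source = source;
            this.value = value;
        }

        @Override
        public String toString() {
            return source + ":" + value;
        }
    }

    // Like one tMap: copy every input row and attach a constant source label.
    static List<TaggedRow> tag(String source, List<Integer> rows) {
        List<TaggedRow> out = new ArrayList<>();
        for (int v : rows) {
            out.add(new TaggedRow(source, v));
        }
        return out;
    }

    // Like tUnite: concatenate the tagged flows into one flow, preserving order.
    static List<TaggedRow> unite(List<TaggedRow> a, List<TaggedRow> b) {
        List<TaggedRow> out = new ArrayList<>(a);
        out.addAll(b);
        return out;
    }

    public static void main(String[] args) {
        List<TaggedRow> merged = unite(
                tag("Source1", Arrays.asList(1, 2, 3)),
                tag("Source2", Arrays.asList(10, 20)));
        System.out.println(merged); // [Source1:1, Source1:2, Source1:3, Source2:10, Source2:20]
    }
}
```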

I then used a tUnite to bring the two tMap outputs together, immediately followed by a tReplicate. Each output of the tReplicate connects to a tFilter that filters on the source field, and from there I used the original tFilters, which give me the values I need, all within the same subjob.
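The replicate-then-filter routing can be sketched as follows: every tReplicate branch sees all rows, keeps only its own source, then applies that source's original filter condition and counts accepted and rejected rows. Assumptions: the filter conditions (`v > 2`, `v >= 15`) are invented for illustration, as are the class and method names.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntPredicate;

public class ReplicateAndFilter {

    static final class TaggedRow {
        final String source;
        final int value;

        TaggedRow(String source, int value) {
            this.source = source;
            this.value = value;
        }
    }

    // One tReplicate branch: scan ALL rows, keep this source's rows (the source tFilter),
    // then apply the original condition and count both outcomes (the original tFilter).
    // Returns {accepted, rejected}.
    static int[] countBranch(List<TaggedRow> all, String source, IntPredicate originalFilter) {
        int accepted = 0;
        int rejected = 0;
        for (TaggedRow r : all) {
            if (!r.source.equals(source)) {
                continue;
            }
            if (originalFilter.test(r.value)) {
                accepted++;
            } else {
                rejected++;
            }
        }
        return new int[] {accepted, rejected};
    }

    public static void main(String[] args) {
        List<TaggedRow> merged = Arrays.asList(
                new TaggedRow("Source1", 1), new TaggedRow("Source1", 3),
                new TaggedRow("Source2", 10), new TaggedRow("Source2", 20));
        // Hypothetical per-source conditions standing in for the original tFilters.
        Map<String, IntPredicate> filters = new LinkedHashMap<>();
        filters.put("Source1", v -> v > 2);
        filters.put("Source2", v -> v >= 15);
        for (Map.Entry<String, IntPredicate> f : filters.entrySet()) {
            int[] c = countBranch(merged, f.getKey(), f.getValue());
            System.out.println(f.getKey() + " accepted|" + c[0] + " rows");
            System.out.println(f.getKey() + " rejected|" + c[1] + " rows");
        }
    }
}
```

Note that each branch rescans the whole merged flow, which is the cost the later single-tMap refinement avoids.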

Incidentally, this initially caused an error because the generated Java code was too large, so I split the job into two sections using a tHashOutput and a tHashInput.
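The Java limit involved is most likely the JVM's 64 KB cap on the bytecode of a single method (javac reports it as "code too large"). Presumably Talend generates each subjob's code largely inside one method, so splitting through tHashOutput/tHashInput moves the second half into a separate method, with an in-memory buffer in between. The sketch models that hand-off with a static map; `STORE`, `hashOutput`, `hashInput`, `stageOne` and `stageTwo` are hypothetical stand-ins, not Talend's actual implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HashHandOff {

    // Hypothetical in-memory buffer standing in for what tHashOutput/tHashInput share.
    private static final Map<String, List<String>> STORE = new HashMap<>();

    // Producer side: append rows under a key (like a tHashOutput ending stage one).
    static void hashOutput(String key, List<String> rows) {
        STORE.computeIfAbsent(key, k -> new ArrayList<>()).addAll(rows);
    }

    // Consumer side: read the buffered rows back (like a tHashInput starting stage two).
    static List<String> hashInput(String key) {
        return Collections.unmodifiableList(STORE.getOrDefault(key, Collections.emptyList()));
    }

    // Stage one lives in its own method, so its bytecode counts against its own size limit.
    static void stageOne() {
        hashOutput("merged", Arrays.asList("Source1|1", "Source1|3", "Source2|10"));
    }

    // Stage two is a separate method that picks up where stage one left off.
    static long stageTwo(String source) {
        return hashInput("merged").stream().filter(r -> r.startsWith(source + "|")).count();
    }

    public static void main(String[] args) {
        stageOne();
        System.out.println("Source1|" + stageTwo("Source1") + " rows");
        System.out.println("Source2|" + stageTwo("Source2") + " rows");
    }
}
```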

To me this whole thing seems unnecessarily complex, but I could find no other way to resolve it. Does anyone have a better, more efficient solution?

miteshkhatri80 · 2018-03-16

Here is an image of my solution, with one small refinement: a single tMap filters on the sources instead of a tReplicate and multiple tFilters.
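The refinement replaces the replicate-then-filter fan-out with one tMap whose outputs each carry a filter expression on the source field. In plain Java that amounts to a single pass that sends each row to the bucket for its source, instead of every branch rescanning all rows. The sketch assumes rows are tagged with a source field as described earlier in the thread; `SinglePassRouting` and `route` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SinglePassRouting {

    static final class TaggedRow {
        final String source;
        final int value;

        TaggedRow(String source, int value) {
            this.source = source;
            this.value = value;
        }
    }

    // One pass over the merged flow: each row goes to the output named after its source.
    // Because the filters (source == X) are mutually exclusive, each row lands in at most one output.
    static Map<String, List<TaggedRow>> route(List<TaggedRow> rows, List<String> sources) {
        Map<String, List<TaggedRow>> outputs = new LinkedHashMap<>();
        for (String s : sources) {
            outputs.put(s, new ArrayList<>());
        }
        for (TaggedRow r : rows) {
            List<TaggedRow> out = outputs.get(r.source);
            if (out != null) {
                out.add(r); // rows from unlisted sources are dropped
            }
        }
        return outputs;
    }

    public static void main(String[] args) {
        List<TaggedRow> merged = Arrays.asList(
                new TaggedRow("Source1", 1), new TaggedRow("Source2", 10), new TaggedRow("Source1", 3));
        Map<String, List<TaggedRow>> outputs = route(merged, Arrays.asList("Source1", "Source2"));
        outputs.forEach((source, rows) -> System.out.println(source + "|" + rows.size() + " rows"));
    }
}
```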

[Image: Collating row counts in multiple subjobs.]

Labels: Other · Talend Data Integration · v6.x