I have a job similar to the one below, with two inputs and a filter on each input. I want to write the number of rows coming out of each filter to an output file that looks something like:
row2|0 rows
row3|100 rows
row5|5 rows
row6|5 rows
The file layout doesn't matter too much, I can tweak that as necessary. I just need the values.
If I have a separate tFlowMeterCatcher that outputs to a CSV file, only the second set of values is output. I believe the first set IS output, but is then overwritten regardless of the "append to file" setting. In any case, I do not want to append to file, because I want the stats to be refreshed entirely each time I run the job.
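To make the behaviour I am after concrete, here is a minimal plain-Java sketch, not Talend code. It assumes the stats file is emptied once at the start of a run and that each subjob then appends its own lines. The class name StatsFileSketch and the methods reset and append are hypothetical, and nothing here claims that tFlowMeterCatcher can be configured to work this way.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

public class StatsFileSketch {

    /** Empty the stats file once per run so old results never carry over. */
    static void reset(Path file) throws IOException {
        Files.write(file, new byte[0],
                StandardOpenOption.CREATE,
                StandardOpenOption.TRUNCATE_EXISTING,
                StandardOpenOption.WRITE);
    }

    /** Append one subjob's stats lines without touching what is already there. */
    static void append(Path file, List<String> lines) throws IOException {
        Files.write(file, lines, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.APPEND);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("flow-stats", ".csv");
        reset(file);                                            // start of the run
        append(file, List.of("row2|0 rows", "row3|100 rows"));  // first subjob
        append(file, List.of("row5|5 rows", "row6|5 rows"));    // second subjob
        Files.readAllLines(file).forEach(System.out::println);
    }
}
```

The point of the sketch is only that "refresh on every run" and "keep both subjobs' values" are compatible as long as the truncation happens once per run rather than once per subjob.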
I have also tried to output the results via the Stats&Logs option in the Job tab, but this also only outputs the results from the latest subjob, having overwritten the results from the initial one.
How can I make all four stats appear in the same file? Is it possible to have more than one tFlowMeterCatcher, and specify which tFlowMeter(s) it refers to?
I have managed to resolve it, but I am surprised at the complexity of my solution, and I would love to know if there is a better way.
Since tFlowMeterCatcher overwrites the output of the first subjob with that of the second, my aim was to merge the two subjobs into one. The only way I found to do this was to put a tMap immediately after each tRowGenerator and add a new field that identifies the source. This is just a simple string field which can contain "Source1" or "Source2" (or anything more meaningful).
I then used a tUnite to bring the two tMaps together, immediately followed by a tReplicate. Each output from tReplicate connected to a tFilter which filtered on the source field, and from then I used the original tFilter which gives me the values I need, within the same job.
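For anyone who finds the component chain hard to picture, the data flow described above can be sketched in plain Java. This is only an illustration of the tag, unite, route and count logic, not the code Talend generates. The class FlowCountSketch, the "_accepted" and "_rejected" labels, and the example filter conditions are all assumptions made up for the sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntPredicate;

public class FlowCountSketch {

    /** One row plus the extra field recording which input it came from. */
    static final class TaggedRow {
        final String source;
        final int value;

        TaggedRow(String source, int value) {
            this.source = source;
            this.value = value;
        }
    }

    /** Role of the tMap after each input: add the source tag to every row. */
    static List<TaggedRow> tag(String source, int[] values) {
        List<TaggedRow> rows = new ArrayList<>();
        for (int v : values) {
            rows.add(new TaggedRow(source, v));
        }
        return rows;
    }

    /** Unite both inputs, route each row by its tag, filter it, and count both outcomes. */
    static Map<String, Integer> countFlows(List<TaggedRow> first, List<TaggedRow> second,
                                           Map<String, IntPredicate> filters) {
        List<TaggedRow> united = new ArrayList<>(first); // role of the tUnite
        united.addAll(second);

        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String source : filters.keySet()) {         // make zero counts visible too
            counts.put(source + "_accepted", 0);
            counts.put(source + "_rejected", 0);
        }
        for (TaggedRow row : united) {
            IntPredicate filter = filters.get(row.source); // route on the source field
            if (filter == null) {
                continue;                                  // untagged source: not counted
            }
            String label = row.source + (filter.test(row.value) ? "_accepted" : "_rejected");
            counts.merge(label, 1, Integer::sum);
        }
        return counts;
    }

    /** Render the counts in the "name|N rows" layout from the question. */
    static List<String> format(Map<String, Integer> counts) {
        List<String> lines = new ArrayList<>();
        counts.forEach((label, n) -> lines.add(label + "|" + n + " rows"));
        return lines;
    }

    public static void main(String[] args) {
        Map<String, IntPredicate> filters = new LinkedHashMap<>();
        filters.put("Source1", v -> v < 0);      // hypothetical filter condition
        filters.put("Source2", v -> v % 2 == 0); // hypothetical filter condition
        List<TaggedRow> a = tag("Source1", new int[] {1, 2, 3});
        List<TaggedRow> b = tag("Source2", new int[] {4, 5});
        format(countFlows(a, b, filters)).forEach(System.out::println);
    }
}
```

Because every count is collected within a single flow, all four values are available together when the output is written, which is what the merged job achieves.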
Incidentally, the merged job initially failed with an error saying the generated Java code was too large, so I split it into two sections using a tHashOutput and a tHashInput.
To me this whole thing seems unnecessarily complex, but I could find no other way to resolve it. Does anyone have a better, more efficient solution?
[Image: my solution, with the small refinement of using a tMap to filter on the source instead of a tReplicate and multiple tFilters.]