I have 100 CSV files with the same schema.
In each file, I need to focus on one particular field and get the frequency of every value in that field.
Here is an example: if in File001 the values for that field (Column1) are as below:
Column1
A
A
A
B
B
C
We want the output to be:
Column1 Frequency
A 50%
B 33%
C 17%
The same process will run through all 100 files, and eventually I will gather the top-frequency value from each file, so the final output will have 100 rows (one row per file).
Thanks!
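For clarity, here is a minimal standalone Java sketch of what I mean for a single file. The file name and the single-column layout are just taken from the example above, not from the real files:

```java
import java.nio.file.*;
import java.util.*;
import java.util.stream.*;

public class TopFrequency {
    public static void main(String[] args) throws Exception {
        // Placeholder file name; the real job would loop over all 100 CSVs.
        List<String> values = Files.readAllLines(Paths.get("File001.csv"))
                .stream().skip(1)            // skip the "Column1" header line
                .collect(Collectors.toList());

        // Count how often each value occurs in Column1.
        Map<String, Long> counts = values.stream()
                .collect(Collectors.groupingBy(v -> v, Collectors.counting()));

        long total = values.size();
        // Print each value with its frequency, e.g. "A 50%", "B 33%", "C 17%".
        counts.forEach((v, c) ->
                System.out.printf("%s %d%%%n", v, Math.round(100.0 * c / total)));

        // The value with the highest count becomes this file's row in the final output.
        String top = Collections.max(counts.entrySet(), Map.Entry.comparingByValue()).getKey();
        System.out.println("Top value: " + top);
    }
}
```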
I would try a job along these lines: the tAggregateRow counts Column1 and groups by Column1, and then the frequency code would look something like this:
Note: I haven't tested this because I don't have a similar collection of files to use at the moment and I don't quite have the time to generate them.
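The original screenshot/snippet isn't reproduced here, but a minimal sketch of what that frequency step in a tJavaRow might contain, assuming the tAggregateRow output schema has Column1 plus a count column, and that the file's total row count was stored earlier in globalMap under a hypothetical key "total":

```java
// tJavaRow sketch. Assumptions: the incoming schema (from tAggregateRow) has
// Column1 and count; an earlier subjob stored the file's total row count in
// globalMap under the key "total".
int total = (Integer) globalMap.get("total");
output_row.Column1 = input_row.Column1;
// e.g. 3 of 6 rows -> "50%"
output_row.frequency = Math.round(100.0 * input_row.count / total) + "%";
```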
I created a similar job. I used tMap to add the calculated field (frequency) instead of tJavaRow, then tSortRow (descending) and tSampleRow to get the top row.
Thanks!
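For anyone landing here later, the tMap output expression for the calculated frequency column might look something like this. This is only a sketch; the flow name row2, the count column, and the globalMap key "total" are assumptions about this particular job's naming:

```java
// tMap output expression for the frequency column (sketch)
Math.round(100.0 * row2.count / (Integer) globalMap.get("total")) + "%"
```

After that, tSortRow sorts descending on the count (or frequency) column, and tSampleRow with its range set to 1 keeps only the first row, i.e. the top-frequency value for that file.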