Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi,
I have a CSV file and I need to get the max value for each row between c1, c2, c3, c4 and c5.
The maximum value must be stored in the a new column as the following example:
c1 | c2 | c3 | c4 | c5 | Max_Value |
87.2 | 83.6 | 81.4 | 83 | 83 | 87.2 |
85.9 | 84 | 82 | 84 | 83.3 | 87.2 |
84.9 | 82.7 | 86 | 83 | 82.3 | 86 |
85.6 | 84.6 | 82.7 | 84.6 | 84.3 | 86.2 |
85.2 | 83.6 | 82 | 83.3 | 83 | 85.6 |
86.8 | 84.3 | 82.7 | 84.3 | 83.6 | 86.8 |
86.2 | 84 | 82.3 | 84 | 87 | 87 |
88.1 | 84.6 | 82.3 | 84.3 | 83.6 | 88.1 |
Anyone can give me a hint or a way to achieve this ?
Kind regards,
Mohammad
You can achieve this pretty easily using a bit of Java.
Assume you have a tMap and your input columns are....
row1.newColumn
row1.newColumn1
row1.newColumn2
row1.newColumn3
row1.newColumn4
If you want to find the largest of these numbers for every row, create an output called "largestValue" and assign this expression....
Math.max(row1.newColumn,Math.max(row1.newColumn1,Math.max(row1.newColumn2,Math.max(row1.newColumn3,row1.newColumn4))))
What is happening here is that a Java Math method called "max" is being used to assess two numbers, the first and second number. In this example I am calling this method inside calls to the same method repeatedly so that the resulting value is passed on as an input to another call.
You can achieve this pretty easily using a bit of Java.
Assume you have a tMap and your input columns are....
row1.newColumn
row1.newColumn1
row1.newColumn2
row1.newColumn3
row1.newColumn4
If you want to find the largest of these numbers for every row, create an output called "largestValue" and assign this expression....
Math.max(row1.newColumn,Math.max(row1.newColumn1,Math.max(row1.newColumn2,Math.max(row1.newColumn3,row1.newColumn4))))
What is happening here is that a Java Math method called "max" is being used to assess two numbers, the first and second number. In this example I am calling this method inside calls to the same method repeatedly so that the resulting value is passed on as an input to another call.
Hi,
@rhall got the best solution as for a limited and fixed number of value.
If you have much more values per row or a variable number of value, you need a more complete solution:
- tFixedInputRow is used as the input - if you input is a CSV file, access it using tFileInputRaw to ignore the number of fields at this stage
- tMap add a sequence number for each row (as a key)
- tReplicate replicates the flow if you don't want to access again the original data source
- tNormalize generates a key/value flow (one row for each value - here 41 rows cause I add 99 on 1rst record)
- tAggregate search forn the max value for each key
- tHashOutput memorize the result
OnSubjobOk
- tHashInput_1 the original data unchanged but with the unique key
- tHashInput_2 to retrieve the max value for each key
- tMap to join both input using the key and add the max value as a new field
- tLogRow to print the result on the console
Starting job test at 15:01 15/11/2017. [statistics] connecting to socket on port 3981 [statistics] connected .--+------------------------+----. | tLogRow_15 | |=-+------------------------+---=| |id|line |max | |=-+------------------------+---=| |1 |87.2 83.6 81.4 83 83 99 |99 | |2 |85.9 84 82 84 83.3 |85.9| |3 |84.9 82.7 86 83 82.3 |86 | |4 |85.6 84.6 82.7 84.6 84.3|85.6| |5 |85.2 83.6 82 83.3 83 |85.2| |6 |86.8 84.3 82.7 84.3 83.6|86.8| |7 |86.2 84 82.3 84 87 |87 | |8 |88.1 84.6 82.3 84.3 83.6|88.1| '--+------------------------+----' [statistics] disconnected Job test ended at 15:01 15/11/2017. [exit code=0]
As you can see I've added the value 99 on the first record and it is find as the max value for the row 1.
Hope this helps.
Thanks guys for your answers I appreciate it.