Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi All,
need a help regarding extraction of unique values of a multiple columns individually in a table by using Talend/SQL.
EXAMPLE:
INPUT:
ID | NAME | SALARY | AGE |
1 | MOHAN | 5000 | 27 |
2 | AJAY | 2700 | 26 |
3 | SAI | 5000 | 29 |
4 | RAMA | 3000 | 27 |
5 | MOHAN | 4500 | 29 |
EXPECTED OUTPUT:
ID | NAME | SALARY | AGE |
1 | MOHAN | 5000 | 27 |
2 | AJAY | 2700 | 26 |
3 | SAI | 3000 | 29 |
4 | RAMA | 4500 | |
5 |
Thanks in advance.
regards;
Bhargav
Hi,
You can either use tAggregaterow or tUniquerow to select the unique values where you need to pass only relevant fields as input to these components (you can use a tMap to select the right columns).
One thing I would like to say is that your input and output record do not have any correlation. For example, you were having two input records as shown below.
MOHAN 5000 27 MOHAN 4500 29
But you have taken the unique value as
MOHAN | 5000 | 27 |
You might be joining records of two persons with name Mohan with different age to a single output record. Since the output record details are a combination of the earlier two records it may result in a totally new imaginary person.
So please double check how you would like to aggregate the input records.
Warm Regards,
Nikhil Thampi
Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved
In addition to all the valid inferences from Nikhil, another approach i would suggest is:
1) Sort your data based on the column you want to check on (In your example Name)
2) Check the subsequent rows of data if they match the column value, if they match then ignore else carry forward. With this approach you would only process the first row of data for that column. However you are have to double ensure that your sort criteria is correct and valid
Eg. Input Data
SNo. Name Age
100 Mohan 23
101 Mike 45
102 Mohan 32
Sorted Data
SNo. Name Age
100 Mohan 23
102 Mohan 32
101 Mike 45
Process Detail
when the current row is Sno=100 you have nothing to compare so that will go through, when the current row is 102 you find that for the previous record Name= the name of the current record, so you can ignore that record, when you get to the next record Name=Mike which is different from Mohan so that will go through
This approach would greatly help in performance as well, because if the incoming data volume is huge, aggregator will quickly turn out to be performance hindrance. This is a age old technique where you compare rows on the fly... Hope that helps