Skip to main content
Announcements
Introducing Qlik Answers: A plug-and-play, Generative AI powered RAG solution. READ ALL ABOUT IT!
cancel
Showing results for 
Search instead for 
Did you mean: 
Anonymous
Not applicable

unique values extraction

Hi All,

need a help regarding extraction of unique values of a multiple columns individually in a table by using Talend/SQL.

EXAMPLE:

INPUT:

ID NAME SALARY AGE
1 MOHAN 5000 27
2 AJAY 2700 26
3 SAI 5000 29
4 RAMA 3000 27
5 MOHAN 4500 29

 

EXPECTED OUTPUT:

ID NAME SALARY AGE
1 MOHAN 5000 27
2 AJAY 2700 26
3 SAI 3000 29
4 RAMA 4500  
5      

 

Thanks in advance.

regards;

Bhargav

Labels (3)
2 Replies
Anonymous
Not applicable
Author

Hi,

 

    You can either use tAggregaterow or tUniquerow to select the unique values where you need to pass only relevant fields as input to these components (you can use a tMap to select the right columns).

 

    One thing I would like to say is that your input and output record do not have any correlation. For example, you were having two input records as shown below.

 

MOHAN	5000	27
MOHAN   4500    29

     But you have taken the unique value as 

 

MOHAN 5000 27

 

     You might be joining records of two persons with name Mohan with different age to a single output record. Since the output record details are a combination of the earlier two records it may result in a totally new imaginary person.

 

     So please double check how you would like to aggregate the input records.

 

Warm Regards,
Nikhil Thampi

Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved

tnewbie
Creator II
Creator II

In addition to all the valid inferences from Nikhil, another approach i would suggest is:

1) Sort your data based on the column you want to check on (In your example Name)

2) Check the subsequent rows of data if they match the column value, if they match then ignore else carry forward. With this approach you would only process the first row of data for that column. However you are have to double ensure that your sort criteria is correct and valid

Eg.   Input Data

       SNo.     Name      Age

      100       Mohan       23

      101       Mike          45

      102       Mohan       32

 

Sorted Data

      SNo.     Name      Age

      100       Mohan       23

      102       Mohan       32

      101       Mike          45

Process Detail

when the current row is Sno=100 you have nothing to compare so that will go through, when the current row is 102 you find that for the previous record Name= the name of the current record, so you can ignore that record, when you get to the next record Name=Mike which is different from Mohan so that will go through

 

This approach would greatly help in performance as well, because if the incoming data volume is huge, aggregator will quickly turn out to be performance hindrance. This is a age old technique where you compare rows on the fly... Hope that helps