I am new to Talend. I have a requirement to identify duplicates and handle them by logging all the duplicate records in a separate output file. Any help is appreciated.
Ex:
Input: FILE1
STATE | DATE | VALUE |
AK | 4/24/2019 | 100 |
AK | 4/24/2019 | 200 |
AZ | 4/24/2019 | 300 |
CA | 4/24/2019 | 100 |
CA | 4/24/2019 | 150 |
Output Table: STG1
STATE | DATE | VALUE | REC_STATUS | ERR_MSG |
AK | 4/24/2019 | 100 | Success | |
AK | 4/24/2019 | 200 | Failed | Duplicate in same file |
AZ | 4/24/2019 | 300 | Success | |
CA | 4/24/2019 | 100 | Success | |
CA | 4/24/2019 | 150 | Failed | Duplicate in same file |
Output File: Error_Log
STATE | DATE | VALUE | REC_STATUS | ERR_MSG |
AK | 4/24/2019 | 200 | Failed | Duplicate in same file |
CA | 4/24/2019 | 150 | Failed | Duplicate in same file |
@A_San, check the solution below.
You can use the tUniqRow component.
tUniqRow lets me output either the duplicates or the uniques. My requirement is to keep both, but with different statuses: uniques should have record status 'S' (success), and duplicates should have record status 'F' (failed) with an error message such as "Duplicate record".
@A_San, check the solution below.
Hi @A_San
You can solve the above scenario with a ternary-operator condition in each of two tMap components.
The overall job will be:
First sort on the "State" column in ascending order. Then, in the first tMap, use the condition
row2.State.equals(Var.prev)?"Failed":"Success" and store the current State value in another variable, so that the next row can be compared against it.
Give another condition in the second tMap.
Then you will get the required output.
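For readers who want to see the sort-then-compare idea outside of Talend, here is a minimal plain-Java sketch of it. It is not the tMap configuration itself: the class `DuplicateFlagger`, the `Row` type and the `key` helper are hypothetical names made up for illustration, and it assumes that two records are duplicates when both STATE and DATE match (the condition above compares State only).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class DuplicateFlagger {

    // One input row plus the two audit columns from the STG1 example.
    static class Row {
        final String state;
        final String date;
        final int value;
        String recStatus = "";
        String errMsg = "";

        Row(String state, String date, int value) {
            this.state = state;
            this.date = date;
            this.value = value;
        }
    }

    // Assumption: a record is a duplicate when STATE and DATE both match.
    static String key(Row r) {
        return r.state + "|" + r.date;
    }

    // Sorts a copy of the list by key, then flags every row whose key equals
    // the previous row's key. The Row objects themselves are updated in place.
    static List<Row> flag(List<Row> input) {
        List<Row> sorted = new ArrayList<>(input);
        // List.sort is stable, so rows with the same key keep their file order.
        sorted.sort(Comparator.comparing(DuplicateFlagger::key));

        String prev = null; // plays the role of Var.prev in the tMap
        for (Row r : sorted) {
            String k = key(r);
            boolean duplicate = k.equals(prev);
            r.recStatus = duplicate ? "Failed" : "Success";
            r.errMsg = duplicate ? "Duplicate in same file" : "";
            prev = k; // remember this key for the next row
        }
        return sorted;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
                new Row("AK", "4/24/2019", 100),
                new Row("AK", "4/24/2019", 200),
                new Row("AZ", "4/24/2019", 300),
                new Row("CA", "4/24/2019", 100),
                new Row("CA", "4/24/2019", 150));
        for (Row r : flag(rows)) {
            System.out.println(r.state + " | " + r.date + " | " + r.value
                    + " | " + r.recStatus + " | " + r.errMsg);
        }
    }
}
```

For the five sample rows from FILE1, the statuses come out as Success, Failed, Success, Success, Failed, which matches the STG1 table. Writing only the rows with status "Failed" to a second output would give the Error_Log file.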
If your query is answered, please mark the topic as resolved.
Thanks,
Aarif
Thank you @manodwhb and @AarifAkhtar
I was trying another approach with tUniqRow. Could you please take a look at the screenshot and let me know whether my approach is good with respect to efficiency and performance?
@A_San, your approach is OK.
@manodwhb I have a couple more situations with respect to duplicate handling.
I may have duplicate records across multiple files, which I need to capture. I cannot use tUnite because I will not be able to get the respective file name in the tMap to load into my database.
The other situation is that I may have a record already loaded in the target database, and if the same record comes again, I need to be able to capture that as well.
Any help would be appreciated.
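Both follow-up situations come down to checking each record's key against keys seen earlier, while keeping the source file name on the row. The plain-Java sketch below illustrates that logic only; it is not a Talend job. The class `CrossSourceDuplicateCheck`, the `Row` type, the `targetKeys` set (assumed to have been read from the target table beforehand) and the two new error messages are all hypothetical, and the duplicate key is again assumed to be STATE plus DATE.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class CrossSourceDuplicateCheck {

    // A record that remembers which file it came from.
    static class Row {
        final String fileName;
        final String state;
        final String date;
        final int value;
        String recStatus = "Success";
        String errMsg = "";

        Row(String fileName, String state, String date, int value) {
            this.fileName = fileName;
            this.state = state;
            this.date = date;
            this.value = value;
        }
    }

    // Assumption: STATE and DATE together identify a record.
    static String key(Row r) {
        return r.state + "|" + r.date;
    }

    static void fail(Row r, String message) {
        r.recStatus = "Failed";
        r.errMsg = message;
    }

    // rows: records from all files, in processing order.
    // targetKeys: keys already present in the target table (assumed preloaded).
    static void flag(List<Row> rows, Set<String> targetKeys) {
        Map<String, String> firstSeenIn = new HashMap<>(); // key -> first file
        for (Row r : rows) {
            String k = key(r);
            String firstFile = firstSeenIn.get(k);
            if (targetKeys.contains(k)) {
                fail(r, "Duplicate of record already in target"); // hypothetical message
            } else if (firstFile == null) {
                firstSeenIn.put(k, r.fileName); // first occurrence stays "Success"
            } else if (firstFile.equals(r.fileName)) {
                fail(r, "Duplicate in same file");
            } else {
                fail(r, "Duplicate of record in " + firstFile); // hypothetical message
            }
        }
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
                new Row("FILE1", "AK", "4/24/2019", 100),
                new Row("FILE1", "AK", "4/24/2019", 200),
                new Row("FILE2", "AK", "4/24/2019", 300),
                new Row("FILE2", "CA", "4/24/2019", 100));
        Set<String> targetKeys = new HashSet<>(Arrays.asList("CA|4/24/2019"));
        flag(rows, targetKeys);
        for (Row r : rows) {
            System.out.println(r.fileName + " | " + r.state + " | " + r.value
                    + " | " + r.recStatus + " | " + r.errMsg);
        }
    }
}
```

Because the file name travels with each row, rows from several files can be processed as one stream without losing their origin. Holding all target keys in memory is a simplification that only suits targets small enough to fit; a larger table would need a lookup against the database instead.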