A_San
Contributor

Duplicate records handling

I am new to Talend. I have a requirement to identify duplicate records and handle them by logging all of the duplicates in a separate output file. Any help is appreciated.

Ex:

Input: FILE1
STATE  DATE       VALUE
AK     4/24/2019  100
AK     4/24/2019  200
AZ     4/24/2019  300
CA     4/24/2019  100
CA     4/24/2019  150

 

Output Table: STG1
STATE  DATE       VALUE  REC_STATUS  ERR_MSG
AK     4/24/2019  100    Success
AK     4/24/2019  200    Failed      Duplicate in same file
AZ     4/24/2019  300    Success
CA     4/24/2019  100    Success
CA     4/24/2019  150    Failed      Duplicate in same file

 

Output File: Error_Log
STATE  DATE       VALUE  REC_STATUS  ERR_MSG
AK     4/24/2019  200    Failed      Duplicate in same file
CA     4/24/2019  150    Failed      Duplicate in same file
1 Solution

Accepted Solutions
manodwhb
Champion II

@A_San, check the solution below.

[Screenshots: 0683p000009M7Ho.png, 0683p000009M7Ht.png, 0683p000009M7GD.png, 0683p000009M7Hy.png, 0683p000009M7Ev.png]


7 Replies
Xenoflex
Contributor III

You can use the tUniqRow component.

A_San
Contributor
Author

tUniqRow lets through either the duplicates or the uniques. My requirement is to keep both, but with different statuses: uniques should have record status 'S' for success, and duplicates should have record status 'F' for failed, with an error message such as "Duplicate record".


Anonymous
Not applicable

Hi @A_San

 

You can solve this scenario with ternary-operator conditions in two tMap components.

The overall job looks like this:

[Screenshot: 0683p000009M7JB.png]

First, sort on the "State" column in ascending order. Then, in the first tMap, use the expression

row2.State.equals(Var.prev)?"Failed":"Success"

and store the current State value in another variable (Var.prev in the expression above) so that the next row can be compared against it.

[Screenshot: 0683p000009M7JG.png]

Add another condition in the second tMap:

[Screenshot: 0683p000009M7Eo.png]

You will then get the required output:

[Screenshot: 0683p000009M7BL.png]
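
For readers who cannot see the screenshots, here is a minimal plain-Java sketch of the same idea: sort by State, then compare each row with the previous one. It is not the generated Talend code. The class name SortedDuplicateTagger, the Row type, and the tag method are hypothetical names made up for illustration, and the sketch assumes that State alone is the duplicate key, as in the tMap expression above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SortedDuplicateTagger {

    /** One input row plus the two columns added on output (hypothetical type). */
    public static final class Row {
        public final String state;
        public final String date;
        public final int value;
        public String recStatus = "";
        public String errMsg = "";

        public Row(String state, String date, int value) {
            this.state = state;
            this.date = date;
            this.value = value;
        }
    }

    /**
     * Sorts a copy of the list by state and tags every row whose state equals
     * the previous row's state as a duplicate. The Row objects themselves are
     * updated in place.
     */
    public static List<Row> tag(List<Row> input) {
        List<Row> rows = new ArrayList<>(input);
        // List.sort is stable, so rows with the same state keep their file order.
        rows.sort(Comparator.comparing((Row r) -> r.state));

        String prev = null; // plays the role of Var.prev in the tMap
        for (Row r : rows) {
            boolean duplicate = r.state.equals(prev);
            r.recStatus = duplicate ? "Failed" : "Success";
            r.errMsg = duplicate ? "Duplicate in same file" : "";
            prev = r.state; // remember the current state for the next row
        }
        return rows;
    }

    public static void main(String[] args) {
        // The FILE1 rows from the question.
        List<Row> rows = tag(Arrays.asList(
                new Row("AK", "4/24/2019", 100),
                new Row("AK", "4/24/2019", 200),
                new Row("AZ", "4/24/2019", 300),
                new Row("CA", "4/24/2019", 100),
                new Row("CA", "4/24/2019", 150)));
        for (Row r : rows) {
            System.out.println(r.state + " " + r.date + " " + r.value
                    + " " + r.recStatus + " " + r.errMsg);
        }
    }
}
```

Rows tagged "Failed" would additionally go to the Error_Log file. Because the comparison only looks at the previous row, it relies on the input being sorted on the key first.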

 

If your query is answered, please mark the topic as resolved.

 

Thanks,

Aarif

A_San
Contributor
Author

Thank you @manodwhb and @AarifAkhtar

I was trying another approach with tUniqRow. Could you please take a look at the screenshot and let me know whether this approach is good in terms of efficiency and performance?

 


[Screenshot: SS16.PNG]
manodwhb
Champion II

@A_San, your approach is OK.

A_San
Contributor
Author

@manodwhb I have a couple more situations with respect to duplicate handling.

 

I may have duplicate records across multiple files, and I need to capture them. I cannot use tUnite because I would not be able to get the respective file name in the tMap to load into my database.

The other situation is that a record may already be loaded in the target database; if the same record comes in again, I need to be able to capture that as well.

 

Any help would be appreciated.
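
Neither situation is answered in this thread. Purely as an illustration of the logic involved, and not as a Talend job design, both checks can be expressed as one pass that remembers which file each key was first seen in and consults a set of keys already present in the target. Everything here is an assumption: the class and method names, the error messages, the second file name FILE2, the "STATE|DATE" key format, and the idea that the target keys have been read into memory beforehand.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class CrossFileDuplicateCheck {

    /** Result for one record key: where it came from and how it was classified. */
    public static final class Tagged {
        public final String fileName;
        public final String key;
        public final String recStatus;
        public final String errMsg;

        public Tagged(String fileName, String key, String recStatus, String errMsg) {
            this.fileName = fileName;
            this.key = key;
            this.recStatus = recStatus;
            this.errMsg = errMsg;
        }
    }

    /**
     * @param keysAlreadyInTarget keys assumed to have been read from the target table
     * @param keysByFile file name -> record keys in that file; pass a LinkedHashMap
     *                   so that files are processed in a predictable order
     */
    public static List<Tagged> tag(Set<String> keysAlreadyInTarget,
                                   Map<String, List<String>> keysByFile) {
        Map<String, String> firstSeenIn = new HashMap<>(); // key -> first file name
        List<Tagged> out = new ArrayList<>();

        for (Map.Entry<String, List<String>> file : keysByFile.entrySet()) {
            String fileName = file.getKey();
            for (String key : file.getValue()) {
                if (keysAlreadyInTarget.contains(key)) {
                    out.add(new Tagged(fileName, key, "Failed",
                            "Duplicate of record already in target"));
                } else if (firstSeenIn.containsKey(key)) {
                    out.add(new Tagged(fileName, key, "Failed",
                            "Duplicate of record in " + firstSeenIn.get(key)));
                } else {
                    firstSeenIn.put(key, fileName);
                    out.add(new Tagged(fileName, key, "Success", ""));
                }
            }
        }
        return out;
    }
}
```

Because the file name travels with every record, it remains available when the row is loaded, which is the piece of information the post says is lost when the files are merged with tUnite. Holding all keys in memory is only reasonable for modest volumes; for a large target table a database-side lookup would be the more likely choice.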