Anonymous
Not applicable

How to remove duplicates from two excel tables?

I am fairly new to Talend. My use case: I have two Excel tables with employee data. The columns are name, email, street and phone number. I need to find the employees common to both tables, based on phone number or street, and write them to a third Excel sheet. I can do this using a tUniqRow and a tUnite. However, the phone number could be in the format +1 8x9-201-1xx5 in one table and 8x9-201-1xx5 in the other. The street field could be "Main street" in one table and "Main st" in the other. How can I deal with that? Should I use a tMap, tregex? And how should I filter the data? Thank you very much!

2 Replies
TRF
Champion II

Hi,

You should look into the tFuzzyMatch component, which helps with deduplication using the Levenshtein, Metaphone or Double Metaphone algorithm.

It could probably help you solve this kind of use case.

Let us know.
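
To illustrate the Levenshtein idea mentioned above, here is a minimal plain-Java sketch of edit distance applied to the street example from the question. It is not tFuzzyMatch's actual implementation; the class name `LevenshteinSketch` and the lowercased inputs are assumptions made for illustration only.

```java
// Hypothetical helper, not part of Talend: classic Levenshtein edit distance.
final class LevenshteinSketch {

    // Returns the minimum number of single-character insertions,
    // deletions or substitutions needed to turn a into b.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Distance from the empty prefix of a to each prefix of b.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            // Reuse the two rows instead of allocating a full matrix.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        // "main street" becomes "main st" by deleting r, e, e, t.
        System.out.println(distance("main street", "main st")); // prints 4
    }
}
```

A small distance relative to the string length suggests the two values may refer to the same street; the acceptable threshold depends on the data.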

Anonymous
Not applicable
Author

Hi TRF, thanks for the reply. I checked out the tFuzzyMatch component and was able to remove some duplicates using Levenshtein. However, my use case is slightly different. If I have two Excel tables with employee details and the phone number is given as (234)-123-4567 in one table and 2341234567 in the other, I need a component that can compare both tables and decide that both rows are the same employee, based on a regex or some other kind of logic. Is there anything like that available in Talend? Thanks
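
One possible approach to the phone-number case, sketched in plain Java since Talend expressions and routines are written in Java: reduce each number to a canonical digits-only key in both inputs, then compare the keys. The class name `PhoneNormalizer` is hypothetical, and dropping a leading `1` from 11-digit numbers assumes North American numbers like those in the question.

```java
// Hypothetical helper, not a built-in Talend component.
final class PhoneNormalizer {

    // Reduces a phone number to its digits so differently formatted
    // values of the same number compare as equal.
    static String normalize(String raw) {
        if (raw == null) {
            return "";
        }
        // Remove everything that is not a digit: spaces, dashes, parentheses, '+'.
        String digits = raw.replaceAll("\\D", "");
        // Assumption: an 11-digit number starting with 1 carries the +1 country code.
        if (digits.length() == 11 && digits.startsWith("1")) {
            digits = digits.substring(1);
        }
        return digits;
    }

    public static void main(String[] args) {
        System.out.println(normalize("(234)-123-4567"));  // prints 2341234567
        System.out.println(normalize("+1 234-123-4567")); // prints 2341234567
        System.out.println(normalize("2341234567"));      // prints 2341234567
    }
}
```

If the same normalization is applied to the phone column of both tables before they are joined, the two formats in the example yield the same key. For the simplest case, the expression `replaceAll("\\D", "")` on the phone column may be enough on its own.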