Anonymous
Not applicable

How to join masked data

Hi, 

I want to mask sensitive data in my DB with Talend. Some of the data to be masked are key fields that I need to use in joins. I used tDataMasking, but when I apply the same function to the same key in two different tables, the output is different. How can I fix this? Is there a particular tDataMasking function I should choose for this use case (joining on masked data)?

4 Replies
dprot
Contributor II

Hi Mark,

That's a very good question. Initially, most of the tDataMasking functions were purely random (i.e. they ignored the input value). In 6.3 we added some functions for SSNs (called "Generate unique xxx SSN number", where xxx can be Chinese, French, German, Indian, Japanese, UK, US) that do exactly what you want, provided your input is an SSN. We may do the same for other types (like credit cards). Which functions are you interested in?

Damien

Sebastiao_Qlik
Employee

If you don't use SSN, there is still an approximate way to do it:

first, store all your unique Ids in a file
then use the "Replace by consistent items from input list (or file)" function to read from this file.

Anonymous
Not applicable
Author

Thank you for your answer,

I want to join on the keys of a table. These keys can be strings of digits, letters, or both. I tried masking them with "replace all", "replace all digits", "replace all letters" and other functions, but the join doesn't work correctly, because the same key is masked differently when I apply the same function to it in two different tables.

Sebastiao_Qlik
Employee

Hi Mark,

This cannot work. The replacements done by these functions are purely random.
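
To illustrate the problem outside Talend, here is a minimal Python sketch. It is not tDataMasking's implementation; `mask_digits_randomly` and the sample keys are hypothetical, and the function only imitates the idea of a "replace all digits" style masking that ignores the input value.

```python
import random
import string


def mask_digits_randomly(value, rng):
    """Replace each digit with a random digit; the original digit is ignored."""
    return "".join(
        rng.choice(string.digits) if ch.isdigit() else ch for ch in value
    )


# Hypothetical key columns from two tables that should join on the key.
customers = ["C1001", "C1002", "C1003"]
orders = ["C1001", "C1003"]

rng = random.Random()
masked_customers = [mask_digits_randomly(k, rng) for k in customers]
masked_orders = [mask_digits_randomly(k, rng) for k in orders]

# Keys that match before masking, and keys that still match afterwards.
matches_before = set(customers) & set(orders)
matches_after = set(masked_customers) & set(masked_orders)

print("before masking:", sorted(matches_before))
print("after masking: ", sorted(matches_after))
```

Because each call draws fresh random digits, the masked value of `C1001` in one table has no relationship to its masked value in the other, so any matches left after masking are coincidental.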

 

As I said in my previous answer:

first, store all your unique keys in a file (ideally, add more keys to this file but avoid duplicates).
then use the "Replace by consistent items from input list (or file)" function to read from this file.
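
The idea behind a consistent replacement can be sketched in Python as follows. This is only an assumption about the general technique (pick an item from the list based on a stable hash of the input), not a description of how tDataMasking implements the function; `consistent_replace`, the replacement list, and the sample keys are all hypothetical.

```python
import hashlib


def consistent_replace(value, replacements):
    """Pick a replacement deterministically from the input value.

    A stable hash (SHA-256) is used instead of Python's built-in hash(),
    which can differ between runs for strings.
    """
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(replacements)
    return replacements[index]


# Stand-in for the contents of the replacement file (no duplicates).
replacement_keys = ["K%04d" % i for i in range(1000)]

# Hypothetical key columns from two tables.
customers = ["C1001", "C1002", "C1003"]
orders = ["C1001", "C1003"]

masked_customers = [consistent_replace(k, replacement_keys) for k in customers]
masked_orders = [consistent_replace(k, replacement_keys) for k in orders]

# Every masked order key is found among the masked customer keys.
joined = [m for m in masked_orders if m in set(masked_customers)]
print(joined)
```

In this sketch the same input always yields the same replacement, so the join survives masking. Two different inputs can still land on the same replacement, which is one way to read "approximate" above; a longer list without duplicates makes such collisions less likely.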

 

See the attached example.


datamasking_referential_integrity611.zip