Parikhharshal
Creator III

Masking data in Talend?

Hi all

 

I gather from a few videos and blogs that it is possible to mask data in Talend fairly simply, using a few masking-related components.

 

My requirement, though, is to be able to view unmasked data depending on the user's role/permission. Is this possible in Talend?

 

I am using Redshift as the database.

 

Thanks

Harshal.

16 Replies
Anonymous
Not applicable

Hi,

 

I have a question related to the tDataMasking component. I am using tDataMasking to mask the input SSN field.

 

I found that in the initial run 999-999-999 was masked to 123-456-789, but when I received the same SSN in a second, incremental file, 999-999-999 was masked to a different value, 789-456-123. Is there a way to mask the values deterministically instead of randomly, so that data integrity is maintained?

Sebastiao_Qlik
Employee

Hi Naveen,

 

Yes, the tDataMasking component supports several masking schemes; see https://help.talend.com/reader/0o9b5oCDP162lzXURYPZbg/QSLEkWqZwGeZVah0erPbzA

For SSN masking, it supports bijective masking: https://help.talend.com/reader/0o9b5oCDP162lzXURYPZbg/DDvsI0xkSNVivuM9fMZhgA

You need to use the FPE (format-preserving encryption) method for that.
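
To illustrate what deterministic, bijective masking means here, below is a minimal Python sketch. It is not Talend's FPE implementation (the documentation linked above describes the actual options); it is a toy keyed Feistel permutation over the digits of a value, and `mask_digits`, `KEY`, and the round count are hypothetical choices made up for this example. Under those assumptions, the same key and the same input always give the same masked value, separators such as hyphens are kept, and two different inputs never map to the same output.

```python
import hashlib
import hmac
import string

KEY = b"example-secret-key"  # hypothetical key; keep a real one out of source code
ROUNDS = 8                   # arbitrary round count chosen for this sketch


def _round_value(key: bytes, round_no: int, half: int) -> int:
    """Keyed pseudo-random integer derived from the round number and one half."""
    digest = hmac.new(key, f"{round_no}:{half}".encode(), hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big")


def _permute(number: int, key: bytes, left_size: int, right_size: int,
             reverse: bool) -> int:
    """Alternating Feistel rounds over the domain [0, left_size * right_size)."""
    left, right = divmod(number, right_size)
    order = range(ROUNDS - 1, -1, -1) if reverse else range(ROUNDS)
    sign = -1 if reverse else 1
    for i in order:
        if i % 2 == 0:
            # Even rounds update the left half from the right half.
            left = (left + sign * _round_value(key, i, right)) % left_size
        else:
            # Odd rounds update the right half from the left half.
            right = (right + sign * _round_value(key, i, left)) % right_size
    return left * right_size + right


def mask_digits(value: str, key: bytes = KEY, reverse: bool = False) -> str:
    """Mask (or unmask, with reverse=True) the digits of value, keeping other characters."""
    digits = [c for c in value if c in string.digits]
    if not digits:
        raise ValueError("value contains no digits to mask")
    n = len(digits)
    left_size = 10 ** (n // 2)
    right_size = 10 ** (n - n // 2)
    out = _permute(int("".join(digits)), key, left_size, right_size, reverse)
    masked = iter(str(out).zfill(n))
    # Re-insert the masked digits at the original digit positions.
    return "".join(next(masked) if c in string.digits else c for c in value)


if __name__ == "__main__":
    first_run = mask_digits("999-999-999")
    second_run = mask_digits("999-999-999")
    print(first_run, second_run, first_run == second_run)
    print(mask_digits(first_run, reverse=True))
```

Because the mapping is a keyed permutation rather than a random draw, an incremental file processed later with the same key yields the same masked SSN, which is the property asked about above.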

 

Best regards

Anonymous
Not applicable

Thank you, I will give it a try.

 

I have one more question, about dynamically selecting the columns to be masked. I am using the tDataMasking component to mask the input columns of a delimited file. My requirement is to mask 1000+ files, each with a different schema, using a Talend job that identifies the columns to be masked dynamically for each file. In other words, I don't want to select the columns to be masked from the tDataMasking dropdown for each file. Please let me know if we can achieve this using tDataMasking or other Talend components.

Anonymous
Not applicable

Hi,

 

I have another question related to tDataMasking.

 

When “SEED FOR RANDOM GENERATOR” is used in masking, the output column comes out with junk characters. I expected the data to be in a readable format.

To illustrate the issue, I used the data from the Talend example, and it returned a different result.

Input - Ms Isabelle Turner
Output - Ly Çhxjuûâë Wmíøìï
SEED FOR RANDOM GENERATOR - 12345678

 

How can I get a readable output (i.e. English alphabet characters)?


(Attached screenshot: tDataMasking.png)
Sebastiao_Qlik
Employee

Hi,

I have no easy solution for this use case.

In the Studio, the configuration of the component is manual and the developer needs to select how to mask each column.

In Data Preparation, each column is semantically analyzed, and for columns that have a semantic type, masking can be applied automatically (we call it semantic masking).

But I don't see exactly how we could automate the two steps (semantic discovery, then semantic masking) without knowing the schema of the data beforehand.
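
For what it's worth, the two steps (discover, then mask) can be sketched outside of Talend. The Python below is only an illustration of the idea, not a tDataMasking or Data Preparation feature: it samples each column of a delimited file, treats a column as sensitive when most of its values match a pattern, and then masks those columns. The pattern (the 999-999-999 layout used earlier in this thread), the 80% threshold, the `;` delimiter, and the function names are all assumptions made for the example.

```python
import csv
import io
import re

# Assumed layout for SSN-like values, taken from the example earlier in the thread.
SSN_LIKE = re.compile(r"^\d{3}-\d{3}-\d{3}$")


def detect_columns(rows, pattern, threshold=0.8):
    """Return the column names where at least `threshold` of non-empty values match."""
    if not rows:
        return []
    hits = []
    for name in rows[0]:
        values = [row[name] for row in rows if row[name]]
        if values:
            ratio = sum(bool(pattern.match(v)) for v in values) / len(values)
            if ratio >= threshold:
                hits.append(name)
    return hits


def mask_file(text, delimiter=";"):
    """Discover sensitive columns in delimited text, then replace their digits with X."""
    rows = list(csv.DictReader(io.StringIO(text), delimiter=delimiter))
    sensitive = detect_columns(rows, SSN_LIKE)
    for row in rows:
        for name in sensitive:
            row[name] = re.sub(r"\d", "X", row[name])
    return sensitive, rows


if __name__ == "__main__":
    sample = "id;name;ssn\n1;Ann;111-222-333\n2;Bob;444-555-666\n"
    columns, masked_rows = mask_file(sample)
    print(columns)
    print(masked_rows)
```

The discovery step works from the values rather than the header names, so it does not need the schema up front; the trade-off is that it only finds the data types you have written a pattern for.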

 

Sebastiao_Qlik
Employee

The behavior with accented characters has been improved in version 7.2 of the Studio.

 

 

Basically, as explained at https://help.talend.com/reader/0o9b5oCDP162lzXURYPZbg/~5JVmaygo~wT8V7uZB4RoQ:

 

Characters that belong to the selected alphabets are masked with characters from the same character type within the selected alphabet.

When selecting the Best guess alphabet, masked values contain characters from all alphabets represented in the input values. Best guess is the default alphabet.
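
As a rough illustration of why restricting the alphabet gives readable output, here is a small Python sketch. It is not the algorithm tDataMasking uses; `mask_latin` is a hypothetical function that replaces each character with a seeded random character of the same type (uppercase, lowercase, digit) drawn only from the basic Latin alphabet, and leaves everything else untouched.

```python
import random
import string


def mask_latin(text: str, seed: int) -> str:
    """Replace letters and digits with seeded random ASCII characters of the same type."""
    rng = random.Random(seed)  # same seed gives the same sequence of replacements
    out = []
    for ch in text:
        if ch.isupper():
            out.append(rng.choice(string.ascii_uppercase))
        elif ch.islower():
            out.append(rng.choice(string.ascii_lowercase))
        elif ch.isdigit():
            out.append(rng.choice(string.digits))
        else:
            out.append(ch)  # spaces and punctuation are kept as-is
    return "".join(out)


if __name__ == "__main__":
    print(mask_latin("Ms Isabelle Turner", 12345678))
```

Since the replacement pool contains only unaccented letters, the masked value stays in the English alphabet while keeping the length, spacing, and capitalization pattern of the input.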

 

About supported characters

https://help.talend.com/reader/IetzD0OTgeEjWYQD77eKPw/HebSGq_ek_lNpZJFY6ZfQA

 

Anonymous
Not applicable

Thank you so much! I will give it a try and get back to you.