Do not input private or sensitive data. View Qlik Privacy & Cookie Policy.
Skip to main content

Announcements
Qlik GA: Multivariate Time Series in Qlik Predict: Get Details
cancel
Showing results for 
Search instead for 
Did you mean: 
Anonymous
Not applicable

A more elegant way for data quality check (data assertion)?

Hey guys,

I just joined this amazing community so here is my first question.

 

My task is to migrate data from MSSQL to AS/400. The input data has to be prepared beforehand.

tSchemaComplianceCheck should take care of any schemas or data types and lengths inconsistencies. However, the input data can also deviate from the expected business logic. For example: column1 should contain the values "Apples" or "Oranges" but not "Bananas".

At the moment, I solved this by manually checking the relevant rows for the expected values in a tMap component:

 

Example:

table1.column2== null? 
"Unexpected Value: Missing column2"
:
(table1.column1== null || table1.column1.equals("VALUE1"))?
table1.column2.toString()
:
table1.column1.equals("VALUE2")?
"SOME_LOGIC"
:
"Unexpected Value: " + table1.column1

This works out fine but the problem with this is follows:

1. Every column has to be String to hold the error message.

2. Filtering out invalid values with a tFilter Component and String comparison is not optimal (table1.column1.contains("Unexpected"))

3. Valid rows has to be joined again with the input tables (to get the correct column types and to avoid converting back from String) effectively doubling the logic, which is unwanted.

 

It's not practical to create an expected value List table for each column and use that as a lookup table in the tMap.

Ideally, I would want for rejected rows (one or more columns has an unexpected value) to be written into an extra table with one added column "ErrorMessage" that contains which columns were errorenous.

 

I feel like this is a common problem, so there should be a better solution to this. Is it with tAssert maybe?

 

Can't wait to read your answers. Let me know if I should clarify the problem even more.

 

Thank you in advance.

 

Best Regards,

Houssam

 

Labels (2)
1 Reply
Anonymous
Not applicable
Author

Hello,

tAssert component generates the boolean evaluation on the concern for the Job execution status and provides the Job status messages to tAssertCatcher.

Here are related scenarios:TalendHelpCenter0683p000009M9p6.pngetting up the assertive condition for a Job execution  and TalendHelpCenter:Viewing product orders status (on a daily basis) against a benchmark number 

Talend recommends using Talend Data Quality first to improve the quality of your data, because incomplete, low quality datasets produce poor analysis results, and data profiling is best done on complete, clean datasets.

Data profiling is about collecting and analyzing information about your data. It is important because it allows you to detect and understand data issues, then fix them.

Let us know if it is what you are looking for?

https://help.talend.com/reader/Cau_yL8zEBuz1qNJ95YqQw/KoalefBU0uSZmuVLiKSmag 

Best regards

Sabrina