Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
IAMDV
Luminary Alumni
Luminary Alumni

*** Testing / Data Checking / Quality Checking (Best Practices) ***

Dear QV Experts,

Please share the best practices on testing the QV document before go live process. I work in fairly small team and I wanted to understand the best practices to ensure 100% accuracy in QV documents.

Thanks in anticipation.

Cheers - DV

1 Solution

Accepted Solutions
Not applicable

Hi Deepak,

I would always think that part of BP in data quality check is checking through an external app that all the numbers tally and that your data model has been built in the right way.

One other technique I think is quite useful is using the exists function to make sure that underlying values match, especially on key (linking) fieldnames in QV. As you know, QV does linking of tables via shared fieldnames but there is nothing to say if the values in 2 tables match. Hence, exists function is useful.

For example,

Table1:

Load

LinkField,

LinkField as LinkFieldFromTable1,

LinkFieldCheck

from Table1;

Table2:

LinkField,

exists(LinkFieldCheck, LinkField) as DQCheck,

LinkField as LinkFieldFromTable2,

from Table2;

What exists(X,Y) function does is compare the underlying values in X to the underlying values in Y. When condition is fulfilled, i.e. there is a match, it returns Boolean -1, when values don't match it returns zero.

Hence, if you expect a match between LinkField in Table1 and Table2 in example above, when you put a Listbox for DQCheck in front-end, there should be no value 0 (zero) in it. Otherwise, click on it and put down listboxes for LinkFieldFromTable1 and LinkFieldFromTable2 and see where the values don't match. Very useful for ensuring linking fields have a match.

Second DQ check is via the Table Viewer.

Couple of pieces of info. When you hover over a field, Information Density tells you amount of nulls in that fieldname, i.e. if less than 100%, contains nulls.

On a linking field, the subset ratio tells you the % number of values in that linking field in that table that is found in the whole data model. Say for some reason, I don't 2 tables with a linking field, on hover over of one field value, subset ratio is 100%, and the other it is 60%, that means that the 60% fieldname only has 60% of all the linkfield values it sees in the WHOLE data model.

I use 3 rules of thumb across linking fields:

If sum of subset ratios across 2 tables is 200%, all is well.

If sum of subset ratios across 2 tables is 100-200%, then one table contains less linking field values than the other, i.e. worth investigating if there is a business rationale that justifies the mismatch.

If sum of subset ratios across 2 tables is less than 100% (usually when substanitially less than 100%, it means that linking field is shared over more than 2 tables, otherwise it will be 100% exactly), then serious problem, as this means that there is no matching values in the underlying tables. Needs serious investigation.

Lastly, a few pieces of info on hover-over of a linking field in Table Viewer:

Perfect Key - subset ratio is 100% and all values therein are distinct (no duplicates).

Primary Key - subset ratio is less than 100% and all values therein are distinct (no duplicates).

Key - Non distinct, duplicates contained.

Hope this helps. But would be keen on hearing others techniques on data validation.

View solution in original post

9 Replies
Not applicable

Hi Deepak,

I would always think that part of BP in data quality check is checking through an external app that all the numbers tally and that your data model has been built in the right way.

One other technique I think is quite useful is using the exists function to make sure that underlying values match, especially on key (linking) fieldnames in QV. As you know, QV does linking of tables via shared fieldnames but there is nothing to say if the values in 2 tables match. Hence, exists function is useful.

For example,

Table1:

Load

LinkField,

LinkField as LinkFieldFromTable1,

LinkFieldCheck

from Table1;

Table2:

LinkField,

exists(LinkFieldCheck, LinkField) as DQCheck,

LinkField as LinkFieldFromTable2,

from Table2;

What exists(X,Y) function does is compare the underlying values in X to the underlying values in Y. When condition is fulfilled, i.e. there is a match, it returns Boolean -1, when values don't match it returns zero.

Hence, if you expect a match between LinkField in Table1 and Table2 in example above, when you put a Listbox for DQCheck in front-end, there should be no value 0 (zero) in it. Otherwise, click on it and put down listboxes for LinkFieldFromTable1 and LinkFieldFromTable2 and see where the values don't match. Very useful for ensuring linking fields have a match.

Second DQ check is via the Table Viewer.

Couple of pieces of info. When you hover over a field, Information Density tells you amount of nulls in that fieldname, i.e. if less than 100%, contains nulls.

On a linking field, the subset ratio tells you the % number of values in that linking field in that table that is found in the whole data model. Say for some reason, I don't 2 tables with a linking field, on hover over of one field value, subset ratio is 100%, and the other it is 60%, that means that the 60% fieldname only has 60% of all the linkfield values it sees in the WHOLE data model.

I use 3 rules of thumb across linking fields:

If sum of subset ratios across 2 tables is 200%, all is well.

If sum of subset ratios across 2 tables is 100-200%, then one table contains less linking field values than the other, i.e. worth investigating if there is a business rationale that justifies the mismatch.

If sum of subset ratios across 2 tables is less than 100% (usually when substanitially less than 100%, it means that linking field is shared over more than 2 tables, otherwise it will be 100% exactly), then serious problem, as this means that there is no matching values in the underlying tables. Needs serious investigation.

Lastly, a few pieces of info on hover-over of a linking field in Table Viewer:

Perfect Key - subset ratio is 100% and all values therein are distinct (no duplicates).

Primary Key - subset ratio is less than 100% and all values therein are distinct (no duplicates).

Key - Non distinct, duplicates contained.

Hope this helps. But would be keen on hearing others techniques on data validation.

Not applicable

Sorry, small mistake, read:

Table1:

Load

LinkField,

LinkField as LinkFieldFromTable1,

LinkField as LinkFieldCheck

from Table1;

Table2:

LinkField,

exists(LinkFieldCheck, LinkField) as DQCheck,

LinkField as LinkFieldFromTable2,

from Table2;

IAMDV
Luminary Alumni
Luminary Alumni
Author

Cheenu - Thank you for detailed explanation. I still need to digest all the ideas. I'll get back to you with more obstacles/ questions.

Meanwhile, I request other experts to help on this one.

Cheers - DV

Not applicable

Np, Deepak. Tried to give as comprehensive an answer as possible.

Regarding the UDF and variables, I have reached the end of the road. Hope another Community user will be able to get you all the way. I would also be interested in hearing the answer.

IAMDV
Luminary Alumni
Luminary Alumni
Author

Cheenu - Just for my understanding can you please complete the below script? I will be using this one as an example to understand the concept. Thanks in advance...

CurrentYear:
Load * Inline [
Sales, SalesPerson
100, John
90, Alan
, Rick];

PreviousYear:
Load * Inline [
Sales
90
99];

----------------------------------------------------------------------------------------------------------------

Please amend the below fields
----------------------------------------------------------------------------------------------------------------

Table1:
Load
LinkField,
LinkField as LinkFieldFromTable1,
LinkField as LinkFieldCheck
from Table1;


Table2:
Load
LinkField,
exists(LinkFieldCheck, LinkField) as DQCheck,
LinkField as LinkFieldFromTable2,
from Table2;

Cheers - DV

Not applicable

Hi Deepak,

Was not too sure how to interpret the inline load tables you gave, so created another app that should explain (hopefully) how exists can make a comparison between 2 field values coming from 2 different tables. (Of course the Inline loads are not great examples to load data and do manips on them, i.e. that's why the load inline, then load resident of inline table and drop inline table in script).

Hope it helps.

IAMDV
Luminary Alumni
Luminary Alumni
Author

Thanks Cheenu. I had understood this concept now. Many thanks for taking time to demonstrate with the example.

I was expecting posts from other users. Hopefully we will have more.

Cheers - DV

jldengra
Creator
Creator

Hi !

Your explanation is so valuable for checking if the data is well formed in a QV document or the data model is correct.

But what about checking the consistence between the data shown in the QV document and the data from the original data source/s ?

I have recently started working with QlikView, and I was asked to check if a document data was correct, that is, if the document data was consistent with the data from the origin SQL Server database.

What I did was to create a Transact-SQL script with some temporary tables performing the unions and joins that QlikView was assumed to do, and then querying the result tables filtering as well as selecting in the QV document and then cheking that the values show in QV were the same that the query returned.

It was easy to check by querying different scenarios, but it was hard to previously simulate in the script the unions and joins exactly as QV does, taking some hours for tuning them.

How do you compare QV results with the original data sources to check the consistence ?

Is there a way to perform this checking avoiding to simulate in a database script what QV performs inside when loading the data ?

Thank you,

J.L.Dengra

Not applicable

Although it is important to ensure that the data model is correct, ultimately what is displayed to the end user is what matters. One way to match the values show is to export the data from the charts and import them to Excel or a database to do the the comparaison. You can use macros to automate the process but you will need to repeat the calculations in Excel (or database) to perform the comparison check.