For data quality checks, I make extensive use of expressions like aggr( count( DISTINCT <value> ) , <dimension> ).
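To make clear what I expect this kind of expression to compute, here is a rough Python sketch. It is only an illustration of the intended logic on a single flat list of rows. It does not model how QV associates tables, and the field names (id, label), the helper function and the sample rows are all made up, not taken from my qvw.

```python
from collections import defaultdict


def distinct_count_per_dimension(rows, dimension, value):
    """Count the distinct `value` entries seen for each `dimension` entry."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[dimension]].add(row[value])
    return {key: len(values) for key, values in seen.items()}


# Hypothetical sample rows: id "B" appears with two different labels.
rows = [
    {"id": "A", "label": "x"},
    {"id": "B", "label": "y"},
    {"id": "B", "label": "z"},
]

counts = distinct_count_per_dimension(rows, "id", "label")

# Any dimension entry with more than one distinct value is flagged as a duplicate.
duplicates = sorted(key for key, n in counts.items() if n > 1)
print(duplicates)  # ['B']
```

In other words, I expect every dimension value whose distinct count is greater than 1 to be reported as a duplicate.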
Unfortunately, I found that QV behaves differently depending on the number of tables.
In the following qvw, there are two cases:
- First, when you select 2 for nb_of_tables, you have a first table without duplicates and a second one WITH duplicates. In this case QV builds the Cartesian product of the two tables and my expression detects the duplicates.
- Second, when you select 3 for nb_of_tables, you still have the two preceding tables plus a third one WITH duplicates. This time the Cartesian product is not built and my expression does not detect the duplicates.
This raises two questions:
- How can I reliably detect duplicates in order to ensure data quality?
- I had already come across this strange behavior while following the "Advanced Topics" training, and the instructor told me that QV works with keys rather than tables. I find that a bit confusing. Could someone clarify the internals of QV (insofar as possible)?
Please excuse my hesitant English.
Thanks for clarifying.