Why are there extra values in my Synthetic key?

rsmithuchot1 · ‎2023-02-07

Dear The Qlik Community,

For my example, I have create a table in excel with two columns in excel:

FieldName	MeasureDimension
Name	D
Age	D
Timing	D
AgeGrouping	M
AgeCount	M
AgeMode	M

I have then loaded this table into my script, and created a second table that adds a row ID to the data, as follows:

Table1_TMP:
LOAD
    "FieldName",
    MeasureDimension,
    if(MeasureDimension='M','N') AS IsMeasureAdditional
FROM [lib://Richard_Test_Folder/Data_loading_Issue.xlsx]
(ooxml, embedded labels, table is [Practice Data]);

Table2:
 Load
 	RowNo() As ID,
    *
 Resident Table1_TMP;

What is happening, is that the tables form a synthetic key which is fine. However, what I am not understanding is for some reason the synthetic key has 9 rows, when both tables independently have 6 rows, so surely there should on be 6 rows in the synthetic key?

It seems that when the synthetic key is formed it is adding rows where there are null values between the tables. For instance, I I load the table like this, there is no issue:

Table1_TMP:
LOAD
    "FieldName",
    MeasureDimension,
    if(MeasureDimension='M','N','Not Required') AS IsMeasureAdditional
FROM [lib://Richard_Test_Folder/Data_loading_Issue.xlsx]
(ooxml, embedded labels, table is [Practice Data]);

Table2:
 Load
 	RowNo() As ID,
    *
 Resident Table1_TMP;

I just want to understand why the synthetic key table has 9 rows instead of 6?

The reason I am asking this, is because I am using someone else code in a project that mimics the above structure, and I can not understand why they have structured this in the way stated above.

Can anyone help me?

marcus_sommer · ‎2023-02-07

You don't need the second load with the rowno() else you could apply it within the first load - and by doing it you will remove the synthetic key, too because no second superfluously table is created.

rsmithuchot1 · ‎2023-02-07

Thank you for responding, I appreciate it.

I have limited/changed the data because it relates to my work so did not want to post any confidential information, but basically what happens next in the code is the first table gets dropped and then there is a series of self concatenations to add rows to the data filtering out on where clauses also to remove rows that are not needed.

I do not really need any solution to the problem, as the first table gets dropped and this removes the synthetic key anyways, I was just wondering if anyone knew why there where 9 rows instead of 6 when using

if(MeasureDimension='M','N') AS IsMeasureAdditional

in the load statement. I think I am just curious to know how the synthetic table forms, and why it seems to add extra rows when there are null values. If you have any information on this I would appreciate it!

marcus_sommer · ‎2023-02-07

A synthetic key creates a combination of the values - a kind of a cartesian product. There exists different opinions if synthetic keys are a problem or not. Having one or two within the data-model will often work but at least if there are synthetic keys between synthetic keys it will cause issues - in regard to performance and readability as well as to challenges to address the keys within aggr(), TOTAL statements, set analysis and so on. Therefore the only general recommendation could be to avoid them - always.

If your second table is used to perform further ETL you could just replace the table-name from the second to the first and then skipping this second load.

Chart

dimension

Script