Synthetic keys vs. AutoNumber() (- concatenation)

    I’ve read a lot of articles about synthetic keys. Most people recommend not using them, because QlikView builds internal mapping tables. Concatenating keys (=composite keys) in combination with the AutoNumber() function will result in better performance without any doubt.

    But in my opinion there’s a very big drawback concerning concatenated keys. It’s very easy to understand. For my example I need an Excel input file with following tables:

     

    LEFT:

    RIGHT:

    (The empty cells will lead to null() – values)

     

    Example 1: synthetic keys:


    TABLE1:
    LOAD KEY1,
    KEY2,
    LEFT
    FROM
    [input.xlsx]
    (
    ooxml, embedded labels, table is LEFT);

    TABLE2:
    LOAD KEY1,
    KEY2,
    RIGHT
    FROM
    [input.xlsx]
    (
    ooxml, embedded labels, table is RIGHT);

     

    Result:

     

    You can see, that the keys only match, if KEY1 and KEY2 match. Null() isn’t a value so it will never match.

     

    Example 2: concatenation + AutoNumber()


    TABLE3:
    LOAD AutoNumber(KEY1KEY2) as AUTONUMBER_KEY,
    LEFT as LEFT2
    FROM
    [input.xlsx]
    (
    ooxml, embedded labels, table is LEFT);

    TABLE4:
    LOAD AutoNumber(KEY1KEY2) as AUTONUMBER_KEY,
    RIGHT as RIGHT2
    FROM
    [input.xlsx]
    (
    ooxml, embedded labels, table is RIGHT);

     

     

    Result:

     

     

    Now the drawback I mentioned is clearly visible. For example LEFT2=4 will be matched with RIGHT2 = 14. This can cause big problems, because KEY1 has nothing to do with KEY2 normally.

     

    Conclusion


    On the one hand concatenation of keys can cause problems, because there’s no clear separation between the keys. Especially the existence of null() values is risky. A separator like '*#*’ in combination with the replacement of null() values can reduce the risk significantly.


    On the other hand the behaviour in example 2 could be desired. There’s no right or wrong. I just wanted you to think about your data before using composite keys.


    Example 3: an approach for fixing the problem


    TABLE5:
    LOAD AutoNumber(if(IsNull(KEY1),'*1*',KEY1) & '*#*' & if(IsNull(KEY2),'*1*',KEY2)) as AUTONUMBER_KEY2,
    LEFT as LEFT3
    FROM
    [input.xlsx]
    (
    ooxml, embedded labels, table is LEFT);

    TABLE6:
    LOAD AutoNumber(if(IsNull(KEY1),'*2*',KEY1) & '*#*' & if(IsNull(KEY2),'*2*',KEY2)) as AUTONUMBER_KEY2,
    RIGHT as RIGHT3
    FROM
    [input.xlsx]
    (
    ooxml, embedded labels, table is RIGHT);

     

    Result: