Building a slowly changing dimension in QlikView

    Hello,

     

    After answering a post on the forum about this subject, I decided to create a document to share my solution to this problem.

     

    The script in the attached application builds date intervals when your source data cannot provide them (which usually means you don't have a real data warehouse...).

     

    If you don't have a slowly changing dimension, or if your data changes every day (currency rates for example), it is not useful at all to build date intervals: just concatenate the snapshots and add a date field holding the date of each snapshot. But if your data moves "slowly" (customer tables, contract tables...), date intervals are very interesting because they decrease the number of rows dramatically.

     

    The algorithm uses 5 cases. The first 3 are "basic"; the last 2 are only there to manage rows that appear, disappear and re-appear, no matter how many times (a script sketch covering all of them is given after the example data further down):

     

    1) The rows of the current generation are completely new

    2) The opposite of the case above: the rows of the previous generation don't exist in the current generation

    3) The rows of the previous generation exist in the current generation

    4) The rows of a generation before the previous one exist in the current generation but not in the previous one

    5) The rows of the current generation exist in the generation before the previous but not in the previous generation

     

    To industrialize the process you will probably need a loop, so I advise you to store the generation date in a CSV or a QVD file; that way, if there is a crash in the middle of the loop, the last generation date (which is very important because it is used to build the date intervals) is saved.
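
    For example, a hypothetical sketch of that persistence (the file, table and variable names are assumptions, not taken from the attached application; vGenerationDate is assumed to hold the date of the generation being processed):

    // reload the date of the last generation that was processed successfully, if any
    if len(FileTime('last_generation_date.qvd')) > 0 then
        LAST_GEN:
        load last_generation_date from last_generation_date.qvd (qvd);
        let vLastGenerationDate = peek('last_generation_date', 0, 'LAST_GEN');
        drop table LAST_GEN;
    end if

    // ... process the current generation here ...

    // save the generation date just processed, so a crash later in the loop cannot lose it
    LAST_GEN:
    load '$(vGenerationDate)' as last_generation_date autogenerate 1;
    store LAST_GEN into last_generation_date.qvd (qvd);
    drop table LAST_GEN;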

     

    The initialization (the first generation to integrate) is easy: set the begin_date to the date of the file for all the rows, and the end_date to '31/12/9999', also for all the rows. From the second generation onwards, you apply the algorithm described above.
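
    A hypothetical sketch of that initialization (the file name and the variable are assumptions):

    let vFirstGenerationDate = '01/04/2016';   // date of the first file

    // first snapshot: every row gets the same open-ended interval
    DATA_OLD:
    load field1,
         field2,
         date(date#('$(vFirstGenerationDate)', 'DD/MM/YYYY'), 'DD/MM/YYYY') as begin_date,
         date(date#('31/12/9999', 'DD/MM/YYYY'), 'DD/MM/YYYY') as end_date
    from first_snapshot.csv (txt, utf8, embedded labels, delimiter is ',');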

     

    Be careful with duplicate rows: this algorithm doesn't handle them, so your data would be wrong. You have to aggregate to de-duplicate, for example by counting the duplicates and multiplying the values by the count.
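
    For instance, a sketch of such an aggregation on a hypothetical SNAPSHOT table (the table and field names are assumptions); the count keeps the multiplicity, so measure values can be rebuilt by multiplying with it:

    SNAPSHOT_DEDUP:
    noconcatenate
    load field1,
         field2,
         count(*) as duplicate_count    // multiplicity of each distinct row
    resident SNAPSHOT
    group by field1, field2;

    drop table SNAPSHOT;
    rename table SNAPSHOT_DEDUP to SNAPSHOT;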

     

    Moreover, you don't need to know whether there is a primary key, as the detection uses all the fields; just make sure you don't have duplicate rows.
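
    As an illustration, the detection key can simply be a concatenation (or a hash) of all the fields, for example on the hypothetical SNAPSHOT table above:

    // one surrogate key per distinct combination of ALL the fields
    SNAPSHOT_KEYS:
    load autonumberhash128(field1, field2) as %all_fields_key
    resident SNAPSHOT;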

     

    Here is the example from the application, assuming the initialization is already done (DATA_OLD already has date intervals). This little example contains all the possible cases ('1,1': stays valid | '2,8': becomes dead | '2,5': new | '3,4': already dead and re-appearing):

     

    DATA_OLD:

    load field1,
         field2,
         date(date#(begin_date, 'DD/MM/YYYY'), 'DD/MM/YYYY') as begin_date,
         date(date#(end_date, 'DD/MM/YYYY'), 'DD/MM/YYYY') as end_date inline

    [field1,field2,begin_date,end_date

    1,1,01/04/2016,31/12/9999

    2,8,01/04/2016,31/12/9999

    3,4,01/04/2016,15/04/2016];

     

    DATA_NEW:

    load field1,field2 inline

    [field1,field2

    1,1

    2,5

    3,4];
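
    The attached application contains the full script; below is only a minimal sketch of the update step for this example, assuming the date of the DATA_NEW snapshot is 21/04/2016 and is held in a variable vGenerationDate. The variable and the intermediate table names are my own; the detection key is simply all the fields concatenated, as explained above.

    let vGenerationDate = '21/04/2016';

    // keys of the current generation
    NEW_KEYS:
    load field1 & '|' & field2 as %new_key
    resident DATA_NEW;

    // keys of the rows still open in the previous generation
    OLD_OPEN_KEYS:
    load field1 & '|' & field2 as %old_open_key
    resident DATA_OLD
    where year(end_date) = 9999;

    // case 4: intervals already closed in the past are carried over unchanged
    DATA_SCD:
    noconcatenate
    load field1, field2, begin_date, end_date
    resident DATA_OLD
    where year(end_date) < 9999;

    // cases 2 and 3: open rows stay valid if they still exist in the current
    // generation, otherwise they are closed the day before the generation date
    concatenate (DATA_SCD)
    load field1,
         field2,
         begin_date,
         if(exists(%new_key, field1 & '|' & field2),
            end_date,
            date(date#('$(vGenerationDate)', 'DD/MM/YYYY') - 1, 'DD/MM/YYYY')) as end_date
    resident DATA_OLD
    where year(end_date) = 9999;

    // cases 1 and 5: rows not open in the previous generation get a new interval
    // starting at the generation date (brand new rows and re-appearing rows)
    concatenate (DATA_SCD)
    load field1,
         field2,
         date(date#('$(vGenerationDate)', 'DD/MM/YYYY'), 'DD/MM/YYYY') as begin_date,
         date(date#('31/12/9999', 'DD/MM/YYYY'), 'DD/MM/YYYY') as end_date
    resident DATA_NEW
    where not exists(%old_open_key, field1 & '|' & field2);

    drop tables NEW_KEYS, OLD_OPEN_KEYS, DATA_OLD, DATA_NEW;

    With the example data above, this should produce the final table shown below.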

     

     

    Final table:

     

    field1   field2   begin_date   end_date
    1        1        01/04/2016   31/12/9999
    2        8        01/04/2016   20/04/2016
    2        5        21/04/2016   31/12/9999
    3        4        01/04/2016   15/04/2016
    3        4        21/04/2016   31/12/9999