
Building a slowly changing dimension in Qlikview

apierensv

Hello,

After answering a post on the forum on this subject, I decided to create a document to share my solution to this problem.

The script in the attached application lets you create date intervals when your source data cannot provide them (which usually means you don't have a real data warehouse...).

If you don't have a slowly changing dimension, or if your data changes every day (currency rates for example), building date intervals is not useful at all: just concatenate the snapshots and add a date field holding the date of each snapshot. But when your data changes "slowly" (customer tables, contract tables...), date intervals are very interesting because they decrease the number of rows dramatically.

The algorithm distinguishes 5 cases; the first 3 are "basic", and the last 2 are only there to manage rows that appear, disappear and re-appear, no matter how many times (a condensed sketch follows the list):

1) The rows of the current generation are completely new

2) The opposite of the case above: the rows of the previous generation don't exist in the current generation

3) The rows of the previous generation exist in the current generation

4) The rows of the generation before the previous one exist in the current generation but not in the previous one

5) The rows of the current generation exist in the generation before the previous but not in the previous generation
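As a rough illustration of the idea (not the exact script in the attached application), here is a condensed sketch that collapses these five cases into two rules: close the open interval of any key that disappeared, and open a new interval for any key that has no open interval, which also covers re-appearing rows. It assumes DATA_OLD and DATA_NEW are already loaded, that the date of the current generation is in vLoadDate, and it uses AutoNumberHash128 over all fields as the comparison key; all names are illustrative.

LET vLoadDate = Num(Today());    // illustrative: date of the current generation

// Keys present in the current generation
NEW_KEYS:
LOAD AutoNumberHash128(field1, field2) as new_key
Resident DATA_NEW;

// Cases 2 and 3: keep the old intervals, and close the open intervals
// whose key no longer exists in the current generation
SCD:
LOAD field1,
     field2,
     begin_date,
     If(end_date = MakeDate(9999, 12, 31)
        and not Exists(new_key, AutoNumberHash128(field1, field2)),
        Date($(vLoadDate) - 1), end_date) as end_date
Resident DATA_OLD;

// Keys that already have an open interval
OLD_KEYS:
LOAD AutoNumberHash128(field1, field2) as old_key
Resident DATA_OLD
Where end_date = MakeDate(9999, 12, 31);

// Cases 1, 4 and 5: open a new interval for every current key that has
// no open interval yet (covers brand-new rows and re-appearing rows)
Concatenate (SCD)
LOAD field1,
     field2,
     Date($(vLoadDate)) as begin_date,
     MakeDate(9999, 12, 31) as end_date
Resident DATA_NEW
Where not Exists(old_key, AutoNumberHash128(field1, field2));

Drop Tables NEW_KEYS, OLD_KEYS, DATA_OLD, DATA_NEW;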

To industrialize the process, you will probably need a loop; I advise you to store the generation date in a CSV or a QVD, so that in case of a crash in the middle of the loop, the last generation date, which is very important because it is used to build the date intervals, is saved.
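A minimal sketch of that persistence, assuming a variable vGenerationDate holding the date of the generation being processed and a file name of your choice (illustrative names):

// Restore the last generation date if the file already exists
If not IsNull(FileTime('last_generation.qvd')) then
    LAST_GEN:
    LOAD last_generation_date From [last_generation.qvd] (qvd);
    LET vLastGenerationDate = Peek('last_generation_date', 0, 'LAST_GEN');
    Drop Table LAST_GEN;
End If

// ... process the generation dated vGenerationDate here ...

// Persist the date that was just processed, so a crash in the middle
// of the loop can resume from the right generation
GEN:
LOAD '$(vGenerationDate)' as last_generation_date AutoGenerate 1;
Store GEN into [last_generation.qvd] (qvd);
Drop Table GEN;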

The initialization (first generation to integrate) is easy: set begin_date to the date of the file for all rows, and end_date to '31/12/9999' for all rows as well; from the second generation onwards you apply the algorithm above.
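For instance, a first-load sketch along those lines, assuming the first snapshot is already in DATA_NEW and its file date is in vGenerationDate (illustrative names):

// First generation: every row gets an open-ended interval
DATA_SCD:
LOAD field1,
     field2,
     Date#('$(vGenerationDate)', 'DD/MM/YYYY') as begin_date,
     Date#('31/12/9999', 'DD/MM/YYYY') as end_date
Resident DATA_NEW;

Drop Table DATA_NEW;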

Be careful with duplicate rows: this algorithm doesn't handle them, so your data would be wrong. You have to aggregate to de-duplicate, for example by counting the duplicates and multiplying the measure values by that count.
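A possible de-duplication sketch, assuming a measure column named amount (illustrative), which groups on all fields and keeps the number of duplicates so the measure can be re-weighted downstream:

DATA_NEW_DEDUP:
LOAD field1,
     field2,
     amount,
     Count(field1) as row_count   // multiply amount by row_count where needed
Resident DATA_NEW
Group By field1, field2, amount;

Drop Table DATA_NEW;
Rename Table DATA_NEW_DEDUP to DATA_NEW;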

Moreover, you don't need to know whether there is a primary key, as the change detection uses all the fields; just make sure you don't have duplicate rows.

Here is the example from the application, assuming the initialization is already done (since DATA_OLD already has date intervals). This little example covers all the possible cases ('1,1': stays valid | '2,8': becomes dead | '2,5': new | '3,4': already-dead row that re-appears):

DATA_OLD:
load field1, field2, date(begin_date,'DD/MM/YYYY') as begin_date, date(end_date,'DD/MM/YYYY') as end_date inline
[field1,field2,begin_date,end_date
1,1,01/04/2016,31/12/9999
2,8,01/04/2016,31/12/9999
3,4,01/04/2016,15/04/2016];

DATA_NEW:
load field1, field2 inline
[field1,field2
1,1
2,5
3,4];

Final table:

field1 | field2 | begin_date | end_date
1      | 1      | 01/04/2016 | 31/12/9999
2      | 8      | 01/04/2016 | 20/04/2016
2      | 5      | 21/04/2016 | 31/12/9999
3      | 4      | 01/04/2016 | 15/04/2016
3      | 4      | 21/04/2016 | 31/12/9999
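With the two inline tables above, the condensed sketch shown after the five cases reproduces this final table if the load date of DATA_NEW is set to 21/04/2016, for example:

LET vLoadDate = Num(MakeDate(2016, 4, 21));   // date of the DATA_NEW generation, instead of Today()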