Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

Dealing with Fact and Dimension Tables

I am looking for some best practices around dealing with fact and dimension tables from a relational source to be loaded into QV.

I have a fact table that I load only a certain period of data into the document (say, for example 1 month).

This fact table joins to many other dimension tables in a snowflake fashion.

The trick is, that I only want to load information from the dimension tables if I have data from the fact table during the time period that was loaded.   In other words, I don't want to load products from my products master table if that product had no activity (fact data) within the time that was selected.

I would prefer not to do any joins at the database level because the fact table is tremendously large.   I would prefer to load the fact data into QV and then with that in QV determine what needs to be loaded from the master dimension tables.

What is the best way to accomplish this?

1 Solution

Accepted Solutions
johnw
Champion III
Champion III

If I understood the question, use exists() to limit your data to what's already been loaded into the fact table.  Here's an example:

Facts:
LOAD
Something
,SomeOtherField
,Whatever
FROM facts data source
WHERE my conditions
;

Somethings:
LOAD
Something
,SomethingDescription
,SomethingDate
FROM somethings data source
WHERE exists(Something)
;

If your dimension tables are particularly large, and you are selecting a particularly small number of values from them, there's another approach that may be better (passing the existing values to the SQL for the dimension table read), but exists() is probably the most typical solution to this kind of problem.

View solution in original post

4 Replies
johnw
Champion III
Champion III

If I understood the question, use exists() to limit your data to what's already been loaded into the fact table.  Here's an example:

Facts:
LOAD
Something
,SomeOtherField
,Whatever
FROM facts data source
WHERE my conditions
;

Somethings:
LOAD
Something
,SomethingDescription
,SomethingDate
FROM somethings data source
WHERE exists(Something)
;

If your dimension tables are particularly large, and you are selecting a particularly small number of values from them, there's another approach that may be better (passing the existing values to the SQL for the dimension table read), but exists() is probably the most typical solution to this kind of problem.

Not applicable
Author

Thank you for your assitance.   It got me past where I was getting stuck.

Apparently there is something wrong with my document because I kept getting a script failure after the run completed, and then a prompt to reload old data.   I thought it was my script and the way I was trying to do the load, but I think something is corrupted in the document itself.   When I did a debug, it went through all the script properly, but at the end it failed.

I exported the script and put the same script into a new document and it worked!   Not sure what is wrong with my original.

Anyway, back to the original problem, here is what my script turned out to be:

Sales:

SQL SELECT top 1000

[Item_Key]

,[Store_Key]

,[Time_Key]

,[Invoice_Date]

,[Invoice_Number]

FROM "Fact_Sales";

Item:

LOAD *

Where Exists(Item_Key);

SELECT

[Item_Key]

,[Item_ID]

,[Item_Desc]

FROM "Dim_Item";


Is this the best/fastest approach?

johnw
Champion III
Champion III

Again, that's the most typical solution, and I'd say in most situations the best.  But it probably isn't the fastest.  The issue with this approach is that the exists() is only applied once QlikView gets its hands on the data, which is to say AFTER your select has already loaded EVERY record from the Dim_Item table.  In practice, this is rarely a problem - dimension tables tend to be small.  But the faster approach in most cases would be what I mentioned earlier - passing the existing values to the SQL.

I don't recommend it for most cases, because it usually wouldn't be worth the extra complication.  But I think it would look something like this if you want to try it and see if there's any practical difference in load time.  Chances are I have syntax or other errors.

Sales:
LOAD *
;
SQL SELECT top 1000
[Item_Key]
,[Store_Key]
,[Time_Key]
,[Invoice_Date]
,[Invoice_Number]
FROM "Fact_Sales"
;

Items:
LOAD concat(Item_Key,chr(39)&','&chr(39)) as Item_Keys
;
LOAD text(fieldvalue('Item_Key',recno())) as Item_Key
AUTOGENERATE fieldvaluecount('Item_Key')
;
LET vItemKeys = peek('Item_Keys');
DROP TABLE Items;

Items:
LOAD *
;
SQL SELECT
[Item_Key]
,[Item_ID]
,[Item_Desc]
FROM "Dim_Item"
WHERE Item_Key IN('$(vItemKeys)')
;

Not applicable
Author

John,

Sorry for the delayed response (was kind of hectic here), but I did want to thank you for your feedback.  

I agree that in most cases it is not an issue to load the whole dimension in.  Sometimes I run across dimension tables that are very wide and very long and just to save data transfer over the network, it is helpful to have alternatives.

Again, thanks for your help.