Do not input private or sensitive data. View Qlik Privacy & Cookie Policy.
Skip to main content

Announcements
Qlik and ServiceNow Partner to Bring Trusted Enterprise Context into AI-Powered Workflows. Learn More!
cancel
Showing results for 
Search instead for 
Did you mean: 
p_zaccheddu
Contributor II
Contributor II

Very large fact table divided in very large QVDs: is it better a implicit concatenate or an explicit concatenate?

Hypothetical case. Imagine a very big fact table (more than a billion rows). The data is in QVDs divided by years. For example:

FACT_2015.qvd
FACT_2016.qvd
FACT_2017.qvd
etc

Each qvd has about 20 fields and the average size is 200 MB.

To load them we have two methods
First an implicit concatenate (pseudo code):

FACT:
load * from FACT_*.qvd

or an explicit concatenate (always pseudo code)

set Firstyear=2015
set LastYear=2025

for i=$(Firstyear) to $(LastYear)
if i=$(Firstyear) then
set vTable='FACT:'
else
set vTable='Concatenate (FACT)'
end if

$(vTable)
load * from FACT_$(i).qvd

next

In terms of loading speed, would the two methods be equivalent or is one better than the other? I don't have data of this size at the moment, so I wanted to know if anyone had already done some tests in this sense 🙂

Labels (3)
1 Solution

Accepted Solutions
marcus_sommer

I would expect no significantly difference in the run-times because it's always a loop-approach regardless if it's performed with an implicit syntax or an explicit ones. The iterations and/or the variable-evaluating should not be noticeable against the loads itself.

Personally I prefer the explicit approach but I wouldn't take this for-loop and also not setting the concatenate-statement else using:

for each file in filelist('FullPathWithPattern')
   t: load F1, F2, F3, ... from [$(file)] (qvd);
next

Most important will be to ensure that the loads are optimized which means all files have the identically data-structure and no transformations are applied - unless a single where exists() is allowed.

View solution in original post

3 Replies
marcus_sommer

I would expect no significantly difference in the run-times because it's always a loop-approach regardless if it's performed with an implicit syntax or an explicit ones. The iterations and/or the variable-evaluating should not be noticeable against the loads itself.

Personally I prefer the explicit approach but I wouldn't take this for-loop and also not setting the concatenate-statement else using:

for each file in filelist('FullPathWithPattern')
   t: load F1, F2, F3, ... from [$(file)] (qvd);
next

Most important will be to ensure that the loads are optimized which means all files have the identically data-structure and no transformations are applied - unless a single where exists() is allowed.

p_zaccheddu
Contributor II
Contributor II
Author

Thanks. My other curiosity is that implicit concatenation obviously only works if there are the same fields on all QVDs. So I suppose that QS must do this check before saying "Ok, they are the same, let's concatenate them!". If I explicitly use the prefix "concatenate", wouldn't it be like telling QS "Ok, don't worry, concatenate them without doing checks"? I don't know if this leads to an actual time saving, I'm thinking of situations with a lot of QVDs or similar.

marcus_sommer

Without specifying an explicit concatenate(MyTable) statement only tables with the identical data-structure are merged. If the tables have a different data-structure you will mandatory need the explicit statement. But different data-structures will break the optimized load - and this should be avoided by larger data-set.