Hypothetical case: imagine a very big fact table (more than a billion rows). The data is stored in QVDs split by year, for example:
FACT_2015.qvd
FACT_2016.qvd
FACT_2017.qvd
etc.
Each QVD has about 20 fields and an average size of 200 MB.
To load them there are two methods.
First, an implicit concatenation (pseudocode):
FACT:
Load * From FACT_*.qvd (qvd);
Second, an explicit concatenation (again pseudocode):
Set vFirstYear = 2015;
Set vLastYear = 2025;
For i = $(vFirstYear) to $(vLastYear)
    If $(i) = $(vFirstYear) Then
        Set vTable = FACT:;
    Else
        Set vTable = Concatenate (FACT);
    End If
    $(vTable)
    Load * From FACT_$(i).qvd (qvd);
Next i
In terms of loading speed, would the two methods be equivalent, or is one better than the other? I don't have data of this size at the moment, so I wanted to know if anyone has already run tests along these lines 🙂
I would expect no significant difference in run times, because it's a loop either way, regardless of whether it's performed with implicit or explicit syntax. The iterations and the variable evaluation should be negligible compared with the loads themselves.
Personally I prefer the explicit approach, but I wouldn't use that for-loop, and I also wouldn't build the concatenate statement in a variable. Instead I would use:
For Each file in FileList('FullPathWithPattern')
    t:
    Load F1, F2, F3, ... From [$(file)] (qvd);
Next file
Most important is to ensure that the loads are optimized, which means all files have an identical data structure and no transformations are applied; the only exception allowed is a single where exists() clause.
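To illustrate the exception, here is a minimal sketch of an optimized QVD load filtered with exists(). The table, field, and file names (Keys, OrderID, FACT_2015.qvd) are hypothetical:

```
// Build the list of key values to keep. Exists() with a single
// field argument is the only where-condition that does not break
// an optimized QVD load.
Keys:
Load OrderID Inline [
OrderID
1001
1002
];

FACT:
Load * From FACT_2015.qvd (qvd)
Where Exists(OrderID);   // still an optimized load
```

Any other transformation in the load (calculated fields, renames via expressions, other where-conditions) would force a standard, much slower load.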
Thanks. My other curiosity: implicit concatenation obviously only works if all the QVDs have the same fields. So I suppose QS must perform this check before saying "OK, they are the same, let's concatenate them!". If I explicitly use the Concatenate prefix, wouldn't that be like telling QS "OK, don't worry, concatenate them without checking"? I don't know whether this leads to an actual time saving; I'm thinking of situations with a lot of QVDs or similar.
Without an explicit Concatenate(MyTable) statement, only tables with an identical data structure are merged automatically. If the tables have different data structures, you will definitely need the explicit statement. But different data structures will break the optimized load - and with larger data sets this should be avoided.
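As a sketch of that last point, suppose (hypothetically) that FACT_2016.qvd contains an extra field, Region, that FACT_2015.qvd lacks. Auto-concatenation would then put the tables in separate internal tables, so the Concatenate prefix is needed to force the merge:

```
FACT:
Load * From FACT_2015.qvd (qvd);      // identical structure: optimized

Concatenate (FACT)
Load * From FACT_2016.qvd (qvd);      // forced merge despite the extra
                                      // Region field; as noted above,
                                      // the structure mismatch can
                                      // prevent an optimized load
```

Rows from the first file simply get NULL in the missing field, but you pay for it in load time, which matters at billion-row scale.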