Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi experts,
I hope you can help me with this issue:
In data modelling I need to load table from qvd (data from 2018) and concatenate it to table which takes data from 1.1.2019 to now. But also with using exists.
Like this:
table_idnumbers:
load ID_NUMBER from t_numbers where ID_NUMBER>10;
2019_data:
LOAD
DATE,
ID_NUMBER,
VALUE
FROM value_data.qvd
where exists(ID_NUMBER); //because i want values just for ID_NUMBERS from table_idnumbers
CONCATENATE(2019_data)
LOAD
DATE,
ID_NUMBER,
VALUE
from value_data2018.qvd
where exists (ID_NUMBER); //I want also to take values from 2018 just for ID_NUMBER from table_idnumbers
It takes very long time...://
How to optimized this. Also I think exist function for 2018 data is not good...
Thanks a lot
Jasmina
Hi Jasmina,
Your script is already optimized. There is no quicker filtration then Exists.
Try this:
table_idnumbers:
load DISTINCT
ID_NUMBER as Filter
from t_numbers where ID_NUMBER>10;
2019_data:
LOAD
DATE,
ID_NUMBER,
VALUE
FROM value_data.qvd
where exists(Filter,ID_NUMBER);
CONCATENATE(2019_data)
LOAD
DATE,
ID_NUMBER,
VALUE
from value_data2018.qvd
where exists (Filter, ID_NUMBER);
This has a slight chance of being a bit quicker but you gotta be sure your data was already aggregated, otherwise you'll suppress duplicate rows:
2019_data:
LOAD
DATE,
ID_NUMBER,
VALUE
FROM value_data.qvd;
CONCATENATE(2019_data)
LOAD
DATE,
ID_NUMBER,
VALUE
from value_data2018.qvd;
INNER JOIN(2019_data)
load DISTINCT
ID_NUMBER
from t_numbers where ID_NUMBER>10;
I hope that helps!
Regards,
S.T.
Yes, it is pretty good solution.
Next little problem, combination exists(ID_NUMBER) and using one more condition...
for example:
where exists(ID_NUMBER) and not exists('31.12.2018')
😕
Hi Jasmina,
I presume the filtered data set is small enough and a resident load would work much faster, so lets try something like:
2019_data_temp:
LOAD
DATE,
ID_NUMBER,
VALUE
FROM value_data.qvd;
CONCATENATE(2019_data_temp)
LOAD
DATE,
ID_NUMBER,
VALUE
from value_data2018.qvd;
INNER JOIN(2019_data_temp)
load DISTINCT
ID_NUMBER
from t_numbers where ID_NUMBER>10;
2019_data:
NOCONCATENATE
LOAD * RESIDENT 2019_data_temp
WHERE DATE<>'31.12.2018'
;
DROP TABLE 2019_data_temp;
But keep in mind, that the optimal solution depends on the volume of your data.
I hope that helps!
Kind regards,
S.T.
Volume of my data is about 500 million of rows.
Ehh, with exists I really found good thing, because I dont need this 500 million. With exists I reduce it to 40 million. But first must go table with ID_NUMBERS.
After is going table with values.