Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi,
I have a question rgd. LOAD times for one specific table:
- The table is very big, but on one field it can be split: One part (only one value in that field) has about 5% of all records, the other part (all other values in that field) has about 95% of records.
- There is nów only one filter splitting the table in two: >> TRANCODE = 'SPRECEIVE' << and >> TRANCODE <> 'SPRECEIVE' <<
<=> The first part (LOADed from a qvd file) takes much longer to load than the second one.
These are the two loads: FIRST LOAD:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
Trans_Dtl_Rec:
LOAD
ITEM_NUMBER as Rec.ITEM_NUMBER,
TRAN_DATE_TD as Rec.TRAN_DATE_TD,
TRAN_DATE_Time as Rec.TRAN_DATE_Time,
TRAN_CODE as Rec.TRAN_CODE,
QUANTITY as Rec.QUANTITY,
BIN as Rec.BIN,
TRACKING_NUMBER as Rec.TRACKING_NUMBER,
CONTROL_NUMBER as Rec.CONTROL_NUMBER,
SERIAL_LOT_NUMBER as Rec.SERIAL_LOT_NUMBER
FROM $(v_Pfad_TransDtl) (qvd)
WHERE TRAN_CODE = 'SPRECEIVE';
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
SECOND LOAD
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
Trans_Dtl_aside:
LOAD
ITEM_NUMBER as Aside.ITEM_NUMBER,
TRAN_DATE_TD as Aside.TRAN_DATE_TD,
TRAN_DATE_Time as Aside.TRAN_DATE_Time,
TRAN_CODE as Aside.TRAN_CODE,
QUANTITY as Aside.QUANTITY,
BIN as Aside.BIN,
TRACKING_NUMBER as Aside.TRACKING_NUMBER,
CONTROL_NUMBER as Aside.CONTROL_NUMBER,
SERIAL_LOT_NUMBER as Aside.SERIAL_LOT_NUMBER
FROM $(v_Pfad_TransDtl) (qvd)
WHERE TRAN_CODE <> 'SPRECEIVE';
=> Is there any way I can speed that up?
Thanks a lot!
Best regards,
DataNibbler
Hi DataNibbler,
at first I would check if it's really really needed to split this into two tables - of course you will have a reason but there might be other ways to solve this task.
If you want to speed up a qvd-load you must be to ensure that the load runs optimized. You could reach this by using a where exists() statement. Try something like this:
Dummy:
Load 'SPRECEIVE' as TRAN_CODE autogenerate 1;
Trans_Dtl_Rec:
LOAD
ITEM_NUMBER as Rec.ITEM_NUMBER,
TRAN_DATE_TD as Rec.TRAN_DATE_TD,
TRAN_DATE_Time as Rec.TRAN_DATE_Time,
TRAN_CODE as Rec.TRAN_CODE,
QUANTITY as Rec.QUANTITY,
BIN as Rec.BIN,
TRACKING_NUMBER as Rec.TRACKING_NUMBER,
CONTROL_NUMBER as Rec.CONTROL_NUMBER,
SERIAL_LOT_NUMBER as Rec.SERIAL_LOT_NUMBER
FROM $(v_Pfad_TransDtl) (qvd)
WHERE exists(TRAN_CODE)';
Trans_Dtl_aside:
LOAD
ITEM_NUMBER as Aside.ITEM_NUMBER,
TRAN_DATE_TD as Aside.TRAN_DATE_TD,
TRAN_DATE_Time as Aside.TRAN_DATE_Time,
TRAN_CODE as Aside.TRAN_CODE,
QUANTITY as Aside.QUANTITY,
BIN as Aside.BIN,
TRACKING_NUMBER as Aside.TRACKING_NUMBER,
CONTROL_NUMBER as Aside.CONTROL_NUMBER,
SERIAL_LOT_NUMBER as Aside.SERIAL_LOT_NUMBER
FROM $(v_Pfad_TransDtl) (qvd)
WHERE not exists(TRAN_CODE)';
drop table Dummy;
- Marcus
Hi DataNibbler,
at first I would check if it's really really needed to split this into two tables - of course you will have a reason but there might be other ways to solve this task.
If you want to speed up a qvd-load you must be to ensure that the load runs optimized. You could reach this by using a where exists() statement. Try something like this:
Dummy:
Load 'SPRECEIVE' as TRAN_CODE autogenerate 1;
Trans_Dtl_Rec:
LOAD
ITEM_NUMBER as Rec.ITEM_NUMBER,
TRAN_DATE_TD as Rec.TRAN_DATE_TD,
TRAN_DATE_Time as Rec.TRAN_DATE_Time,
TRAN_CODE as Rec.TRAN_CODE,
QUANTITY as Rec.QUANTITY,
BIN as Rec.BIN,
TRACKING_NUMBER as Rec.TRACKING_NUMBER,
CONTROL_NUMBER as Rec.CONTROL_NUMBER,
SERIAL_LOT_NUMBER as Rec.SERIAL_LOT_NUMBER
FROM $(v_Pfad_TransDtl) (qvd)
WHERE exists(TRAN_CODE)';
Trans_Dtl_aside:
LOAD
ITEM_NUMBER as Aside.ITEM_NUMBER,
TRAN_DATE_TD as Aside.TRAN_DATE_TD,
TRAN_DATE_Time as Aside.TRAN_DATE_Time,
TRAN_CODE as Aside.TRAN_CODE,
QUANTITY as Aside.QUANTITY,
BIN as Aside.BIN,
TRACKING_NUMBER as Aside.TRACKING_NUMBER,
CONTROL_NUMBER as Aside.CONTROL_NUMBER,
SERIAL_LOT_NUMBER as Aside.SERIAL_LOT_NUMBER
FROM $(v_Pfad_TransDtl) (qvd)
WHERE not exists(TRAN_CODE)';
drop table Dummy;
- Marcus
Try to load using Where Exist().
Like this,
TEMP;
Load * Inline [
TRAN_CODE
SPRECEIVE
];
TRANS_DTL_REC:
LOAD
......
FROM .....(QVD)
WHERE Exists(TRAN_CODE);
TRANS_DTL_REC:
LOAD
......
FROM .....(QVD)
WHERE Not Exists(TRAN_CODE);
Drop table TEMP;
This will save some time.
Hi,
thanks a lot! I will try.
Yes, the splitting of the table does have a reason - I need the respective parts several times during the script execution, so to make all these LOADs RESIDENT, I load the table beforehand, and since in some places I need only the smaller part, in others I need only the bigger part, it makes sense as both parts are smaller to parse than the total ...
This still does not explain why the smaller part takes much longer in the loading than the big one - but never mind. If I can speed the whole thing up, all the better.
Now I have introduced a partial_reload_scheme, in case the report is used several times one after another, so that these tables are just left standing instead of being reloaded every time.
What I'd like to do next is, building on top of this, to make the script decide itself when it makes sense to reload that big table - once an hour, when it has been updated from the database - and otherwise, it should always perform a partial reload - or I could do it otherwise - but I will make that a new thread in order to not confuse things here.
Best regards,
DataNibbler
Hi,
the LOADs still do not run in optimized mode and I cannot spot why:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
Temp:
LOAD 'SPRECEIVE' as TranCode2look4 AutoGenerate 1;
Trans_Dtl_Rec:
LOAD
ITEM_NUMBER as Rec.ITEM_NUMBER,
TRAN_DATE_TD as Rec.TRAN_DATE_TD,
TRAN_DATE_Time as Rec.TRAN_DATE_Time,
TRAN_CODE as Rec.TRAN_CODE,
QUANTITY as Rec.QUANTITY,
BIN as Rec.BIN,
TRACKING_NUMBER as Rec.TRACKING_NUMBER,
CONTROL_NUMBER as Rec.CONTROL_NUMBER,
SERIAL_LOT_NUMBER as Rec.SERIAL_LOT_NUMBER
FROM $(v_Pfad_TransDtl) (qvd)
WHERE EXISTS(TranCode2look4, TRAN_CODE);
Hi DataNibbler,
only the single-parameter version of the exists(TRAN_CODE) statement will be optimized. As far as you put a second parameter into it you forced qlik to process this load instead of transferring it directly into the RAM.
Although the approach of using where exists() worked very well you could be forced to adjust your load-approaches a bit to be able to use the single-parameter version in cases where this field is already loaded within previous loadings.
One approach is to create a temp-field for it within the load which creates the qvd and another would be to rename the fields like:
rename field TRAN_CODE into TRAN_CODE_TEMP;
Temp:
LOAD 'SPRECEIVE' as TRAN_CODE AutoGenerate 1;
Trans_Dtl_Rec:
LOAD
ITEM_NUMBER as Rec.ITEM_NUMBER,
TRAN_DATE_TD as Rec.TRAN_DATE_TD,
TRAN_DATE_Time as Rec.TRAN_DATE_Time,
TRAN_CODE as Rec.TRAN_CODE,
QUANTITY as Rec.QUANTITY,
BIN as Rec.BIN,
TRACKING_NUMBER as Rec.TRACKING_NUMBER,
CONTROL_NUMBER as Rec.CONTROL_NUMBER,
SERIAL_LOT_NUMBER as Rec.SERIAL_LOT_NUMBER
FROM $(v_Pfad_TransDtl) (qvd)
WHERE EXISTS(TRAN_CODE);
rename field TRAN_CODE_TEMP into TRAN_CODE;
- Marcus
Hi Marcus,
I cannot quite follow the execution_order of your code - where does this RENAME statement come in?
Hi DataNibbler,
if your field is already loaded and maybe therefore not be usable for the where-exists-logic you would rename it directly before you want to load your optimized qvd's and afterwards you might want/need to rename it again - it will be depending on your datamodel.
Of course this kind of scripting adds some more overhead and complexity but it's the price you need to pay to get optimized loadings.
Here are various quite explanatory links included to the exists-topic which will help you to understand how exists worked and how it could be implemented: Advanced topics for creating a qlik datamodel.
- Marcus
Thanks Marcus!
I will have a look.
Hi Marcus,
I think I have read all of the articles on the issue. I know what the issue is - but I still cannot imagine how to avoid it. The confusing thing about your post (your example) is that those RENAME commands come as the very first and last statements, around everything else, not within the LOAD, so I don't know what they are supposed to do - although I think I remember I have come across this issue before ...
Sorry. Can you give me one more hint?
Thanks a lot!
Best regards,
DataNibbler