Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hello,
I am looking at optimizing the following script, which takes about 3 seconds for the first two loads, but around 40' (minutes) for the 3rd load. The table has around 1.5 million rows. The results are as expected but the performance is of concern. Is there any elegant way someone can propose?
Sample of Initial Data:
Corresponding Result Data:
TravelDataTemp1:
LOAD
TRAVEL_ID,
SEQ_NUM,
TRAVEL_DATE,
DRIVER_ID,
TO_STATE_ID,
FR_STATE_ID,
BUS_NUM
FROM [..\0_Extracts\TravelData.qvd] (qvd);
TravelDataTemp2:
NoConcatenate
LOAD
TRAVEL_ID,
SEQ_NUM,
BUS_NUM,
AutoNumber(TRAVEL_ID & '|' & BUS_NUM) as %TIDBusNR, //Travel ID + Bus Nr key
DRIVER_ID,
TRAVEL_DATE,
TO_STATE_ID,
FR_STATE_ID
Resident TravelDataTemp1
Order By TRAVEL_ID, BUS_NUM asc;
Drop Table TravelDataTemp1;
TravelData:
NoConcatenate
LOAD
TRAVEL_ID,
%TIDBusNR, //Travel ID + Bus Nr key
SEQ_NUM,
BUS_NUM,
AutoNumber(SEQ_NUM, %TIDBUSNR) as %TIDBusNRSeqOrder, //State A to State B Key for same Travel ID and Bus Nr
DRIVER_ID,
TRAVEL_DATE,
TO_STATE_ID,
FR_STATE_ID
Resident TravelDataTemp2
Order By %TIDBusNR,SEQ_NUM asc;
Drop Table TravelDataTemp2;
Drop Field SEQ_NUM;
Message was edited by: Sokol Petushi Noticed a typo in the posted script not related to the issue. %TRAVELIDBUSSNR should be %TIDBUSNR
Autonumber can be unacceptably slow when you have a lot of values for the second parameter -- AutoId. As a faster alternative, use peek like this:
if(%TIDBUSNR = Previous(%TIDBUSNR), peek('%TIDBusNRSeqOrder')+1, 1) as %TIDBusNRSeqOrder,
This assumes that your data is Ordered By %TIDBusNR,SEQ_NUM asc;, which it is according to the script you posted.
-Rob
Hi,
Do you think this will work?
TravelDataTemp2:
NoConcatenate
LOAD
TRAVEL_ID,
SEQ_NUM,
BUS_NUM,
AutoNumber(TRAVEL_ID & '|' & BUS_NUM) as %TIDBusNR, //Travel ID + Bus Nr key
AutoNumber(SEQ_NUM & '|' & %TRAVELIDBUSNR) as %TIDBusNRSeqOrder, //State A to State B Key for same Travel ID and Bus Nr
DRIVER_ID,
TRAVEL_DATE,
TO_STATE_ID,
FR_STATE_ID
Resident TravelDataTemp1
Order By %TRAVEL_ID, BUS_NUM,SEQ_NUM asc;
Drop Table TravelDataTemp1;
Since there are over a million rows, why not generate both the autonumbers in one load? Why even create a 3rd load (TravelData table).
The other thing I can think of is, unless having the order by is really essential, you should do away with it, autonumber will still work without the orderby.
Hi Karthik,
Thanks for your suggestion. The 2nd AutoNumber() needs to be ranked within the BUS_NUM dimension. A comma instead of a concatenation seems to resolve that issue.
With that said, the 2nd resident table part is to slow.
Autonumber can be unacceptably slow when you have a lot of values for the second parameter -- AutoId. As a faster alternative, use peek like this:
if(%TIDBUSNR = Previous(%TIDBUSNR), peek('%TIDBusNRSeqOrder')+1, 1) as %TIDBusNRSeqOrder,
This assumes that your data is Ordered By %TIDBusNR,SEQ_NUM asc;, which it is according to the script you posted.
-Rob
Thank you Rob,
Major difference. It solves the performance problem. It's now less than 15 seconds as compared to over 40' with AutoNumber().
Thank you,
Sokol