Hi,
I have a question regarding LOAD times for one specific table:
- The table is very big, but it can be split on one field: one part (only one value in that field) has about 5% of all records, the other part (all other values in that field) has about 95% of the records.
- There is now only one filter splitting the table in two: >> TRAN_CODE = 'SPRECEIVE' << and >> TRAN_CODE <> 'SPRECEIVE' <<
<=> The first part (LOADed from a qvd file) takes much longer to load than the second one.
These are the two loads:
FIRST LOAD:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
Trans_Dtl_Rec:
LOAD
ITEM_NUMBER as Rec.ITEM_NUMBER,
TRAN_DATE_TD as Rec.TRAN_DATE_TD,
TRAN_DATE_Time as Rec.TRAN_DATE_Time,
TRAN_CODE as Rec.TRAN_CODE,
QUANTITY as Rec.QUANTITY,
BIN as Rec.BIN,
TRACKING_NUMBER as Rec.TRACKING_NUMBER,
CONTROL_NUMBER as Rec.CONTROL_NUMBER,
SERIAL_LOT_NUMBER as Rec.SERIAL_LOT_NUMBER
FROM $(v_Pfad_TransDtl) (qvd)
WHERE TRAN_CODE = 'SPRECEIVE';
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
SECOND LOAD:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
Trans_Dtl_aside:
LOAD
ITEM_NUMBER as Aside.ITEM_NUMBER,
TRAN_DATE_TD as Aside.TRAN_DATE_TD,
TRAN_DATE_Time as Aside.TRAN_DATE_Time,
TRAN_CODE as Aside.TRAN_CODE,
QUANTITY as Aside.QUANTITY,
BIN as Aside.BIN,
TRACKING_NUMBER as Aside.TRACKING_NUMBER,
CONTROL_NUMBER as Aside.CONTROL_NUMBER,
SERIAL_LOT_NUMBER as Aside.SERIAL_LOT_NUMBER
FROM $(v_Pfad_TransDtl) (qvd)
WHERE TRAN_CODE <> 'SPRECEIVE';
=> Is there any way I can speed that up?
Thanks a lot!
Best regards,
DataNibbler
Hi DataNibbler,
the logic behind where exists() is to compare the field values from the load against the list of values that already exists in this field - this happens for each single record within the load, and each new field value is added to this field at runtime (this happens in general, unless there is a condition against it).
This means you need to prepare this field with the appropriate field values (those you want to filter in or out) before you can use it within the load statement. For this reason we created the table "Dummy" above with the needed field value and field name - but if this field already exists, it could contain field values that don't suit your requirement. In these cases you could (often) use exists with a second parameter, at the cost of losing the optimized load, or create an additional temp field within the qvd, or, as in my suggestion, rename the field.
In each case this will add some overhead and complexity, and if your script is already quite complex it's not always easy to implement. Perhaps it would be useful to split your load approach into several applications, respectively more layers within the data architecture, to simplify your loadings.
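To make that concrete, the basic pattern looks roughly like this - a sketch only, reusing the field names and the path variable from your script (the INLINE seed table and the shortened field list are my additions):

// Seed the field TRAN_CODE with the only value we want to keep.
Dummy:
LOAD * INLINE [
TRAN_CODE
SPRECEIVE
];

// A single-parameter Exists() on a field contained in the load keeps
// the qvd load optimized; the alias only takes effect once the load
// has finished, so the seed list is not polluted by the loaded values.
Trans_Dtl_Rec:
LOAD
    ITEM_NUMBER as Rec.ITEM_NUMBER,
    TRAN_CODE as Rec.TRAN_CODE,
    QUANTITY as Rec.QUANTITY
    // ... remaining fields as in the original load ...
FROM $(v_Pfad_TransDtl) (qvd)
WHERE Exists(TRAN_CODE);

// The seed table has done its job.
DROP TABLE Dummy;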
- Marcus
Okay Marcus,
thanks again! I still haven't fully understood it - mainly because in your suggestion you use the RENAME statement before the field to be renamed is even loaded - but never mind: you say it would add complexity, and as a rule I try to avoid making things too complex, so that anyone else could understand and maintain my applications with reasonable effort, without being an expert. So I will still try to understand this for knowledge's sake, but for the users I will go for the in-between way of using EXISTS with two parameters, and implement the partial-reload scheme instead.
That in-between way is still faster than loading normally, I think - as of now, at least: since we "restarted" that database in June there is not that much in it yet, and I had to take out the date filter to enable this. It might become slower again going forward, but I don't mind right now.
I just noticed another issue with that: somewhere, a copy of a table (or something like it) is generated, producing a large synthetic key. With just 10 records that's not really an issue - I'm just testing what happens if I do several partial reloads in a row ...
New issue:
It's pretty clear what's happening: one table is now loaded within a loop - I use the REPLACE keyword in the first iteration and ADD afterwards - and there are 10 iterations in the instance I'm testing.
=> After that loop is closed, there comes a LEFT JOIN from another table, adding 3 more fields to the table from the loop - and at that point a copy of that table seems to be generated, resulting in an enormous synthetic key.
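The structure is roughly this (a sketch with invented table, field and file names - not my actual script):

FOR i = 1 TO 10
    IF $(i) = 1 THEN
        // The first iteration replaces the table left over from the
        // previous partial reload ...
        Facts:
        REPLACE LOAD Key, Value
        FROM Part_$(i).qvd (qvd);
    ELSE
        // ... the following iterations append to it (auto-concatenation,
        // since the field lists are identical).
        ADD LOAD Key, Value
        FROM Part_$(i).qvd (qvd);
    END IF
NEXT i

// The join after the loop: if this table shares more than the one key
// field with Facts, QlikView links the tables over every common field
// and may build a synthetic key.
LEFT JOIN (Facts)
LOAD Key, Extra1, Extra2, Extra3
FROM Extras.qvd (qvd);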
Seemingly, even when there are several partial LOADs in a row, one copy of that table is all I get - that's not too bad, but I still don't quite understand it.
Oh - that happened before I introduced the use of two different keywords for this loop, so it is not because of that ...
Do you have any idea what could cause this?
I will now go on my lunch break and then start testing this in the Debugger ...
Thanks a lot!
Best regards,
DataNibbler
It's quite difficult to give you advice when your script is quite large and complex - maybe too complex. It might be useful to split it, and maybe to remove your partial-reload approach and try to speed up your load times with other methods.
Personally I don't use partial reloads, and they would be my last step in improving load times - if all the other methods, like creating the data model to force optimized loads, incremental loading, modifying binary loads and so on, aren't fast enough.
- Marcus
Hi Marcus,
maybe you're right.
I will keep the script where I have already implemented the partial reload - it works as such, so it would be a shame to just bin it, and maybe I'll redo it later on. But I will try to optimize whatever else I can before I use it, since it brings with it the possibility for the user to do something wrong, which I always try to avoid ... Since the tables used for that report are, one and all, only reloaded from the database once an hour, a partial reload doesn't seem a bad idea - but maybe I can create one "in-between" qvw (between our main loading scripts running once an hour and this app) that would load all the necessary tables, so that the report can load them BINARY ...
There are some possibilities I can think of, such as removing JOINs by moving them "further up the road" into the first tier of our installation and storing the already-joined tables as qvd - roughly like the sketch below.
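(A sketch only - the table, field and file names here are invented:)

// Tier-1 script: perform the join once ...
Trans_Dtl:
LOAD ITEM_NUMBER, TRAN_CODE, QUANTITY
FROM TransDtl_raw.qvd (qvd);

LEFT JOIN (Trans_Dtl)
LOAD ITEM_NUMBER, ITEM_DESCRIPTION
FROM Items.qvd (qvd);

// ... and store the result, so the downstream apps can load the
// already-joined table with a plain (optimized) qvd load.
STORE Trans_Dtl INTO TransDtl_joined.qvd (qvd);
DROP TABLE Trans_Dtl;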
To return, kind of, to where we started - an EXISTS() clause is always faster than a simple filter, though not as fast as an optimized LOAD: is that about right?
Hi DataNibbler,
nothing is faster than an optimized load - but I haven't tested yet whether there are significant differences between not-optimized loads depending on how the filtering is applied. I would assume the following order (fastest first):
- where exists(Par1, Par2)
- a normal where-clause like date >= ...
- joins
whereby it will depend on the concrete scenario.
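For illustration, the first two variants in script terms (a sketch - the seed table, file name and date are made up; in a real script you would use one variant or the other):

// Seed field for the two-parameter Exists().
Seed:
LOAD * INLINE [
TRAN_CODE_SEED
SPRECEIVE
];

// 1) Two-parameter Exists(): not optimized, but each record only needs
//    a lookup of its TRAN_CODE value in the seed field.
T1:
LOAD *
FROM Data.qvd (qvd)
WHERE Exists(TRAN_CODE_SEED, TRAN_CODE);

// 2) A normal where-clause: an expression evaluated per record.
//    (NoConcatenate keeps the two example tables apart.)
T2:
NoConcatenate
LOAD *
FROM Data.qvd (qvd)
WHERE TRAN_DATE_TD >= MakeDate(2014, 6, 1);

DROP TABLE Seed;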
- Marcus
Hi Marcus,
that's what I thought, and it confirms what I assumed: an EXISTS clause is generally faster than a specific WHERE filter.
I will try to load all the necessary tables in an in-between app running "in the wake" of our primary data-loading script (I would need two of those, one for each database we have), and load that BINARY in the actual report.
Plus, I will try to replace the filters with EXISTS clauses in that report wherever possible.
Best regards,
DataNibbler
P.S.: Still - I don't understand the part about the optimized LOAD with EXISTS (one parameter): why did you place that RENAME command before all the LOADs in your suggestion - what does it do there, to what end did you do that? I do understand why it would be necessary to rename that field - but wouldn't that have to happen immediately after loading every single record?
Hi DataNibbler,
your field within the load will be TRAN_CODE, and this is compared against an already existing field TRAN_CODE, which needs to be checked/created/prepared beforehand. Your renaming with the alias Rec.TRAN_CODE isn't important here, and you couldn't use this renamed field anywhere within the current load statement as a reference - it only exists after the load is finished.
- Marcus
I still don't get you 😉 Sorry.
So you think my renaming as Rec.TRAN_CODE would already be enough, since that alias name only exists after the entire LOAD is finished?
Then, would I not have to load the field AS Rec.TRAN_CODE in that Dummy table?
I still cannot fit your renaming into TRAN_CODE_TEMP into the picture - because it doesn't figure anywhere in the LOADs until you undo it after all the LOADs are done ...
Hi DataNibbler,
no, I didn't mean that. We should now forget Rec.TRAN_CODE because it's not involved. The renaming before this qvd load, from TRAN_CODE to TRAN_CODE_TEMP, is only necessary if this field already exists and doesn't contain the proper values. If TRAN_CODE doesn't exist beforehand, you can remove the rename statement - it was only meant as an example of the methods that can be useful to prepare the fields/data so the load can be performed as an optimized one.
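In script terms, the whole trick looks roughly like this (a sketch, building on the earlier example - only the RENAME lines are new):

// Only needed if TRAN_CODE already exists in the data model with values
// that would pollute the Exists() lookup: park it under a temp name first.
RENAME FIELD TRAN_CODE TO TRAN_CODE_TEMP;

// Seed a fresh TRAN_CODE with just the wanted value ...
Dummy:
LOAD * INLINE [
TRAN_CODE
SPRECEIVE
];

// ... and load optimized via the single-parameter Exists().
Trans_Dtl_Rec:
LOAD
    ITEM_NUMBER as Rec.ITEM_NUMBER,
    TRAN_CODE as Rec.TRAN_CODE,
    QUANTITY as Rec.QUANTITY
FROM $(v_Pfad_TransDtl) (qvd)
WHERE Exists(TRAN_CODE);

// Undo the preparation after all the LOADs are done.
DROP TABLE Dummy;
RENAME FIELD TRAN_CODE_TEMP TO TRAN_CODE;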
- Marcus