datanibbler
Champion

How much potential for optimization would there be in theory?

Hi,

As I have said in previous posts, it is very tricky to actually measure the effect any optimization has on a script's execution time, as there are too many other factors influencing it.

I have copied some working data onto my local laptop for testing. Here, of course, everything is very fast, since there is no need to transfer the data back and forth across a network - although, to be fair, there is no need to do so on the server where everything is normally executed, either.

The specific app I'm trying to optimize now is a script - a transformation step without a GUI - in which a number of "delta LOADs" from an SAP table are loaded, one per day, currently about 450 of them, going back to January 2017. I would think there is some potential for optimization here, as every qvd must carry some overhead, no? I'm thinking about merging all the qvd's for 2017 into one and concatenating those for 2018.

=> Is it possible to say, in general terms, how much overhead there is in every qvd - overhead that is paid 360 or so times for 2017 as things stand, but would be paid only once if there were just one qvd ...
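
For illustration, a minimal sketch of the consolidation I have in mind (path, file mask and qvd names are made up):

    // Merge all daily 2017 delta QVDs into one yearly QVD.
    Deltas2017:
    LOAD * FROM [..\Deltas\Delta_2017*.qvd] (qvd);   // wildcard matches every daily file

    STORE Deltas2017 INTO [..\Deltas\Delta_2017_All.qvd] (qvd);
    DROP TABLE Deltas2017;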

Best regards,

DataNibbler

1 Solution

Accepted Solutions
Peter_Cammaert
Partner - Champion III
Partner - Champion III

That's a difficult question, and the answer may depend on how large each qvd is. You see, the fixed per-file overhead becomes marginal as the data content grows.

IMHO loading a single QVD instead of 450 component-QVDs will always be faster, because

  • the script code doesn't have to loop (famously slow) and a single optimised LOAD will do
  • there is only one set of symbol tables to load, instead of 450 sets that need to be merged
  • memory allocation is more efficient (QlikView will know beforehand how much memory is needed to load the big guy)
  • etc.

but these are just theoretical reasons. If we take into account that some systems may use different I/O channels to stream your 450 qvds in parallel, and that for some operations you may not even need all of the data in memory at any one time, then we could easily compose a list of counter-arguments.

In short: I think you should perform some tests on a representative subset of your data (for example, load just 45 QVDs) and see what works best for you.
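
To make the two patterns concrete, a rough sketch (file mask, path and table names are assumptions, not your actual script):

    // Pattern A: loop over ~450 daily files - one open/append cycle per file.
    FOR Each vFile in FileList('..\Deltas\Delta_2017*.qvd')
      DailyDeltas:
      LOAD * FROM [$(vFile)] (qvd);
    NEXT vFile

    // Pattern B: one consolidated yearly file - a single optimised LOAD.
    DailyDeltas:
    LOAD * FROM [..\Deltas\Delta_2017_All.qvd] (qvd);

Timing both on the same subset should tell you quickly whether the consolidation is worth the effort.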


3 Replies

datanibbler
Champion
Author

Thanks Peter!

Well, I have done some "testing" on my local laptop - timing the load on a copy of the real data.

Each qvd holds only about 1,000 rows - some more, some less, depending on how much was done that day 😉

Indeed, putting all Deltas from 2017 into one qvd (using FileName() to keep the exact date, which is part of every file name) saves a lot of time, especially because the whole bunch of Deltas is, for some reason, loaded twice and concatenated, with slightly different filters ...
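
A rough sketch of that consolidation step, assuming a name pattern like Delta_20170315.qvd (pattern and paths are made up):

    // One-time consolidation that keeps the delta date encoded in each file name.
    // Adding a computed field makes this LOAD non-optimised, but it runs only once.
    Deltas2017:
    LOAD *,
         Date#(Right(FileBaseName(), 8), 'YYYYMMDD') AS DeltaDate   // assumed pattern
    FROM [..\Deltas\Delta_2017*.qvd] (qvd);

    STORE Deltas2017 INTO [..\Deltas\Delta_2017_All.qvd] (qvd);
    DROP TABLE Deltas2017;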

Because of the filters, I guess there are two options: The easier one would be to just merge all 2017 Deltas into one qvd with no filtering applied yet - the script can then be left unchanged (it still loads every qvd available in that place with an *), but the filtering will still have to be done at load time, so the LOAD won't be optimized.

<=> The other option - filtering during the consolidation - would be more complicated, but the Deltas from 2017 could then be loaded optimized; only those from 2018 would still have to be filtered.

Since the time I spend implementing these optimizations is also an issue, I think I will go for the first option.
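
For reference, the load behaviour this choice hinges on (filter field and value are hypothetical, and the two loads are alternatives, not one script):

    // An ordinary WHERE clause de-optimises a qvd load:
    Deltas:
    LOAD * FROM [..\Deltas\Delta_2017_All.qvd] (qvd)
    WHERE Status = 'OK';              // hypothetical filter -> NOT optimised

    // A single-field WHERE Exists() keeps the load optimised:
    StatusFilter:
    LOAD * INLINE [
    Status
    OK
    ];
    Deltas:
    LOAD * FROM [..\Deltas\Delta_2017_All.qvd] (qvd)
    WHERE Exists(Status);             // optimised, filtered via Exists()
    DROP TABLE StatusFilter;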

Thanks a lot!

Best regards,

DataNibbler

datanibbler
Champion
Author

Optimizing the LOAD without altering the script in any way has another charm: I can't be sure that no other script uses these Delta qvd's, since they sit in a "global" directory that can, in principle, be accessed by any of our apps. But if the script doesn't need to change and that "LOAD *" still works as before, nothing can break anywhere else.