Skip to main content
Announcements
Join us at Qlik Connect for 3 magical days of learning, networking,and inspiration! REGISTER TODAY and save!
cancel
Showing results for 
Search instead for 
Did you mean: 
datanibbler
Champion
Champion

Optimized LOAD - EXISTS, but no NOT(EXISTS())?

Hi,

I am trying to improve the performance of an app by trying to turn some LOADs into optimized LOADs and stuff.

I managed to take some time off the script-duration by replacing one RESIDENT_LOAD with a PRECEDING_LOAD and another one with a temporarily stored qvd_file - I might look at the possibility of having another PRECEDING_LOAD to replace that, too.

Now I have one more LOAD from a qvd-file with a filter applied, currently through an APPLYMAP - what this does is filter out several (currently two) values coming from an Excel file.

<=> The issue now is, an optimized LOAD from qvd allows an EXISTS() clause - but I want to EXclude the values in this Excel file - is there any possibility of inserting a NOT(EXISTS()) clause and still have an optimized LOAD or will I have to go the other way and first extract a listing of all possible values from somewhere and then INclude all of those (save those that should be filtered out)?

Thanks a lot!

Best regards,

DataNibbler

17 Replies
marcus_sommer

It sounds that there are multiple bottlenecks in your environment ... maybe you could merge some of the transformations into lesser load-steps and/or implementing any respectively more incremental approaches. But without using an appropriate multi-layer environment on modern hardware with local ssd-drives it will be far away from the potential performances.

- Marcus

datanibbler
Champion
Champion
Author

Hi Marcus,

I'm afraid you're right. We do have a multilayer environment here (there are extractors, several transform_steps, a datamodel_layer and finally the GUI) and I have a decent notebook here with 16GB of RAM, but I don't know about the server hardware and network - well, I can tell from the QMC that the QVS has 8 CPUs ... and I have the application locally, but not the data.

Not good ...

Well, I guess there's not too much I can do in that script. I'll have a look at a different one.

datanibbler
Champion
Champion
Author

Hmm ... the QMC tells me that this particular transformation step usually takes about 12 minutes to finish on the server - at night. That surprises me a bit because when I ran it locally on Friday, it took about 6 minutes (and it had to transfer the data back and forth).

Still, it seems worthwhile to look at that script once again as it seems to be the longest-running step in that entire report (there are two more transformations which take about 5sec each, a datamodel and a GUI which take about 2min). Sometimes, when looking at the same thing for the 100th time, you suddenly see something that you overlooked 99 times ...

Best regards,

marcus_sommer

There could be various causes to why a certain load-process is faster on one machine and slower on the other, for example the available resources, parallel running tasks, storage/network-speed, possible CPU frequencies in regard to single/multi-thread executions, the "age" of your hardware and probably there are some more reasons ...

- Marcus

datanibbler
Champion
Champion
Author

OK.

I'll close this thread now. From what I have seen so far, it seems that just about every app there is takes about twice as long to run on the server as compared to my local notebook 😉

Optimizing performance is not easy when every temporary STORE takes ages and saving space is not easy, either as all the data is required by various people to always be there to look into without spending any time reloading.

So, for the time being, it seems there`s nothing I can do.

evan_kurowski
Specialist
Specialist

There are definitely environmental considerations that may not be fully understood, or documented, that are capable of inhibiting QVD load optimization (even when all conditions conducive to optimization are present).  This may be giving some of our users false-readings (myself included).


I have seen QVD occasionally stop reporting optimization, even when read stand-alone with no transformation.

Also, as has been mentioned, using NOT EXISTS() with two different field name arguments is working optimized for me at the present moment.

But I wanted to use this very same technique last week, and using NOT EXISTS (with no transformation beyond an alias) and it was definitely deactivating the optimization.


[FILTER]:
LOAD FILTER FROM [FILTER.QVD] (qvd);

//on the latest test 4/16/2018, this syntax with two different field names passed as arguments to NOT EXISTS performed as optimized load
[FACT]:
LOAD TARGET_FIELD FROM [FACT.QVD] (qvd)
WHERE NOT EXISTS(FILTER, TARGET_FIELD);

//However, there have been many examples observed where the sole transformation was passing two different field names to NOT EXISTS and this deactivated the optimized load

 

There is something else at play here, something creating inconsistency in our end-user experience.  (not sure if this could be impacted by things that happen in earlier parts of scripts?  environmental issues?)

marcus_sommer

AFAIK applying a where exists clause with two parameters won't never result in an optimized load. At least I have never seen it whereby I didn't tried every possible combination and there might be something undocumented.

I think the important point will be if any data-processing on the row-level is necessary or not. This meant a renaming of a field for example is just applied to the xml-header on the column-level. Also on column-level happens the writing of the fieldvalues within the symboltables and only a new fieldvalue will be there stored - this meant there happens always an exists-check and therefore a where exists-clause will work.

But there are further dependencies for example if tables are concatenated both explicitly per statement and also by automatic concatenations if for instant statements like DISTINCT are included (which always effect all merging loadings): Turning Unoptimized Loads into Optimized Loads.

Beside this I could imagine that various settings like those in the easter egg like various caching-settings and also some configuration within the OS and/or VM might have side-effects.

- Marcus

evan_kurowski
Specialist
Specialist

Hello Marcus, those are good points identifying potential causes, originating from both syntax and external effects.

Checking for impacts from distinction, or multi-table contributions merging on a field makes sense.  It's effects from cache, vm, or os that have me concerned.  Sands could be shifting under the script's feet if those environmental settings weren't assured to remain static.

You always have great advice, thank you!