QVD optimization and hashing - Qlik Community

Report Inappropriate Content · ‎2011-10-04

Hi all,

I'm looking at some options for improving performance of my QVW and was hoping someone may be able to help with the subject of QVD optimization.

There main area of performance I'm interested in here is performance of the QVW that consumes the optimized QVD. I'm guessing the structure of my application is similar to many others out there. I have a bunch of QVW's whose purpose is to generate QVD files. These QVD files are then loaded into my main dashboard QVW.

Question 1:

I understand that performance of the load in to my dashboard QVW is improved when the load is qvd optimized, and that operations which result in any sort of calculation of data in the QVD will result in the load not being qvd optimized, but is there any impact on performance within the QVW when users are browsing it?

The reason I'm asking this is that my loads are not all optimized because I am using autonumberhash128 to create integer keys. This doesn't seem to really generate a hash, it generates a sequential integer. Fine when all the data sets being linked are in the same QVW, though not so fine when using the hash function in various different QVW's.

Question 2:

Is there any way in Qlikview of generating a real hash? So that the function hash(abc) would generate the same results when embedded in different QVW's.

If not, then perhaps I could amend my load to do something like this (taken from here - http://qlikviewnotes.blogspot.com/2008/05/when-less-data-means-more-ram.html 😞

fact:

//qvd optimized
LOAD * FROM fact.qvd (qvd);

RIGHT JOIN

LOAD DISTINCT *, autonumberhash128(key) as %key
RESIDENT fact;

Thanks,

Marcus

Miguel_Angel_Baeyens · ‎2011-10-04

Hi Marcus,

If you cannot create the numeric key field in the same document because you have several QVD creators (several QVW files used to populate those QVDs), think first of joining all of them into one big QVD generator file, if possible. That allows you to create your data model in one document, and not even using QVDs in the dashboard document rather than a binary load. In terms of reload performance this is probably faster.

But your data will be fine irrespective of your data source, once you have loaded it. If your data volume is huge you can tweak the document not to use compression (Settings menu, Document Properties, Compression set to None), this may render a bit faster but will use a lot more disk.

I don't see much sense in the code you suggest above for the current version 10 of QlikView, since you are loading twice, and usually resident loads are slower than a unoptimized QVD load (which usually is faster, although not the so called "superfast" mode). Anyway, it's a two step load, while you can get the same in one step:

fact:
LOAD *,
      autonumberhash128(key) as %key
FROM fact.qvd (qvd);

Performance depends highly on your hardware and the complexity of your expressions and charts, but separate the load script optimization and time from the chart rendering. In general, the cleaner your model is (i. e.: the more similar to a star with a fact table with some dimensions and one hop between tables) the better it will perform.

As a general rule in QlikView, denormalize your data if needed in order to get that star schema (for example, have a big Customer table with all its families, groups and so instead of a Customer table linked to a CustomerGroup table linked to a Region table linked to a Continent table), since a table with more columns usually performs better than more tables (think of a snow flake model) with less records. In my experience, the farther you go from your fact table to the dimension table the poorer the chart and visualization performs. This usually means longer load time, because you will have to make more transformations, aggregations and joins in your script. But the user experience will be smoother.

Again, if your model is complex and your charts have a lot of dimensions and expressions, try different approaches, there are best practices as guidellines but not "the solution".

Hope all the above makes sense.

Miguel Angel Baeyens

BI Consultant

Comex Grupo Ibérica

rwunderlich · ‎2011-10-04

Question 1:

Yes, an unoptimized load can affect the runtime performance of your qvw. Unoptimized loads frequently cause numbers to be converted to strings, which generally will perform slower in the qvw.

Question 2:

You can generate real hashes with the functions hash128(), hash160 & hash256.

-Rob

http://robwunderlich.com

Report Inappropriate Content · ‎2011-10-05

Hi Rob,

thank you for your reply.

Looking in my application at the moment I have several unoptimized fact loads.

At the moment my fact loads look something like this:

fact: LOAD key,

factid,

date,

timehours,

timecost,

agedays,

if(agedays>=90,1,0) as BadDebt,

autonumberhash128(key) as %key FROM fact.qvd (qvd);

Now, would this just result in the agedays and key field beings converted to strings, or other fields as well?

Marcus

Report Inappropriate Content · ‎2011-10-05

Hi Miguel,

thanks for your reply.

Unfortunately combining my fact loads isn't particularly feasible as this would result in one incredibly slow data load rather than several loads that can be run in parallel.

Based on your comments I'm looking at trying to denormalize some of my data.

Marcus

rwunderlich · ‎2011-10-05

Only way to tell for sure what fields get converted is to create mem files and look at the field sizes of an optimized vs unoptimized load in QvOptimizer. I've never fuilly understood what the "rules" are on this.

By the way, the field conversion on an unoptimized load is accepted by QT as a bug. I don't know when it will be fixed, but I'm hopeful for V11.

-Rob

Report Inappropriate Content · ‎2011-10-31

Question1: Autonumber() and AutonumberhashXXX() work exactly by the same algorithm. First - find distinct set of values for parameter expressions. Second - generate RowNo() as the result of the function. That's why you have same ID, let's say ID=1, in different QVDs for the different combinations of parameters. It's just the first distinct value of hash subsets. If you want to create unigue IDs across many QVDs you should use hachXXX() functions. Yes, it's better to generate and store these hash IDs in QVD. Then your qvd load will be always optimized.

Question2: hash128(), hash160(), hash256()