Many Small QVDs vs Few Large QVDs - Which option is better,

LuisGalvan — Tue, 11 May 2021 00:05:05 GMT

Hi Community,

I am facing this dilemma because our Qlik Sense environment is growing fast, and we (the Qlik admins) want to optimize the data model and the QVD files we create, so that Qlik developers have the best experience.

1. Which option will be better, and why is that?

Many small QVDs or a few large QVD files? For example, say we have 30M records that can be split by dimension, and we have the option to create either 12 QVDs (1 per year) or 60 (1 per year and country). Either approach would load the QVDs with optimized loads, but I am not sure whether load time will differ between the two.

2. Is there any documentation available that explains this load optimization process?

Thanks for the help

Re: Many Small QVDs vs Few Large QVDs - Which option is better,

Dalton_Ruer — Tue, 11 May 2021 14:04:14 GMT

I can say with all of my experience the answer is "It Depends."

How often are you pulling new data?

Are you doing it via Incremental Loads or simply re-extracting everything?

Can historical data be changed or is it simply new data?

Will all your applications need all of the data, or will each need different amounts of the data?

As a coder and SAN administrator, I don't like wasting any more I/O than I have to. If historical data can't change, then there is no sense in loading 30 million records from a giant QVD and then rewriting it because 10 records were added to it.
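To put rough numbers on that I/O argument, here is a minimal sketch in Python (not Qlik script; it only does the arithmetic). It assumes the 30M records are spread evenly across the partitions, which real data rarely is, and the function name `rows_rewritten` is made up for this illustration.

```python
def rows_rewritten(total_rows: int, partitions: int) -> int:
    """Rows that must be re-read and re-stored when new records land in one partition.

    Assumption: rows are spread evenly across all partitions (a simplification).
    """
    if partitions < 1:
        raise ValueError("partitions must be at least 1")
    # Ceiling division, so a partially filled partition still counts in full.
    return -(-total_rows // partitions)


TOTAL_ROWS = 30_000_000  # figure taken from the original question

for label, n in [("single QVD", 1), ("12 yearly QVDs", 12), ("60 year/country QVDs", 60)]:
    print(f"{label}: ~{rows_rewritten(TOTAL_ROWS, n):,} rows rewritten per incremental run")
```

Under that even-split assumption, appending 10 records touches all 30M rows with a single QVD, but only the rows of the one partition that receives them when the data is split.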

If most of the applications will only load subsets of the data, like rolling 13 months, then keep the data monthly.
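As an illustration of why monthly files suit a rolling window, the sketch below (Python, only to show the selection logic) builds the list of file names an application would need for the last 13 months. The naming pattern `Sales_YYYY_MM.qvd` and the function `rolling_month_files` are hypothetical; use whatever convention your QVD layer follows.

```python
from datetime import date
from typing import List


def rolling_month_files(today: date, months: int = 13, prefix: str = "Sales") -> List[str]:
    """Names of the monthly QVD files covering the last `months` months, newest first.

    The pattern <prefix>_YYYY_MM.qvd is a hypothetical naming convention.
    """
    files = []
    year, month = today.year, today.month
    for _ in range(months):
        files.append(f"{prefix}_{year:04d}_{month:02d}.qvd")
        # Step back one calendar month, rolling over the year boundary.
        month -= 1
        if month == 0:
            year, month = year - 1, 12
    return files


print(rolling_month_files(date(2021, 5, 11)))
```

For a reload on 11 May 2021 the list runs from May 2021 back to May 2020, so the application loads 13 small files and never opens anything older.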

If most of the applications are only going to load data for 1 country, then keep the data at the country level.

But don't create more work for yourself and other coders than necessary either. There is no reason to touch 60 different QVDs for incremental loads if you don't really need to.

Re: Many Small QVDs vs Few Large QVDs - Which option is better,

marcus_sommer — Tue, 11 May 2021 14:56:54 GMT

Each load inevitably incurs a certain amount of overhead: communicating with the OS to get access to the storage, reading the XML metadata, and initializing the tables and fields. This means that splitting the data into n QVDs instead of a single one will increase load times if you load them all.

But the essential question is whether this overhead is significant or not. That can't be answered in general; it depends on various factors, for example the speed of the storage/network relative to the amount of data, the number of QVDs, and whether the data is loaded optimized or not.

Personally, I think you would measure a difference between loading the same data from 12 QVDs and from 60 QVDs, but I wouldn't expect a significant impact, although I have never compared both approaches directly against each other. In a case like yours I would expect a difference of less than 10%, because the factor between the loads is only 1:5. It would be quite different if the QVDs were not on a yearly level but on a daily level, which would mean more than 4k loads; there the overhead would grow much more heavily relative to the amount of data loaded.
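A back-of-envelope model makes this reasoning concrete: total load time is roughly a fixed cost per file plus the time to read the rows. The sketch below is Python, not Qlik script, and both constants (0.05 s of overhead per file, 1,000,000 rows per second) are placeholder assumptions, not measurements. Only the 30M row count and the 12 / 60 / 4k+ file counts come from this thread.

```python
def estimated_load_seconds(n_files: int, total_rows: int,
                           per_file_overhead_s: float, rows_per_second: float) -> float:
    """Simple linear model: fixed overhead per file plus time to read all rows."""
    return n_files * per_file_overhead_s + total_rows / rows_per_second


TOTAL_ROWS = 30_000_000       # from the original question
PER_FILE_OVERHEAD_S = 0.05    # ASSUMED placeholder, measure in your environment
ROWS_PER_SECOND = 1_000_000   # ASSUMED placeholder, measure in your environment

# 12 yearly files, 60 year/country files, 4380 daily files (12 years * 365 days)
for n in (12, 60, 12 * 365):
    seconds = estimated_load_seconds(n, TOTAL_ROWS, PER_FILE_OVERHEAD_S, ROWS_PER_SECOND)
    print(f"{n:>5} files: about {seconds:.1f} s")
```

With these made-up constants, the step from 12 to 60 files adds well under 10%, while the daily split is dominated by per-file overhead. The real ratio depends entirely on your storage and network, so time a few loads before drawing conclusions.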

Besides the above, you may not always want to load all of the data, and then it is very useful to have it separated into appropriate QVDs, especially if you put that information into the filename. You can then use a dirlist() + filelist() loop to scan the storage and pick the relevant files without having to open the files and filter the data. It is very important here that you can have only a single where exists(FIELD) condition if the load is to stay optimized. If you want to filter on years and certain countries, you already have two conditions, and you may need more, for example on a further key field to apply another incremental logic.
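The filename-based selection can be sketched as follows. This is Python rather than Qlik script, purely to show the logic; in a load script the list of names would come from the filelist() loop mentioned above. The naming pattern `Sales_<year>_<country>.qvd` and the function `select_files` are hypothetical.

```python
import re
from typing import Iterable, List, Set

# Hypothetical naming convention: Sales_<4-digit year>_<2-letter country>.qvd
FILENAME_PATTERN = re.compile(r"^Sales_(\d{4})_([A-Z]{2})\.qvd$")


def select_files(names: Iterable[str], years: Set[int], countries: Set[str]) -> List[str]:
    """Keep only the files whose name encodes a wanted year AND a wanted country."""
    selected = []
    for name in names:
        match = FILENAME_PATTERN.match(name)
        if match is None:
            continue  # not one of our partition files
        year, country = int(match.group(1)), match.group(2)
        if year in years and country in countries:
            selected.append(name)
    return selected


available = ["Sales_2019_DE.qvd", "Sales_2020_DE.qvd", "Sales_2020_US.qvd",
             "Sales_2021_US.qvd", "readme.txt"]
print(select_files(available, years={2020, 2021}, countries={"US"}))
# prints ['Sales_2020_US.qvd', 'Sales_2021_US.qvd']
```

Because both the year and the country filter are resolved from the names alone, neither has to appear as a where condition, which leaves the single where exists(FIELD) free for something else, such as an incremental key.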

- Marcus