awqvuserneo
Creator

QVD size bigger than 15 GB

Hi Folks,

Would you mind sharing your experience reading/writing big QVD files, i.e. bigger than 15 GB?

Users would like to see data for all years (the oldest data is from 10 years ago), and this QVD is being used across multiple dashboards.

Due to its size, we encountered several issues:

1. Extract layer: reading/writing the QVD takes 30-60 minutes; on average, once a week the daily reload fails with either "Generic Error" or "File is locked due to being used by system".

2. Transform layer: transforming this QVD also takes a while because of its size: 30-60 minutes.

3. Dashboard layer: on average, once a week the daily reload fails. (Since users want to see data across all years, the dashboard is inevitably large and performance is slow even when it does load.)

I'm suggesting that we could:

a. Load the last 4 years into the main dashboard, and
b. Load data older than 5 years into a separate dashboard (only a small percentage of users need the full history, mainly for projections; the majority only care about data from the past 3 years).

Would you suggest a different approach if the above doesn't sound like a solution?

Thank you for sharing,

Anton

8 Replies
joaopaulo_delco
Partner - Creator III

Hi @awqvuserneo!

Sometimes I face this kind of issue. See my tips below:

- For the extract app, I usually split the data into month-year QVDs. The first time, I generate QVDs for the whole period (name_201801.qvd, name_201802.qvd, ...), and after that the daily extract task only loads and stores the last two months (see the sketch after this list).

- If you are using Qlik Sense, I suggest you study the ODAG tool. With ODAG you develop a base filter app that generates a filtered sub-app containing only the chosen data. https://youtu.be/n_VnOdqQ0CA

- Another alternative is dynamic views. They are similar to ODAG, but everything happens in the same app. See the link below.

https://youtu.be/LRrMDW7qUok
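A minimal sketch of the monthly-slice pattern from the first tip, assuming a hypothetical source table SALES with a LOAD_DATE column (the Oracle-style TO_CHAR filter and the connection/file names are only examples, not from the post):

// Daily extract: only re-read and re-store the current and the previous month;
// older name_YYYYMM.qvd files stay untouched after the initial full run.
// Assumes a database connection was opened earlier with LIB CONNECT TO.
LET vThisMonth = Date(MonthStart(Today()), 'YYYYMM');
LET vPrevMonth = Date(MonthStart(AddMonths(Today(), -1)), 'YYYYMM');

FOR EACH vMonth IN '$(vPrevMonth)', '$(vThisMonth)'

    Data:
    LOAD *;
    SQL SELECT *
        FROM SALES
        WHERE TO_CHAR(LOAD_DATE, 'YYYYMM') = '$(vMonth)';   // source-side month filter

    STORE Data INTO [lib://Extract/name_$(vMonth).qvd] (qvd);
    DROP TABLE Data;

NEXT vMonth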

 

 

manoranjan_d
Specialist

If the QVD size is huge, then configure the server or system with high-end hardware; for example, RAM should be around 400 GB.

 

marcus_sommer

I suggest slicing the data at least at a year level, or probably more suitably at a year-month level (even if it slightly increases the overhead of handling more loads), to reduce the risk of conflicts with parallel reloads and/or access from the OS, network/storage, security tools/measures and the like.

Further, you need such a method to implement incremental approaches, because it's quite unlikely that the old data changes in any way, and therefore it doesn't need to be processed each time. Incremental loading means applying it not only at the extract level but at each transformation level.
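As an illustration of applying it at a transformation layer too, a sketch with invented names (Extract_2021.qvd for the fresh slice, Transform_*.qvd for the already transformed history):

// Transform only the newly extracted slice; the transformed historical QVDs are not touched.
Map_Region:
MAPPING LOAD * INLINE [
RegionCode, Region
1, North
2, South
];

Fact_New:
LOAD
    OrderID,
    OrderDate,
    Amount,
    ApplyMap('Map_Region', RegionCode, 'Unknown') AS Region   // example transformation
FROM [lib://QVD/Extract_2021.qvd] (qvd);

STORE Fact_New INTO [lib://QVD/Transform_2021.qvd] (qvd);
DROP TABLE Fact_New;

// Transform_2010.qvd .. Transform_2020.qvd already exist from earlier runs and are reused
// as-is, so only the current slice pays the transformation cost each day.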

Also, I recommend considering facts with different granularities - meaning something like: the last year is available at record level, the previous two years are aggregated at product level, and everything older at a category + year-month level. It's quite seldom that the business needs old data at an atomic level, because those products don't exist anymore and/or aren't really comparable with the new ones ...
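A rough sketch of such a mixed-granularity fact table (field and file names are made up):

// Recent data at record level, older data pre-aggregated to category + month.
Fact:
LOAD
    *,
    MonthStart(OrderDate) AS OrderMonth      // common month field across both grains
FROM [lib://QVD/Fact_2021.qvd] (qvd);

Concatenate (Fact)
LOAD
    Category,
    MonthStart(OrderDate) AS OrderMonth,
    Sum(Amount)           AS Amount
FROM [lib://QVD/Fact_2010_2019.qvd] (qvd)
GROUP BY Category, MonthStart(OrderDate);

// Detail fields (e.g. OrderID) are simply null for the old rows, which is fine
// as long as the UI only offers record-level views for recent periods.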

Beside all this, I suggest reviewing the data itself - are all included fields really needed? Does it contain high-cardinality fields (like record IDs from databases, which aren't useful in reports, or timestamps that could be split into dates and times, and similar)? Is there any record-level formatting? If yes, replace it with column formatting or set the format within the dashboard.
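For the timestamp point, a tiny example (field names invented):

// Splitting one high-cardinality timestamp into a date part and a time part
// drastically reduces the number of distinct values Qlik has to store.
Orders:
LOAD
    OrderID,
    Date(Floor(OrderTimestamp)) AS OrderDate,
    Time(Frac(OrderTimestamp))  AS OrderTime,
    Amount
FROM [lib://QVD/Orders_raw.qvd] (qvd);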

- Marcus

awqvuserneo
Creator
Author

Hi @joaopaulo_delco ,

Thank you for your suggestion. We did something similar; the big QVD is actually a concatenation of smaller QVDs from each year: ABC_2010.qvd, ABC_2011.qvd, ... , ABC_2021.qvd.
Our original reason for using one giant QVD is that it needs to be consumed by several QVFs, and having 10 smaller QVDs processed by several apps is somewhat troublesome (though we understand the benefits).

Also, thank you for sharing ODAG and dynamic views. I can see these are helpful for speeding up analysis once the data is narrowed down to a smaller set.

Best regards

-A-

awqvuserneo
Creator
Author

Hi @manoranjan_d ,

Thank you for your suggestion.

It definitely helps to process such a big QVD with more RAM and a speedier hard drive.

Though we're still trying to figure out how to properly read/store the big QVD itself before beefing up the resources.
I wonder how folks in the big data space use Qlik to read/store QVDs, as a 15 GB QVD is probably relatively small in comparison.

Best regards

-A-

awqvuserneo
Creator
Author

Hi @marcus_sommer ,

Thank you for your suggestion. We did something similar; the big QVD is actually a concatenation of smaller QVDs broken down by year: ABC_2010.qvd, ABC_2011.qvd, ... , ABC_2021.qvd.

We did it two ways:
a. Generate the past 3 years of QVDs (daily) and concatenate them -> on average it takes 30 minutes to generate 3 years of QVDs and another 30 minutes to concatenate 10 years of QVDs into one giant ABC.QVD (the failures happen during the latter step);
b. Do an incremental load on ABC.QVD (daily) -> 10 minutes to query the most recent changes + 30-40 minutes to append them back to ABC.QVD (the failures happen during the latter step; a rough sketch of this pattern follows after the note below).

Note: we regenerate the past 3 years daily because there is historical data (2019, 2020, 2021) that still receives updates.
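For reference, a bare-bones sketch of the usual insert/update pattern behind (b) - the table, field and file names here are placeholders, not the actual ones:

// Placeholder names throughout; assumes a database connection was opened earlier.
LET vLastReload = '2021-12-31 00:00:00';   // normally read from a log table or QVD

// 1. Pull only rows inserted/updated since the last successful run.
Changes:
LOAD *;
SQL SELECT *
    FROM ABC_SOURCE
    WHERE UPDATE_TS >= '$(vLastReload)';

// 2. Add the untouched history from the existing QVD, dropping superseded rows.
Concatenate (Changes)
LOAD *
FROM [lib://QVD/ABC.qvd] (qvd)
WHERE NOT Exists(PrimaryKey);

// 3. Write the combined table back (this full rewrite is the slow/fragile step
//    described above; slicing by year or month keeps each write small).
STORE Changes INTO [lib://QVD/ABC.qvd] (qvd);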

With both approaches, we're still experiencing failures averaging once a week.
It kind of makes sense that the bigger the QVD, the more likely it is to fail, due to a combination of network hiccups while writing the QVD, a bad disk drive, and other QVFs failing while trying to read this big QVD, which interrupts the writing process.

Our original reason for using one giant QVD is that it needs to be consumed by several QVFs, and the hope was that by building one QVD first, the other dashboards could benefit from an "optimized load" when consuming it.

I'll see if we can improve it by using different granularities, per your suggestion.

By the way, would you mind explaining a bit more about "Incremental loading means applying it not only at the extract level but at each transformation level"?

Thank you again for your feedback & suggestions.

Best regards

-A-
marcus_sommer

Quite often people associate an incremental approach only with the extract layer but not with the transformation layer - and in reality there may be multiple transformation layers, even if the data architecture is for practical reasons restricted to generator --> datamodel --> report. It's not seldom that the extracted data undergoes some mapping, flagging, adding of missing records, calculations, aggregations of measures and so on ... Strictly speaking, there may not just be a 3-tier data architecture but 6 to n layers, and each one that is a bit heavier should also be considered for an incremental approach.

Beside this, your mentioned loading times seem a bit too long for optimized QVD loads. Therefore, check whether they are optimized and, if not, adjust them appropriately (you may also need one or two changes beforehand to ensure that there is at most one KEY field that can be applied in a where exists(), and that no other transformations are needed which would break the optimized load logic).
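A small illustration of that rule (names invented): only a plain field list plus, at most, a single-field where exists() keeps the QVD load optimized.

// Build the set of key values to keep first...
Calendar:
LOAD Date(MakeDate(2021) + RecNo() - 1) AS OrderDate
AutoGenerate 365;

// ...then the QVD load can stay optimized: no renames, no calculations,
// only a single-field exists() filter. The reload log should report it as optimized.
Fact:
LOAD *
FROM [lib://QVD/ABC.qvd] (qvd)
WHERE Exists(OrderDate);

DROP TABLE Calendar;   // keep only the filtered fact table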

Further, make sure that you load the biggest QVDs first and the smaller ones afterwards - meaning first the historical data and then the current data (all of them loaded optimized, of course). In my experience with larger datasets (which aren't as big as yours), it makes a significant difference in which order the QVDs are loaded - especially if, for any reason, not all of them can be loaded optimized.

There may also be another way to reduce the loading times: load the report binary from another report which contains the data up until yesterday, and then add only the current data from a QVD - and somewhere overnight, within a maintenance window, you refresh that binary. In QlikView this is certainly possible; with Qlik Sense I'm not sure whether it supports binary loads in the same way.
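The skeleton of that binary approach (QlikView flavour; app, table and QVD names are placeholders) would look roughly like this:

// Binary must be the very first statement in the script: it pulls the whole
// data model of the "history until yesterday" document into memory.
Binary [..\Apps\History_until_yesterday.qvw];

// Then only today's slice is appended from a QVD (assuming the binary-loaded
// model contains a table called Fact).
Concatenate (Fact)
LOAD *
FROM [..\QVD\ABC_today.qvd] (qvd);

Qlik Sense also has a Binary statement, but the way the source app is referenced differs, so check the syntax for your environment.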

- Marcus

awqvuserneo
Creator
Author

Hi @marcus_sommer ,

Thank you for explaining & suggesting in great detail.

We haven't yet tried loading the big QVD first; right now it's loaded somewhere in the middle of the script.

We'll give it a shot and see if we can make improvements that way.

Again, thank you for sharing your insights.

-A-