jackm75
Creator

Qlik Cloud QVD Production Bloat After Incremental Reload

Hello all, I've searched but haven't found anything recent or related to SaaS, only QlikView threads on this topic. I have a Qlik Cloud app that pulls data and produces a QVD for app consumption. The first run creates a file that is about 750 MB for the 6.5M records it holds. From there, I have it incrementally loading, adding just the new records based on the primary key. After the first incremental reload, the row count goes up by just a few thousand records as expected; however, the file size bloats to over 1.3 GB for no apparent reason.
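
For reference, the incremental pattern is roughly like this (connection, table, and field names are simplified placeholders, not my actual script):

  // assumes vMaxKey holds the highest key already stored in the QVD
  LIB CONNECT TO 'MySource';

  NewData:
  SQL SELECT PrimaryKey, Field1, Field2
  FROM source_table
  WHERE PrimaryKey > $(vMaxKey);

  // append the history from the existing QVD, skipping keys just loaded
  // (the NOT Exists() clause makes this a non-optimized QVD read)
  Concatenate (NewData)
  LOAD PrimaryKey, Field1, Field2
  FROM [lib://DataFiles/MyData.qvd] (qvd)
  WHERE NOT Exists(PrimaryKey);

  STORE NewData INTO [lib://DataFiles/MyData.qvd] (qvd);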

Old forum posts for QlikView suggest it has to do with the XML headers for lineage, and that turning off Allow Data Lineage in the QlikView document would handle the issue. I'm not seeing such a setting in Qlik Cloud. I did blindly attempt Set AllowDataLineage = 0; in the load script, but it made no change.

Is anyone else seeing similar issues, and has anyone found a resolution?

Thanks

Labels (1)
  • SaaS

6 Replies
marcus_sommer

With just a single incremental load, data lineage shouldn't be an issue, because it adds only one extra entry to the metadata. Even several years of daily incremental loads wouldn't add a significant amount of data that way. AFAIK the mentioned issue happens only if the refresh runs inside loop approaches, which may add several dozens/hundreds of single/grouped entries during one incremental load.

Therefore it's more likely that the reason lies somewhere else - maybe a data interpretation changed from column-level formatting to row-level formatting. But before looking in that direction, I suggest investigating the qvd's directly in separate applications - the first + second as well as the n-th ones - loading them as qvd's and maybe also as xml-files, and/or opening them in an editor.
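
For example, the lineage entries stored in the header could be counted like this - a sketch assuming the usual QvdTableHeader layout, with a placeholder path:

  // read the QVD's XML header; one row per stored lineage entry
  LineageEntries:
  LOAD Discriminator, Statement
  FROM [lib://DataFiles/MyData.qvd]
  (XmlSimple, table is [QvdTableHeader/Lineage/LineageInfo]);

If the number of LineageInfo entries jumps massively between the first and the second qvd, the lineage theory fits; otherwise look at the field level.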

jackm75
Creator
Author

@marcus_sommer thanks for the quick reply. 

I tested with a smaller sample of the same dataset.
In the first run, it produced a QVD with 896,754 rows and 38 fields, with a file size of 99.05 MB.
I then created a second QVD from the same data so I could incrementally load it without impacting the first, for comparison.
I reloaded again to perform the incremental reload, which this time did not add any rows since there was not yet any new data. However, the file size is now 179.17 MB. Looking at the field metadata of the two QVDs, they are identical.
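
In case it helps anyone reproduce the comparison, the per-field metadata can be read from each QVD's XML header, roughly like this (the path is a placeholder; run once per file):

  // Length = bytes used by this field's symbol table -
  // comparing it between the two files shows where any bloat sits
  FieldMeta:
  LOAD FieldName, NoOfSymbols, Offset, Length
  FROM [lib://DataFiles/test1.qvd]
  (XmlSimple, table is [QvdTableHeader/Fields/QvdFieldHeader]);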

marcus_sommer

I've never worked with the cloud yet, so I don't know how things are shown there. What happens if you repeat this task - does it always add a large amount of size without adding or changing the data? Maybe the cloud view doesn't show the real file size but rather the reserved space, with n cached versions. What does the file size look like if you download the files to local storage?

jackm75
Creator
Author

After that first incremental reload, subsequent reloads only increase the file size slightly, as expected for the new data being pulled in. For example, in my test I ran a second incremental reload which pulled in an additional 3,542 rows for a total of 900,296 rows. Now the file size is 179.51 MB.

So, it's only a large jump on the first incremental load. 

jackm75
Creator
Author

Regarding what the file looks like if downloaded, unfortunately it's not possible to download a QVD file from the cloud to check. 

marcus_sommer

It's a general (and IMO big) disadvantage of cloud approaches that there is no classical storage anymore - at least not from a user perspective, because there is always a proprietary onion-layer between the user and the storage ...

Beside this, what happens if you load another qvd (a renamed or a newly created one) without any incremental logic and store it again, applying different measures (a sketch of these variants follows after the list):

  • optimized - overwriting the source file
  • optimized - stored under a new file name
  • not optimized - with a where clause like 1 = 1 (again both overwriting and creating a new file)
  • not optimized - with formatting/rounding a column and/or duplicating one
  • all of the above also with a second/third attempt

Are there really differences in file size, and if so, when and by how much? Maybe some valuable hints will emerge ...
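
A minimal sketch of those test variants (paths and field names are placeholders; whether a load stays optimized depends on the exact statement):

  // 1) optimized load, overwriting the source file
  T1: LOAD * FROM [lib://DataFiles/test.qvd] (qvd);
  STORE T1 INTO [lib://DataFiles/test.qvd] (qvd);
  DROP TABLE T1;

  // 2) optimized load, stored under a new file name
  T2: LOAD * FROM [lib://DataFiles/test.qvd] (qvd);
  STORE T2 INTO [lib://DataFiles/test_copy.qvd] (qvd);
  DROP TABLE T2;

  // 3) not optimized - the where clause forces row-by-row processing
  T3: LOAD * FROM [lib://DataFiles/test.qvd] (qvd) WHERE 1 = 1;
  STORE T3 INTO [lib://DataFiles/test_unopt.qvd] (qvd);
  DROP TABLE T3;

  // 4) not optimized - rounding a column into a duplicate field
  T4: LOAD *, Round(SomeNumField, 0.01) AS SomeNumField_rounded
      FROM [lib://DataFiles/test.qvd] (qvd);
  STORE T4 INTO [lib://DataFiles/test_fmt.qvd] (qvd);
  DROP TABLE T4;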