Skip to main content
Announcements
Introducing Qlik Answers: A plug-and-play, Generative AI powered RAG solution. READ ALL ABOUT IT!
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

QVD File size much bigger than excel source files

Hi All,

Hoping you can help with a problem I'm seeing. 245

I have an extract from a Hyperion Essbase cube that would require an excel file size of 245Mb to hold the data. I need to save my raw data file to SharePoint, but have a file size limitation on SharePoint, so I've split the excel file into three separate files.  Each file is approx. 80Mb in size. I can now load my data to the share point. However, as the reload time was very, very slow, I decided to generated QVD files from the Excel files This is where the issue arises..... the QVD files generated are much larger than the raw excel files. For example, an 87MB excel file needs a 52Mb and a 93Mb QVD file. I needed to create two QVD's to keep the individual file sizes below 100Mb.

Any ideas why the QVDF file sizes are so much larger that the Excel file? The Excel file is very densely populated. I was expecting the QVD file to be smaller, or at the very least, of equal size to the raw data file... (adjusting for header data etc).

The Excel file is loaded using Crosstable and the Store command is as follows....

Store FACTS into

\Workbooks\New Data Structure\POR 2H13_1.qvd;

DROP Table FACTS;

All thoughts/help appreciated


8 Replies
Chris_Rice
Support
Support

"if the source file was small, then the QVD could be larger, due to the XML header in the QVD file. Also, the QVD file size depends on the number of distinct values for each field, if this is large (eg distinct values in every record, and that do not compress well), then the QVD could use at least as much (or more) space as the source file.

Regards.

Siva"

REF: Re: qvd file larger than original csv file after simple load

Not applicable
Author

Thanks for the reply Chris.

In this case the source file is quite large so the XML header would not really have an impact. What I am seeing is that the STORE generated QVD is about twice the size of the QVW file. I think the issue is related to the CROSSTABLE command that I am using. I have a cross table command as follows:

  

ForecastTmp:
CrossTable(ShipDate, Data, 7)
LOAD 
    
Technology,
    
Market,
    
ForecastVersion,
    
Measure,
    
Region,
    
Printer_Name,
    
PenName
    
[201001],
    
[201002],
    
[201003],
    
[201004],
     .........

     [201712]


I then take this and load to my FACTS table as follows:

Load
  Data,
  PurgeChar(ForecastVersion,' ') AS [ForecastVersion],
  Technology,
  Market,
  Measure,
  Region,
  Printer_Name,    
  PenName,
ShipDate,
Date(Date# (ShipDate , 'YYYYMM'),'MMM-YY') as Date

Resident ForecastTmp;
Drop table ForecastTmp;

STORE FACTS into C:\Users\johnmu\Documents\QlikView\QlikView Developer\QVD\Current and POR QVDs\FORECAST.qvd;

In every case, the size of the ForecastTmp file is nearly exactly half the size of the FORECAST QVD file. What seems obvious to me is that in some way I am doubling up the size of the stored QVD file from the Crosstable file. But I can't work out how.

Hope you can provide some insight

Regards

John

Gysbert_Wassenaar

If your excel file is in the OOXML format then it will be a zip compressed xml file. A qvd file is basically a memory dump of a table from the in-memory qlikview database. A qvd file is not compressed. A qvw document can be compressed. The level of compression can be set in the document properties.

Using the CrossTable function creates a table with more rows and less fields than the source. This can lead to a table that takes up more space. The cardinality of the fields plays a major role in this. See Symbol Tables and Bit-Stuffed Pointers for more information.


talk is cheap, supply exceeds demand
Not applicable
Author

Hi Gysbert,

Thanks for getting back so quickly.. The excel file is in XLSM format (I have a macro in it). The data in my excel file is very distinct with very little repetition, the files are quite dense. I read Henric's file and don't think I'm generating massively extra vols of data due to pointers. Would this conceivably lead to a doubling in file size as a result of a STORE command?

Thanks

John

Gysbert_Wassenaar

The most likely cause is that the excel file is stored as a compressed zip file and the qvd file is not compressed. If you change the extension from .xlsm to .zip and extract it to a folder you can compare the uncompressed sizes. Or zip the qvd file.


talk is cheap, supply exceeds demand
jonathandienst
Partner - Champion III
Partner - Champion III

Hi

Its not the STORE command. I agree with Gysbert's second point. The cross-table  is creating many more rows, so you are not really comparing apples with apples. To test this, try storing the data in qvd(s) without the cross table.

If the unique values for the fields are evenly distributed, then you will be storing a lookup table once for each file. The ZIP compression employed by ooxml files (which includes the xlsm extension) may handle the this structure better than the lookup table compression employed in QVDs.

HTH

Jonathan

Logic will get you from a to b. Imagination will take you everywhere. - A Einstein
Not applicable
Author

Hi,

You are right.. thanks for the education The XLS version of the XLSX file is 157mb vs 77mb (the un-zipped file is nearly 500mb!)

. Interestingly though, why is the qvd file closer in size to the xlsx file? It's only when I use the STORE cmd that the file size balloons. Looks like I will have to get creative with the data I'm generating....

Thanks

John

rwunderlich
Partner Ambassador/MVP
Partner Ambassador/MVP

Are you measuring the size of the ForecastTmp by saving and looking at the qvw filesize? QVWs are (usually) compressed. QVDs are not. More here:

Qlikview Notes: Document Compression

-Rob

http://masterssummit.com

http://robwunderlich.com