Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
diwakarnahata
Creator
Creator

Application Size inflates with new DB

Hi All,

I am facing a strange issue where i am running the same QlikView script in two Databases (Vectorwise and Redshift), on Vectorwise the qvw file size is 3.5 GB and on Redshift the file size inflates to 8.5 GB.

We are loading 24 months of rolling data in the main fact table, and the # of records for these 24 months are 10% higher in Redshift database. But, that should not inflate the file size by more than twice.

Attached is the export of the .mem file of the two application executed on the Vectorwise and Redshift respectively.

Also, we checked the data types of the 2 main fact tables and 1 relate table which are voluminous, and they are mostly the same between the two databases.

There are serious concerns over the scalability of the application due to this issue.

Can someone suggest, how to debug this issue.

Regards,

Diwakar

15 Replies
Gysbert_Wassenaar

Comparing the totals of the two .mem files shows only 8% increase for the Redshift data.


talk is cheap, supply exceeds demand
diwakarnahata
Creator
Creator
Author

Hi Gysbert,

Yes, but the qvw file size is more than double for Redshift DB.

What could be the reason for this?

Also, is the total size for QlikView (as per .mem file) directly proportional to the size of the qvw file?

Regards,

Diwakar

simenkg
Specialist
Specialist

You probably have more distinct values in the Redshift DB. In memory this does not matter as much since a table box or chart has to be calculated and kept in memory anyway. It does however matter to the comression rate the .qvw can achieve.

Gysbert_Wassenaar

I'd expect a significant difference with regards to the symbols then. I don't see that in the excel file he posted.


talk is cheap, supply exceeds demand
diwakarnahata
Creator
Creator
Author

Hi Gysbert,

I did some more investigation into the issue and had the below observations:

1. The issue of file size inflation was coming only for the fact tables. When only the dimension tables were loaded the file size was within range of 10%

2. The issue was coming for almost all the columns of the fact table

3. When a distinct clause was used while testing this issue for few columns of the fact table, the size was within limits of 10%

4. Without the distinct clause the file size went upto 6 times for almost the same number of records and same number of distinct records.

5. I tried exporting the table to qvd and reloading, but it didn't help.

6. Using DISTINCT with all the columns of the table resulted in increase in total file size of the qvw.

7. I loaded only one column from the fact table for one day from the two sources whose count differ within 10% and found that the file size with Vectorwise DB was 3.05 MB and with Redshift it inflated to 8.75 MB. I also checked the #Values and #Distinct from the Document Properties>Tables and they were within 5% variance.

Any idea what could be causing this issue? Let me know if you have any suggestions to try out.

Regards,

Diwakar

jfkinspari
Partner - Specialist
Partner - Specialist

Make sure you are using the same compression(Document Properties) for both files, when comparing the size.

diwakarnahata
Creator
Creator
Author

Hi Jens,

Thanks for your valuable inputs.The compression was set to High for both.

However, i tried with all three types of compressions : None, Medium and High and reloaded the documents and below were the sizes with Vectorwise and Redshift Database:

Database           None        Medium     High Compression

Vectorwise        12.3MB     10.2MB     3.5MB

Redshift            12.5MB     10.6MB     8.6MB

It looks like the high compression isn't working efficiently for Redshift.

Let me know what can be done for this, also does it impact the RAM footprint when this application is loaded into Server?

Regards,

Diwakar

jfkinspari
Partner - Specialist
Partner - Specialist

I don't know exactly how the High Compression works, but there must be some difference in the data between the systems.

Regarding the RAM footprint in server, this would be unaffected, as the application always is loaded uncompressed to memory.

Anonymous
Not applicable

Quite an interesting observation...

The good news that it is not an actual problem except the storage space.  The only problem is the compression.  I'd ask QlikTech support.