Skip to main content
Announcements
See what Drew Clarke has to say about the Qlik Talend Cloud launch! READ THE BLOG
cancel
Showing results for 
Search instead for 
Did you mean: 
diwakarnahata
Creator
Creator

File Size increases with Concatenate Fact tables

Hi All,

I have to concatenate two fact tables, Fact1 has 300 million rows and Fact2 has 5 million rows. Both have a few common dimensions and all other columns are different.

The qvw file size before the concatenation was around 3.5 GB, but after the concatenation it has risen to 6 GB.

Does concatenation result in drastic increase in file size if the columns are different?

Also, Please suggest what is the best practice if that is the case.

Regards,

Diwakar

1 Solution

Accepted Solutions
rwunderlich
Partner Ambassador/MVP
Partner Ambassador/MVP

This increase is expected. QV Storage is made up of (simplistically) two items:

1. Symbols -- the array of unique values for a field.

2. Records -- an array for each row that points to the field value in the Symbol table for each field in that row.

The Record array is a fixed length and symmetrical, . So even though the concatenated rows may have a null value for some fields, each row will still require the full length of the record array. This is the dark side of concatenating unlike fact tables.

In your example, all 305M rows contain pointers wide enough to represent all Symbol values, even though many of them are null values. You would probably see greatly decreased file size and RAM requirements by keeping them as separate tables.

-Rob

http://masterssummit.com

http://robwunderlich.com

View solution in original post

9 Replies
giakoum
Partner - Master II
Partner - Master II

Does concatenation result in drastic increase in file size if the columns are different?

Not really. Are you sure this is the only change you did?

Concatenating fact tables is good practice generally. Depends on your data.

sujeetsingh
Master III
Master III

Diwakar,

If the data is going to increase for few facts used in expressions and finally calculation time and size for all expressions in your qvw file is going to increase the file size.

giakoum
Partner - Master II
Partner - Master II

are you saying that expressions and calculation time doubled the file size from 3,5GB to 6?

rwunderlich
Partner Ambassador/MVP
Partner Ambassador/MVP

This increase is expected. QV Storage is made up of (simplistically) two items:

1. Symbols -- the array of unique values for a field.

2. Records -- an array for each row that points to the field value in the Symbol table for each field in that row.

The Record array is a fixed length and symmetrical, . So even though the concatenated rows may have a null value for some fields, each row will still require the full length of the record array. This is the dark side of concatenating unlike fact tables.

In your example, all 305M rows contain pointers wide enough to represent all Symbol values, even though many of them are null values. You would probably see greatly decreased file size and RAM requirements by keeping them as separate tables.

-Rob

http://masterssummit.com

http://robwunderlich.com

AbhijitBansode
Specialist
Specialist

Below mentioned are common methods to optimize the application:

1. Separate long text columns from the Fact table in separate dimension table

2. Use autonumber function to create join between tables

3. Split timestamp column in to different columns for year, month, day and time. this will reduce the overall distinct values of the column thereby reducing data size.

4. Deal with dimensionless facts using Keep symbol or some other method which best suits your data

Laura_qlik
Luminary
Luminary

Hello, 

 I have a finance app that I have the same problem. What would be the solution?

I tried changing the order, putting null() in the field that is missing in the other tables, what else can I do? If I check the QVDs apart the total should be 3GB, but it is coming up to more than 6GB... 

Thank you.

rwunderlich
Partner Ambassador/MVP
Partner Ambassador/MVP

@Laura_qlik If the tables have unlike fields, there is no direct solution. The table will be wider and therefore the index/records array will be wider as well. 

-Rob

Laura_qlik
Luminary
Luminary

There are not unlike columns, just more fields...
like:

TABLE 1:

A,

B,

C;

TABLE 2:

A,

B,

C,

D,

E;

There is nothing or a workaround I could do? 😞
Thank you so much.

rwunderlich
Partner Ambassador/MVP
Partner Ambassador/MVP

When I said unlike columns I meant different fields. You overall size will increase because the TABLE1 rows will now need to have pointers for fields D&E even though those rows will have null values. 

As I said, you can't directly overcome the increase in size. You could do some remodeling like moving common fields (like A, B, C) into a separate table. This would decrease the size if there were lot's of duplicate values in the fields you moved. 

-Rob