Not applicable

Handling a big fact table

performance tuning...

I have a very big fact table, roughly a billion rows, filled with highly granular data.

No dimension tables, just one big fact table with ~20 fields.

This data is used in one chart object and, obviously... it chokes.

How would you recommend modifying the ERD, or performance-tuning the object?

It's basically a fact table with a period-subscriber-promotion key, containing event data.

It was not designed for QV, but will splitting that giant table into three dimension tables and one event-based fact table help performance?

After all... there will still be ~a billion events.

Thanks.

3 Replies
swuehl
MVP

It's hard to give detailed advice without knowing your data model and objects in detail, but here is a link to a thread with some general recommendations:

http://community.qlik.com/thread/18473?start=0&tstart=0

Not applicable
Author

Yea... I'm talking about 1 BILLION lines... not 30 million...
I need something more powerful.

We are going to implement Hadoop soon, as we are transitioning to big data, but I have no idea what kind of QV data models are equipped to handle this much data.

The suggestions in the thread you referenced are good practice for small-to-medium data sets.

On the hardware side, we have servers running 24+ CPUs and 256 GB+ RAM...

hic
Former Employee

We have run tests with this amount of data, and although we have shown that it is theoretically possible, you need to be aware of some things that can invalidate the theory...

  • Reduce the number of fields: 20 fields is a lot. Can you cut away some of them? Splitting into three dimension tables and one fact table could potentially reduce the number of fields in the fact table. So, yes, you should try that; a sketch follows after this list.
  • Reduce the number of distinct values in a field: Can you cut down on the number of distinct values? E.g. if you have a timestamp, you may be better off splitting it into a date field (low number of distinct values) and a time field where you round to the nearest 15 seconds (also a fairly low number of distinct values); the second sketch below shows one way to do this.
  • Partition data/Reduce the number of records: Do you really need all data in one qvw? Or can you solve the problem by having, e.g., one file per day? (See the third sketch below.)
  • Aggregate data/Reduce the number of records: Do you really need the full granularity? (See the fourth sketch below.)
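
A minimal load-script sketch of the split in the first bullet. The field names and the source file BigFact.qvd are hypothetical; adapt them to your actual model (a Periods dimension would be built the same way as the two shown here):

    // Descriptive attributes move out to small dimension tables;
    // only the keys and the event measures stay in the fact table.
    Subscribers:
    LOAD DISTINCT
        SubscriberID,
        SubscriberName,
        Region
    FROM BigFact.qvd (qvd);

    Promotions:
    LOAD DISTINCT
        PromotionID,
        PromotionName,
        Channel
    FROM BigFact.qvd (qvd);

    Facts:
    LOAD
        PeriodID,
        SubscriberID,
        PromotionID,
        EventType,
        Amount
    FROM BigFact.qvd (qvd);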
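
For the timestamp idea in the second bullet, a sketch like this (EventTimestamp is again a hypothetical field name) keeps the number of distinct values per field small:

    // A raw timestamp can have close to a billion distinct values; a date
    // plus a time rounded to 15 seconds has at most 365 dates per year
    // and 86400/15 = 5760 distinct times.
    Facts:
    LOAD
        Date(Floor(EventTimestamp)) as EventDate,
        Time(Round(Frac(EventTimestamp), 15/86400)) as EventTime,
        SubscriberID,
        PromotionID,
        Amount
    FROM BigFact.qvd (qvd);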
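
The third bullet, partitioning, could be done with a loop that stores one QVD per day, along these lines. This assumes the Facts table from the first sketch and an EventDate field whose values expand to file-system-safe date literals (e.g. formatted as YYYY-MM-DD):

    // One file per day instead of a single monolithic document.
    FOR Each vDate in FieldValueList('EventDate')
        DailyFacts:
        NoConcatenate   // prevent auto-concatenation back into Facts
        LOAD *
        RESIDENT Facts
        WHERE EventDate = '$(vDate)';

        STORE DailyFacts INTO [Facts_$(vDate).qvd] (qvd);
        DROP TABLE DailyFacts;
    NEXT vDate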
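
And if the front end only ever shows aggregates, the fourth bullet could look like this (again with hypothetical field names):

    // Aggregate away granularity the chart does not need; every
    // non-aggregated field must appear in the GROUP BY clause.
    DailyTotals:
    LOAD
        EventDate,
        SubscriberID,
        PromotionID,
        Count(EventType) as EventCount,
        Sum(Amount)      as TotalAmount
    RESIDENT Facts
    GROUP BY EventDate, SubscriberID, PromotionID;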

Also:

Read http://community.qlik.com/blogs/qlikviewdesignblog/2012/11/20/symbol-tables-and-bit-stuffed-pointers to better understand some of the QlikView internals.

Load 5% of the data and see how much memory QlikView uses and what the response times are. Extrapolate to get a feeling for what you will get / what you'll need when you load all data.
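
One way to load that 5% is QlikView's Sample prefix, which keeps a random subset of the records; a sketch (BigFact.qvd is a hypothetical source file):

    // Load a random ~5% subset to measure memory use and response times.
    SampledFacts:
    Sample 0.05
    LOAD *
    FROM BigFact.qvd (qvd);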

HIC