Anonymous
Not applicable

How to deal with Large Monthly Data set 7GB for Self Service Reporting

I'm relatively new to QlikView and have come into a new job where an existing QlikView solution ingests 7 GB of monthly data from a SAS platform, with 200+ attributes/columns, for self-service reporting. The result has been poor QlikView query performance and negative end-user sentiment towards the QlikView product.

Assuming I can't do anything about reducing the number of measures and dimensions that have to be available in the self-service reporting solution, how does one best utilise Qlik Sense to optimise performance? I can't help but feel the problem lies more with the manner in which the solution was implemented than with the tool itself.

Can someone please point me to some reference material on how a QlikView self-service solution should be designed and how to navigate the common pitfalls associated with the large data sets these solutions require?

Additional Questions:

1. Can one break up the dataset and deliver it in parts rather than as one giant CSV?

2. Can one use ODBC instead of CSV? I'm being told by the team that delivered the solution that a single CSV was the only way they could load the data, and that loading the equivalent dataset via ODBC was taking 7-8 hours...they've stated that they used the latest ODBC drivers.

3. Can QlikView's flat file be broken down into smaller data sets on the QlikView back end before being presented to a dashboard that requires only a subset of its data, or does every dashboard in a self-service solution share a single superset?

4. What additional details would you need to help troubleshoot the cause of such a problem? And at what point does one conclude that the cause of the problem is the infrastructure rather than the way the solution was built?

2 Replies
marcus_sommer

I think you will need to implement an incremental load approach, maybe on multiple CSV slices (for example on a daily level) or via ODBC, which would mean a massive reduction in load times: Advanced topics for creating a qlik datamodel (see the last two link blocks in it).
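For example, an incremental load over daily csv-slices could look something like this sketch (file names like Facts_YYYYMMDD.csv and Facts_History.qvd are just placeholders for whatever your export really delivers):

// Load only the newest daily slice (yesterday in this sketch).
LET vYesterday = Text(Date(Today() - 1, 'YYYYMMDD'));

Facts:
LOAD *
FROM [..\Data\Facts_$(vYesterday).csv]
(txt, utf8, embedded labels, delimiter is ',');

// Append the already stored history - a plain LOAD * from a qvd stays optimized.
Concatenate (Facts)
LOAD *
FROM [..\Data\Facts_History.qvd] (qvd);

// Write the consolidated table back so the next run only needs the new slice.
STORE Facts INTO [..\Data\Facts_History.qvd] (qvd);

The same pattern works per ODBC by restricting the SELECT with a where-clause on a modification date instead of picking a file.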

Further, I would check whether all 200+ attributes/columns are really necessary within a single application or whether it could be logically split, and of course remove all fields which are not useful to a user, like record IDs: Search Recipes | Qlikview Cookbook.
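A simple way to do this is to stop using LOAD * and list only the fields the application really needs - technical columns like record IDs then never enter the data model (the field names below are of course just examples):

Facts:
LOAD OrderDate,
     CustomerID,
     ProductID,
     Quantity,
     Amount
FROM [..\Data\Facts_History.qvd] (qvd);

// Or, if LOAD * is kept for convenience, drop the unneeded fields afterwards:
// DROP FIELDS RecordID, LoadTimestamp;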

The next step will be to look at the number of distinct field values: The Importance Of Being Distinct.
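Timestamps are the classic example - a full timestamp can easily have millions of distinct values, while splitting it into a date part and a time part (again just example field names) keeps the symbol tables small:

Facts:
LOAD Date(Floor(OrderTimestamp)) as OrderDate,   // date part only
     Time(Frac(OrderTimestamp))  as OrderTime,   // time part only
     Round(Amount, 0.01)         as Amount       // limit numeric precision
FROM [..\Data\Facts_History.qvd] (qvd);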

Of course, the data model itself will be quite important with larger datasets and should rather be a star schema or even one big flat table. All heavy calculations should rather be implemented within the script, so that the UI expressions can be built with simple sum/count/avg expressions, avoiding (nested) if() conditions, aggr() and inter-record functions.
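For example, instead of a nested if() in every chart, a flag could be created once in the script (ShipDate/DueDate are just placeholder fields):

Facts:
LOAD *,
     If(ShipDate > DueDate, 1, 0) as LateFlag
FROM [..\Data\Facts_History.qvd] (qvd);

The chart expression then stays a simple Sum(LateFlag) or Sum({< LateFlag = {1} >} Amount) instead of evaluating If(ShipDate > DueDate, ...) over millions of rows.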

I could imagine that in the end only about 10% of the 7 GB of raw data will remain within the QVF (without any splitting and/or document chaining - that's rather the worst case, if the amount of data really cannot be handled within a single application) and that the user experience regarding performance will be quite good.

- Marcus

Anonymous
Not applicable
Author

Thank you for your response and the resources provided; I will give them a read.