Hi,
We are dealing with 1 billion rows of data in a csv and are facing performance issues due to the huge size. The resulting qvw file is about 3 GB, server memory utilization reaches 50 GB while the data is being ingested, and the load script takes more than 1 hour to complete.
What best practices can we apply to improve performance?
regards
Anubhav
I suggest storing the data in one or several qvd files and using incremental load approaches. In the last two link blocks here: Advanced topics for creating a qlik datamodel you will find various useful links on incremental loads and on optimizing loads with exists().
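As a rough sketch of what that could look like (table, field and file names are only placeholders, and it assumes the source delivers a delta csv containing just the new and changed rows):

// Daily run: read only the new and changed rows from the delta csv
Orders:
LOAD OrderID,
     OrderDate,
     Amount
FROM OrdersDelta.csv (txt, utf8, embedded labels, delimiter is ',');

// Append the history from the qvd; a single Not Exists() on the key
// is the one where-condition that keeps this qvd read optimized
Concatenate (Orders)
LOAD OrderID,
     OrderDate,
     Amount
FROM Orders.qvd (qvd)
Where Not Exists(OrderID);

// Write the refreshed history back for the next run
STORE Orders INTO Orders.qvd (qvd);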
Further, you should consider whether you really need all fields from the csv (you won't need fields like table IDs from a database) and whether you could split high-cardinality fields like timestamps into a date field and a time field.
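The timestamp split could look like this (again just a sketch with placeholder names):

Facts:
LOAD OrderID,
     Date(Floor(OrderTimestamp)) as OrderDate,                  // calendar day only
     Time(Round(Frac(OrderTimestamp), 1/24/60)) as OrderTime,   // time of day, rounded to the minute
     Amount
FROM Orders.csv (txt, utf8, embedded labels, delimiter is ',');

Two low-cardinality fields compress much better in memory than a single timestamp field with hundreds of millions of distinct values.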
- Marcus
First and foremost, do not load data you won't be using anyway: don't load fields you won't use and don't load records you won't use. Next, load the data once, store it in a qvd data file, and from then on use that qvd file as the data source, either for further processing or directly for the final QlikView dashboards.
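A minimal sketch of that flow, with placeholder field and file names:

// First pass: read the csv once, keeping only the fields and records the dashboards need
Sales:
LOAD CustomerID,
     OrderDate,
     Amount
FROM Sales.csv (txt, utf8, embedded labels, delimiter is ',')
Where Year(OrderDate) >= 2015;   // example filter for records nobody analyses

STORE Sales INTO Sales.qvd (qvd);
DROP TABLE Sales;

// Later reloads and the final dashboards read the qvd instead of the csv;
// a plain LOAD * from a qvd is an optimized (fast) load
Sales:
LOAD * FROM Sales.qvd (qvd);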
Hi Anubhav,
Here we need to follow QlikView optimization techniques.
Script level:
1) Drop temp tables as soon as they are no longer needed
2) Use mapping (ApplyMap) where possible
3) Try to avoid joins (see the sketch after this list)
UI level:
1) Use calculation conditions on charts instead of pre-building everything in the script itself
2) Observe the memory consumption of charts as they populate
...etc
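A small sketch of points 1-3 at script level, with placeholder table and file names:

// Staging: read the raw data once
TmpOrders:
LOAD OrderID,
     CustomerID,
     Amount
FROM Orders.qvd (qvd);

// 2) + 3) Mapping table plus ApplyMap() instead of a join
CustomerMap:
MAPPING LOAD CustomerID,
             CustomerName
FROM Customers.qvd (qvd);

Orders:
LOAD OrderID,
     CustomerID,
     ApplyMap('CustomerMap', CustomerID, 'Unknown') as CustomerName,
     Amount
RESIDENT TmpOrders;

// 1) Drop the temporary table as soon as it is no longer needed
DROP TABLE TmpOrders;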
hi @Gysbert_Wassenaar @marcus_sommer, I loaded my 1 billion-line table once by partitioning the script by year and then concatenating. After this, I want to implement an incremental load, but with a hard delete, where the last step is an inner join on the primary key of the initial billion-line table. How should I handle the delete step?
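Roughly what I have in mind, assuming the Orders table has just been rebuilt by the incremental load (table, field and file names here are just placeholders):

// Orders already holds the delta plus the history from the qvd at this point.
// Hard delete: keep only the keys that still exist in the source system
Inner Join (Orders)
LOAD OrderID
FROM OrderKeys.csv (txt, utf8, embedded labels, delimiter is ',');

STORE Orders INTO Orders.qvd (qvd);

Is this the right idea?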
Please create a separate topic for this question.