klekovkinda
Partner - Contributor III

Qlik App memory consumption optimisation

Hi all, sometimes Qlik is a black box and I can't find the best approach for memory optimisation.
I hope someone has the answer and can help me with the information I need. Let me explain the case:

I have a Qlik application which reads CSV files from an S3 bucket. In total there is ~20 GB of data. And that's the question: what is better from a memory-consumption point of view, loading 20 files of 1 GB each (with DefaultStreamingBufferSizeMB set to 1 GB) or 80 files of 250 MB each (with DefaultStreamingBufferSizeMB set to 250 MB)?

How does Qlik manage the memory? Let's imagine I have 10 tasks running at the same time, all loading files from the S3 bucket. Will Qlik allocate DefaultStreamingBufferSizeMB for each file or for each application?
Or does it not allocate memory at all, and is this just a restriction on the loadable file size based on the file metadata?

5 Replies
sbaro_bd
Creator

Hi @klekovkinda ,

That's a lot of questions. I don't know the mechanism behind the Qlik engine, but I know some best practices which can be helpful to optimize an application's performance:

  1. Use a Reloader application to create qvd files: as described by Qlik, that is the best optimized format for a Qlik application. You will use these QVDs as the data for your working/reporting application (see the sketch after this list).
  2. If the 20 GB of data contains historic data, I recommend you use an incremental load method in your Reloader application. Consult Qlik help to see all the ways to do that.
  3. If necessary, create your model in an intermediate application between your reloader and your reporting application. Use this model as input with the Binary Load statement in your reporting application for optimization.
  4. ODAG: take a look at this Qlik add-on: ODAG
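
For point 1, a minimal reloader sketch (the lib:// connections, file and table names below are placeholders, not a prescribed setup):

// Reloader app: read the raw csv once and store it as qvd
Sales:
LOAD *
FROM [lib://S3Raw/sales_2024.csv]
(txt, utf8, embedded labels, delimiter is ',');

STORE Sales INTO [lib://QVD/Sales.qvd] (qvd);
DROP TABLE Sales;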

Regards.

 

klekovkinda
Partner - Contributor III
Author

Thank you @sbaro_bd, all those practices are in place.
But the question is about consuming big files. How does DefaultStreamingBufferSizeMB influence memory allocation? If I have 20 apps reloading at the same time and they are loading csv files, will Qlik occupy 20 GB for them during the reload even though the csv files are only 10 MB each, just because DefaultStreamingBufferSizeMB is set to 1 GB?

marcus_sommer

IMO this setting isn't relevant for your described use case. Like all the other settings, there is very seldom a need to adjust any defaults because they are quite well balanced between performance and stability.

In general, everything that's executed in parallel adds to the overall resource consumption and increases the requirements for the maximum available resources. And this relates not only to the RAM but also to the network/storage performance and the CPU capacities.

Therefore it's sensible to balance the needs for parallel and serial tasks, which is mainly done with a multi-tier data architecture with qvd layers and incremental approaches, as already hinted at by @sbaro_bd.
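
A rough sketch of such an incremental approach, assuming an ISO-formatted UpdateDate field and insert-only data (all names are placeholders for illustration):

// read the last loaded date from the existing qvd
MaxDate:
LOAD Max(UpdateDate) AS MaxDate
FROM [lib://QVD/Sales.qvd] (qvd);

LET vMaxDate = Peek('MaxDate', 0, 'MaxDate');
DROP TABLE MaxDate;

// pull only the newer records from the csv
Sales:
LOAD * FROM [lib://S3Raw/sales_new.csv]
(txt, utf8, embedded labels, delimiter is ',')
WHERE UpdateDate > '$(vMaxDate)';

// append the already stored history and write the qvd back
CONCATENATE (Sales)
LOAD * FROM [lib://QVD/Sales.qvd] (qvd);

STORE Sales INTO [lib://QVD/Sales.qvd] (qvd);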

Besides this, Qlik uses no row-level data structures like SQL databases but a column-level storage, with the distinct field values kept in symbol tables for each field and n data tables with bit-stuffed pointers to those symbol tables. Quite often this logic reduces the RAM and storage needs very significantly, and your 20 GB of raw data may end up as 2 GB of qvd data. The more redundant the data are and the larger the data set, the bigger the saving rate will be.
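
As a rough illustration of the bit-stuffing (made-up numbers):

A field with 200 distinct values needs ceil(log2(200)) = 8 bits per pointer.
100 M rows * 8 bits = ~100 MB for the pointer column,
plus one symbol table with only 200 entries -
instead of repeating the full value on all 100 M rows like a csv does.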

Loading the biggest data parts optimized from qvd's and/or binary is very fast; even with several GB it takes just a few seconds on Gbit networks or SSDs.
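
For example (an assumed app name; Binary must be the very first statement of the script):

// first statement of the reporting app - pulls the whole data model
Binary [lib://Apps/DataModel.qvf];

An optimized qvd load simply means loading without transformations, e.g. LOAD * FROM [lib://QVD/Sales.qvd] (qvd); - as soon as a where-clause (other than exists) or calculated fields are added, the load falls back to the slower standard mode.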

klekovkinda
Partner - Contributor III
Author


@marcus_sommer wrote:

IMO this setting isn't relevant for your described use case. Like all the other settings, there is very seldom a need to adjust any defaults because they are quite well balanced between performance and stability.

Without adjusting this setting, Qlik can only handle CSV files up to a few megabytes. In my scenario, I need to increase it to process larger CSV files. The producer generates these files, and since Qlik is the only way to create qvd files, the initial load must use CSV. I'm trying to determine the optimal way to handle 10 tasks triggered simultaneously that need to read CSV files. Should I break the files down into 200 MB or 1 GB chunks? How does Qlik manage memory in this situation? For instance, if I have an application that needs to consume 20 GB from CSV files and I set DefaultStreamingBufferSizeMB to 1 GB, how will Qlik allocate memory if I have 10 applications loading 1-2 KB files concurrently? Will it allocate memory based on the exact file size or on the DefaultStreamingBufferSizeMB setting?

marcus_sommer

I suggest that you skip all these considerations, don't touch any setting, and just load these csv files. I have a lot of them, many with several hundred MB, and the biggest ones are over 2 GB. There is no problem loading them in a row per QlikView desktop client or server, and without applying much transformation the load and store as qvd doesn't take very long. Just give it a try.

With 20 GB of raw data it may need one or two hours. Performed per desktop client it's no big deal, and there is IMO no need for multiple simultaneous tasks, which would cause more effort and complexity than necessary.

Besides this, if you have any influence on the file size of the csv, I would recommend a sizing of about 100 MB so the files can still be opened in Excel and/or Notepad++. Reasons for this are the need to manipulate the raw data directly and also to show the business that certain errors are already in the source and not caused by the ETL - and that happens more often than you would like.

Furthermore, nearly everything is sliced as YYYYMM, and sometimes by more attributes like channels or categories - raw data as well as intermediate/final qvd's - whereby all relevant information is included in the file name. With that it's quite simple to use dirlist/filelist loops, query the file names without reading the content, and pick the wanted data sets.
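
A small sketch of such a loop (the path and the YYYYMM naming are just examples):

// pick only the 2024 slices by file name, without opening the other files
FOR EACH vFile IN FileList('lib://Raw/sales_2024*.csv')
  Sales:
  LOAD * FROM [$(vFile)]
  (txt, utf8, embedded labels, delimiter is ',');
NEXT vFile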