fishing_weights
Contributor III

A good indicator to tell if Qlik is using less memory to run a load script

Hello

My QlikView data model failed on the QMC server because of memory load, throwing error 128 / 129.

As I try to optimise the script, what should I be looking at to know that I'm using less memory? Is it the lines fetched or the elapsed time?

 


7 Replies
marcus_sommer

Data model means what, exactly - a certain generator layer which just produces n (different) QVDs, an application which loads n QVDs as the basis for a report, or a binary-loaded application? What happens there in detail?

fishing_weights
Contributor III
Author

Yes, as in a QVW layer to manipulate the data before binary loading into the actual QVW.

The structure would be:

First QVW: extract from the source -> store as QVD

Second QVW: "Data Model" -> store as QVD

Third QVW: "Data Model" - this combines a few second-layer QVDs

Fourth QVW: binary load the 3rd QVW to display as a dashboard.

The error is happening at both the 2nd and 3rd level, where I'm manipulating the tables to bring it all together.
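For illustration, a minimal sketch of this kind of tiered setup (the connection, table and file names are just placeholders, not my actual script):

// Layer 1 (extract QVW): pull from the source and store as QVD
ODBC CONNECT TO SourceDSN;                    // placeholder connection
Orders:
SQL SELECT * FROM dbo.Orders;
STORE Orders INTO Extract_Orders.qvd (qvd);
DROP TABLE Orders;

// Layer 4 (dashboard QVW): Binary has to be the very first statement of that script
Binary DataModel_Layer3.qvw;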

marcus_sommer

There are various approaches thinkable in several areas. The first would be to look at whether really all fields and records are needed, and/or whether the tasks could be distributed to a 5th layer and/or some parallel second/third-layer applications - and all of them used with (more) incremental approaches; even many of the merging/aggregating tasks could be performed incrementally.

Within the loads you should keep an eye on resident loads which may not be strictly necessary and might be replaced by QVD loads or preceding loads. Further, drop tables/fields as early as possible. Also you may skip aggregations which have a rather low reduction rate, and replace joins with mappings.
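For example something like this - a minimal sketch where the table and field names are just placeholders: the preceding load derives the mapped field in the same pass instead of a second resident load, and ApplyMap() replaces a left join:

CustomerMap:
Mapping LOAD CustomerID, CustomerName
FROM Customers.qvd (qvd);

Sales:
LOAD *,                                                    // preceding load - no extra resident pass
     ApplyMap('CustomerMap', CustomerID, 'Unknown') AS CustomerName;
LOAD OrderID, CustomerID, Amount
FROM Sales.qvd (qvd);

DROP FIELD CustomerID;                                     // drop unneeded fields as early as possible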

Further, the (final) data shouldn't contain record IDs from the source or created in QlikView - they are only helpful while developing an app or troubleshooting data issues, but not really within the reports. Another look may go to the cardinality of the fields - respectively the number of distinct field values, which should be as low as possible and means something like splitting a timestamp into dates and times. Also, avoiding any row-level formatting saves memory.
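A minimal sketch of the timestamp split (field names are placeholders):

Facts:
LOAD OrderID,
     Floor(OrderTimestamp)                AS OrderDate,    // at most one distinct value per day
     Frac(Round(OrderTimestamp, 1/86400)) AS OrderTime     // at most 86400 distinct values, shared across all days
FROM Facts.qvd (qvd);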

And the final data model should be a star schema or maybe even a flat table, but not an SQL schema or a link-table model.

fishing_weights
Contributor III
Author

Thanks @marcus_sommer, lots to think about.

1. But how would I know if I am truly cutting the memory load required?

For example, if my script runs faster, does that mean it is taking less memory?

Or is it possible for it to run slower but take up less memory?

Another example: I realize there are times a mapping load causes my script to run slower than if I do a left join directly.

Does that mean that the mapping load is actually taking up more memory?

2.  "... even many of the merging/aggregating tasks could be  performed incremental."

Do you mean as I manipulate the data I continue to store multiple small qvds along the script and reload them incrementally?

 

marcus_sommer

The load speed isn't directly related to the RAM consumption; it depends rather on the available storage/network performance and the applied processing steps, respectively on the available CPU resources. But faster-running loads make you more flexible to combine ETL tasks and/or to distribute them across more/other applications and/or time frames.

I never really noticed that a join was faster than a mapping; my experience is quite the opposite, and there are only very rare occasions in which I use joins (very simple ones for denormalizing dimensions or for intentionally creating Cartesian results). In other words, I use mappings nearly always - even dozens of them in one load, nested vertically/horizontally multiple times - I could never do that with joins. Beside the performance, mappings have further benefits: joins carry the risk of changing the number of records, and mappings provide a default parameter for non-matching keys instead of just NULL.
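To illustrate the nesting and the default parameter, a small sketch (the tables and fields are just placeholders):

CountryMap:
Mapping LOAD RegionCode, Country
FROM Regions.qvd (qvd);

PriceMap:
Mapping LOAD ItemID & '|' & Country AS Key, Price
FROM Prices.qvd (qvd);

Orders:
LOAD OrderID,
     ItemID,
     ApplyMap('PriceMap',
              ItemID & '|' & ApplyMap('CountryMap', RegionCode, 'n/a'),   // nested mapping as part of the key
              0) AS Price                                                 // 0 instead of NULL for non-matching keys
FROM Orders.qvd (qvd);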

It's very common to use incremental approaches when loading raw data, because often neither the (usual) SQL databases can return millions of records in an appropriate time nor can the network transfer them. So just the new and/or changed records are fetched. But all following layers could also be loaded incrementally - with rather simple measures like filtering, matching, formatting and so on it's nearly as easy as with the raw data - and even more complex ETL with aggregations, populating data, accumulations and so on is possible (of course with some effort), and the effect might be even bigger than with the raw data. Personally I slice most of the raw and intermediate data stages into YYYYMM chunks.
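A minimal sketch of such an incremental raw-data load (the key, date field and source are placeholders, and it assumes a CONNECT earlier in the script):

// last loaded timestamp from the existing QVD
MaxDate:
LOAD Max(ModifiedDate) AS MaxModified
FROM Fact.qvd (qvd);
LET vMaxModified = Peek('MaxModified', 0, 'MaxDate');
DROP TABLE MaxDate;

// fetch only new/changed records from the source
Fact:
SQL SELECT * FROM dbo.Fact WHERE ModifiedDate > '$(vMaxModified)';

// add back the old records that were not re-fetched and store the QVD again
Concatenate (Fact)
LOAD * FROM Fact.qvd (qvd)
WHERE NOT Exists(PrimaryKey);

STORE Fact INTO Fact.qvd (qvd);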

fishing_weights
Contributor III
Author

Thanks @marcus_sommer, this clears up a lot.

I've been testing Mapping Load vs Join, and I believe I found the reason why mapping loads slow things down. There is no way to drop them once they're done, and I end up with a large number of mapping tables that are eating into the memory.

Reading past blogs and posts, there seems to be no good solution. Would you say that in this case we should use joins instead?

marcus_sommer

Yes, it is a disadvantage that there is no way to access a mapping table except through the mapping functions, which also means they can't be actively dropped.

But in most scenarios this isn't a big issue, because most of the mapping tables are rather small, with maybe a few hundred/thousand records, and even if there are hundreds of them it need not be a show-stopper. And if they are really larger - like in my above-hinted nested usage with many millions of huge strings (dozens of combined fields) - it's the right time to implement an incremental logic and/or to slice the task into smaller chunks.
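A small sketch of such slicing (the file names and month list are placeholders):

BigMap:
Mapping LOAD Key, Value
FROM MapSource.qvd (qvd);

FOR Each vMonth in '202401', '202402', '202403'
    Tmp:
    LOAD Key,
         ApplyMap('BigMap', Key, 'n/a') AS MappedValue,
         Amount
    FROM Raw_$(vMonth).qvd (qvd);

    STORE Tmp INTO Stage_$(vMonth).qvd (qvd);
    DROP TABLE Tmp;                          // only one slice stays in RAM at a time
NEXT vMonth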

Joins are IMO no sensible alternative - at least not for large data sets and complex ETL requirements, which should always be distributed vertically/horizontally across n applications/tasks - not only for performance reasons but also for administrative purposes.