H_Julian
Contributor III

Operations from QMC or RAM

Hey Guys,

I'm just trying to make our Qlik Sense scripting more efficient.

What I was wondering is: is it faster to do the operations while loading from a QMC file, or to first load the QMC file without any operations and then do the operations within a resident load?

 

Greetings

2 Solutions

Accepted Solutions
treysmithdev
Partner Ambassador

Do you mean a QVD?

To be honest, it depends. If you can leverage optimized QVD loads, those will always be faster; however, what they allow is really limited.

An optimized QVD load happens when you load a QVD with only field aliases and a simple Where Exists clause. A lot of times I will load a QVD and limit the data set first, and then start doing the operations on the smaller data set, though only if I am supposed to load a partial data set anyway.

For Example:

Let vStartYear = 2016;
Let vEndYear = 2020;

// Build a one-field table holding the years to keep;
// Exists(Year) below checks incoming values against it
YearExists:
Load
    $(vStartYear) + IterNo() - 1 as Year
AutoGenerate
    (1)
While
    ($(vStartYear) + IterNo() - 1) <= $(vEndYear);

// Field aliases plus a single Where Exists() keep this an optimized QVD load
Sales_tmp:
Load
    Year as SalesYear,
    Product as ProductId,
    Customer,
    Quantity as Qty,
    SalesRep,
    Amount
From
    [lib://QVD/Sales.qvd](qvd)
Where
    Exists(Year);

 

However, a lot of times more transformation is needed than that. From experience, doing it against the QVD load is usually faster, though honestly there isn't a hard and fast rule. If I am at the point of tuning the load, I will try both and use whichever is faster. If I am not particularly worried about performance, I use whatever makes the code more readable and easier to understand.

I think this type of scenario also depends on the RAM footprint at the time of the operation. If you load from a QVD, you load that data into RAM once to do the operation. If you do it from memory, you effectively double the RAM footprint: one copy for the original table and a second for the table being built by the Resident load that you perform your operations on.
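
For illustration, here is a minimal sketch of that resident-load variant, reusing the Sales.qvd from the example above (the GrossAmount calculation is a made-up placeholder transformation):

// 1. Load the QVD once, untransformed (optimized load)
Sales_raw:
Load
    Year,
    Product,
    Amount
From
    [lib://QVD/Sales.qvd](qvd);

// 2. Transform from RAM; both copies exist at this point
Sales:
NoConcatenate
Load
    Year as SalesYear,
    Product as ProductId,
    Amount,
    Amount * 1.19 as GrossAmount   // hypothetical transformation
Resident
    Sales_raw;

// 3. Drop the raw copy right away to release the extra RAM
Drop Table Sales_raw;

Dropping the source table as soon as the resident load finishes keeps the doubled footprint short-lived.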

Blog: WhereClause   Twitter: @treysmithdev


diego_a_barboza
Creator

In order to keep good performance, there are a few rules that I follow:

1- Work with segmented data: Most of the time I prefer to partition the fact tables by month, so I get Sales_201907.qvd, Sales_201908.qvd and so on. I do it this way because whenever I need to work with the data, I only process the last two months, since the previous months were already processed by the ETL. Then I can decide whether I want to load just two months of data (for dev or testing) or more (for production), and I handle that with a loop over an optimized load (see the sketch after this list).

2- Dev means "reduced volume": Whenever you are developing or just exploring the data, always work with reduced volumes. If your server can't handle high volumes, only load the fields that you know you will use or, failing that, just load the data for one particular day and one particular section... but always work with reduced volumes (under 10k rows or so). If you do this wisely, you will reduce the impact on your server.

3- Split your job: Create an app to work with your sales data and store the cleansed data in a QVD. Then create another app to work with your sales team and territories data and store it in a QVD. Then another for customer data, and store that too. Finally, create a last app that only loads the QVDs (optimized) and then creates the relationships (see the Store line in the sketch after this list). This will not only give you better performance, but also better scalability and the ability to reuse the QVDs.

4- Keep healthy relationships: If you are working with large volumes of data, always go with INT for the key fields that relate the tables. They are much lighter and you will notice the improvement in the app's performance, both on load and in use (see the AutoNumber() line in the sketch after this list).

5- Reduced expressions: If you are working with large volumes of data (2M rows or more), always work with set analysis over INT fields, and avoid count(distinct), aggr(), rangesum(), rangemax(), rangemin() and rangeavg() as much as possible by making those calculations in the ETL (load script); the flag field in the sketch after this list shows the idea. And if one of your charts has an expression of more than 3 lines, your server will pay for it.

6- Avoid Resident Loads: Resident loads are great... unless you are working with large volumes of data. Also, if your data is highly denormalized, it can be pretty heavy to handle. Remember to only work with the fields that you will actually use (or that are relevant for your current analysis) and avoid processing the same data more than once whenever possible.

7- When in doubt, call the doctor: If you have an app that is killing your server, the best way to diagnose it is the Document Analyzer. You can download it from https://qlikviewcookbook.com/tools/ and it will tell you what you can do to improve the app... and then ask here about any questions you may have.
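
As a rough sketch of points 1, 3, 4 and 5 (the library paths, table names and field names here are placeholders for illustration, not from the original posts):

// Point 1: loop over the monthly partitions; a bare "Load *" with no
// transformations keeps each iteration an optimized QVD load
Let vMonthsToLoad = 2;    // 2 for dev/testing, more for production

For i = $(vMonthsToLoad) - 1 To 0 Step -1
    Let vMonth = Date(AddMonths(Today(), -$(i)), 'YYYYMM');

    Sales:    // later iterations auto-concatenate onto this table
    Load *
    From [lib://QVD/Sales_$(vMonth).qvd](qvd);
Next i

// Point 3: at the end of each transform app, persist the cleansed table
Store Sales Into [lib://QVD/Sales_cleansed.qvd](qvd);

// Points 4 and 5: AutoNumber() turns a wide text key into a sequential
// integer key, and a precomputed integer flag lets a chart use
// Sum({<%IsCurrentYear={1}>} Amount) instead of per-row If() logic.
// Note these transformations make this particular load non-optimized,
// which is exactly the trade-off discussed in this thread.
Orders:
Load
    AutoNumber(CustomerId & '|' & OrderDate) as %OrderKey,
    If(Year(OrderDate) = Year(Today()), 1, 0) as %IsCurrentYear,
    Amount
From [lib://QVD/Orders.qvd](qvd);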

If you want to, you may contact me and I will be happy to help.


3 Replies

H_Julian
Contributor III
Author

Hey Diego, 

Many thanks for your detailed reply. This really helps me a lot!

I'm always eager to improve, and this will most definitely take me and the company I'm working for a long way forward.

Best Regards 

Julian