alexandernatale
Creator II

Optimized loading qvd...another question

good morning everyone,

For a few weeks I have been trying to understand the advantages of using (or not using) an optimized load.

I should say that I searched the forum thoroughly and read, over and over again, the article shared on many similar threads:
(http://www.quickintelligence.co.uk/qlikview-optimised-qvd-loads/)

My current ETL structure is as follows:

- From different data sources (Oracle DB, MySQL DB, internal management software, Excel sheets, etc.) I extract the data of interest. The extraction happens through external connectors and APIs that populate tables on a MySQL DB. (You could say I am building my data warehouse.)

- EXTRACTION PHASE: the individual generated tables are read by Qlik Sense and QVD files are generated (with modifications, concatenations, etc.). I call these first QVDs "extracted".

- I generate the final QVDs by reading the QVDs from the previous step, calling them "transformed" and building the final dimensional model.
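
The two layers described above can be sketched in Qlik load script. This is only an illustration; the connection, path, and table names here are all hypothetical:

```qlik
// EXTRACTION PHASE (hypothetical connection and table names)
LIB CONNECT TO 'MySQL_DWH';

Extract_Orders:
SQL SELECT OrderID, CustomerID, OrderDate, Amount
FROM dwh.orders;

STORE Extract_Orders INTO [lib://QVD/extracted/orders.qvd] (qvd);
DROP TABLE Extract_Orders;

// TRANSFORM PHASE (typically a separate app or script run)
Transform_Orders:
LOAD OrderID,
     CustomerID,
     OrderDate,
     Amount
FROM [lib://QVD/extracted/orders.qvd] (qvd);

STORE Transform_Orders INTO [lib://QVD/transformed/orders.qvd] (qvd);
```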

When I load only the "transformed" QVDs, I see the famous "optimized" message.

The problem is that when I relaunch the whole load, starting from the extraction phase, the process is still slow (for obvious reasons).

So how can the optimized loading of QVDs help me? Does it also act at the level of data usage within the app (for example, when I select filters)?

Please help clarify my ideas, which are few and confused. 🙂

4 Replies
vikasmahajan

Optimised Load:

It is very common to store information from various data sources in intermediate QVD files. A QVD file contains one table with any number of columns and rows. The purpose of a QVD file is both to decouple apps from the original data source, so that the same information can be read by several different apps without re-querying the source, and to allow transformation logic to be applied in several steps.

When we then read data from a QVD file into an app, it can be read in two ways: optimized and non-optimized load. An optimized load is considerably faster but puts strict demands on how the reading is done. For a load to be optimized, we may only:

  • Filter data with a single exists() function

  • Rename columns

  • Read a subset of the columns

  • Keep/Join against another table

  • Read a field several times

  • Read data distinct

Thus we can NOT:

  • Calculate a new column (e.g. ColumnA + 1 as Column3)

  • Use Where clauses other than a single exists() function

  • Use ApplyMap()
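
To illustrate the rules above, here is a minimal sketch (file paths and field names are made up): the first load should stay optimized, while the second breaks optimization twice. The script log shows "(qvd optimized)" next to loads that met the requirements.

```qlik
// Optimized: only renaming, a column subset, and a single Exists() filter
Facts:
LOAD
    OrderID    AS %OrderKey,   // renaming a column is allowed
    Amount                     // reading a subset of columns is allowed
FROM [lib://QVD/transformed/orders.qvd] (qvd)
WHERE Exists(OrderID);         // a single Exists() on one field is allowed

// NOT optimized: a calculated column and a non-Exists() Where clause
FactsUnoptimized:
LOAD
    OrderID,
    Amount * 1.22 AS AmountWithVAT   // any calculation breaks optimization
FROM [lib://QVD/transformed/orders.qvd] (qvd)
WHERE Amount > 0;                    // any Where other than Exists() breaks it
```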

Few Scenarios :

https://community.qlik.com/t5/QlikView-App-Dev/what-are-optimized-loads-and-unoptimized-loads/td-p/8...

You need to check all these scenarios in your script to optimize its performance.

 

Thanks

Vikas

Hope this resolves your issue.
If the issue is solved, please mark the answer with Accept as Solution & like it.
If you want to go quickly, go alone. If you want to go far, go together.
marcus_sommer

It seems that you missed the essential part: applying incremental approaches. That means loading only the current data from the sources, while all the historic data comes from the QVDs without reprocessing.

Usually this is done within n extracting/transforming layers and by slicing the data, most commonly into periodic YYYYMM slices. The current data can normally not be loaded optimized, but the historic data can.
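
A minimal sketch of this incremental pattern, assuming a hypothetical date-based delta (field, path, and variable names are made up):

```qlik
// 1. Load only the new/changed rows from the source (this part is not optimized)
LET vLastExec = '2024-01-01';  // in practice, persist and read this from the previous run

Orders:
SQL SELECT OrderID, OrderDate, Amount
FROM dwh.orders
WHERE OrderDate >= '$(vLastExec)';

// 2. Append the unchanged history from the QVD.
// Note: a plain Exists() keeps a load optimized; whether NOT Exists()
// does as well depends on the Qlik version, so check the script log.
Concatenate (Orders)
LOAD OrderID, OrderDate, Amount
FROM [lib://QVD/transformed/orders.qvd] (qvd)
WHERE NOT Exists(OrderID);

// 3. Store the refreshed table back for the next run
STORE Orders INTO [lib://QVD/transformed/orders.qvd] (qvd);
```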

- Marcus

alexandernatale
Creator II
Author

Thanks @marcus_sommer and @vikasmahajan for the replies!


To better understand:

 

1. Is the optimized load only useful for incremental loading of data?

2. Is it important for my current setup to have an optimized load or not?

3. When I use an app, are opening it and querying the data (e.g. selecting filters) affected by whether the load was optimized or not?

 

Thanks!

marcus_sommer

Whether the creation and usage of QVDs is only functional to incremental loading depends a bit on how you define "incremental". In a narrow interpretation, regarding only the refresh with the newest/changed data, you will find various other possibilities to use QVDs; in a wider interpretation, each intermediate step in an ETL is an incremental approach.

The essential logic behind everything is to divide the work into several smaller steps, each one simplified and specialized for a certain part. This is done not only to keep the steps simple and readable, but also to re-use them for various purposes, to run them in different timeframes and/or in parallel, and in the end to create more powerful and performant solutions.

Of course, each layer creates some extra effort to manage, but the more complex the task and the bigger the amount of data, the more benefits are possible if the work is smartly divided. How much overhead is sensible depends on the scenario and needs to be balanced. For smaller tasks you may neglect all intermediate layers and do everything (ETL + data model + UI) within a single app; although, depending on your environment, you may handle even these simple cases in the same way as the bigger ones in order to standardize the handling.

1. IMO no, because of the possibility to re-use the QVDs across various logics and apps.

2. With larger datasets, yes. It depends on your resources, especially the time-related ones.

3. Within an app it makes no difference how the data were loaded; what matters is how the data model is designed.

- Marcus