Skip to main content
Announcements
Jan 15, Trends 2025! Get expert guidance to thrive post-AI with After AI: REGISTER NOW
cancel
Showing results for 
Search instead for 
Did you mean: 
pschmidt1973
Partner - Contributor II
Partner - Contributor II

What is a good practice - Load from Extract QVD to Transform QVD

Hi All

I am currently reviewing our transform load process and would like some guidance on what is considered good practice.

I have an extract QVD layer that loads changed records from a source SQL DWH based on a last modified date and Where Not Exists.

I now need to understand the best approach to loading from extract QVD -> Apply transformations -> Save QVD (transformed)

I have tested the following scenarios with 4.1 million records and 35 fields:

1. Loaded the extract QVD optimized into a temp table, applied transformations and saved to a new QVD. This took 12 seconds to load the extract QVD into a temp table but took 26 seconds to transform and save to a new QVD.

  //Load extract QVD using optimized load

  temp$(vQVDTargetFileName):

  Load * From [$(vL.QVDExtractPath)/$(vTenantCode)/$(vQVDSourceFileName).qvd] (qvd);   

  $(vQVDTargetFileName):

    Load

    $(vColumns)

    From [$(vL.QVDExtractPath)/$(vTenantCode)/$(vQVDSourceFileName).qvd] (qvd);

    Resident temp$(vQVDTargetFileName);

2. Loaded the extract QVD and applied the transformations directly (not an optimized load) and saved to a new QVD. This took 22 seconds to save to a new QVD.

  //Load extract QVD and transform directly (not an optimized load)

  $(vQVDTargetFileName):

    Load

    $(vColumns)

    From [$(vL.QVDExtractPath)/$(vTenantCode)/$(vQVDSourceFileName).qvd] (qvd);

The non-optimized load, in this scenario, was the better option.

I am looking to improve the load -> transform -> save time to around 15 seconds, does anyone have any recommended approaches?

Best, Paul

6 Replies
cesaraccardi
Specialist
Specialist

Hi Paul,

From my experience resident loadings tend to be slower than loading directly from qvd files, specially when you start working with higher volumes of data.

Cesar

marcus_sommer

I'm not surprised that your second approach is the faster one and I would prefer them, too.

A further optimization could be:

- to reduce the number of fields (maybe there are Month and Year beside a Date)

- splitting of fields (for example a Timestamp into a Date and Time)

- removing of formatings (no Date or Time - only pure numeric values)

- reducing fields with rounding-functions (for example curreny-values to two decimals)

- splitting the qvd into two or several qvd's (this creates some overhead but might have advantages if the table contains facts and dimensions - whereby the dimensions didn't needed to updated with the same frequency or you are able to widen your update-window or to parallelize these tasks)

A furthermore optimization-method would be to load the transforming also with an incremental approach.

So I think you have a lot of potential to improve the load-performance.

- Marcus

pschmidt1973
Partner - Contributor II
Partner - Contributor II
Author

Hi Marcus

Thanks for the suggestions, I've been able to reduce the load time to 15 seconds by removing the following:

  

Date(OpenDateTime) AS Check Opened
Date(CloseDateTime)AS Receipt Date
Floor(Date(CloseDateTime))AS Receipt Date No

My thoughts are to push the conversation to SQL DWH rather than in Qlik? What do you think?

Best, Paul

marcus_sommer

I assume with removing these transformations there are no further transformations left whereby these qvd-load then will be load in an optimized-mode. In general it's a way to transfer transformings within the DWH but it might to transfer the bottleneck from one system to another and also you won't be able to transfer all transformations on the database-side.

By dates and times it depends on the formattings of the fields and the qlikview-settings if they are interpreted as dates/times or not - but I think it's a try worth. The format from the database should be exactly the same like declared within your variable: SET DateFormat='YYYY-MM-DD'; at the beginning of yor script.

Alternatives to such an approach could be to transform the dates/times later in a datamodel- or report-layer or to format these fields directly within the application within document properties in tab numbers.

- Marcus

Lee_Matthews
Former Employee
Former Employee

Hi Paul

One thing to try might be to ensure you are running your NOT EXISTS() query against a minimal set of records. It is possible that if you are concatenating data into the one table, the NOT EXISTS gets run over an ever increasing set of data. Perhaps put the 'delta' list of unique IDs into a second table before you do the load from QVD. Then check the incoming IDs against this static small list. Note that you will need to give the ID column in the smaller table you create a different name, or Qlik will still consider it the one column in it's internal model and lookup table.

Why are you looking to reduce things to less than 15 seconds? If this is about creating a near real time data load, then be aware that if the engine is constantly reloading it will not leave much processing power to handle user selections. One possible alternative might be to use direct discovery for your dynamic content.

pschmidt1973
Partner - Contributor II
Partner - Contributor II
Author

Hi Lee

Thanks for the advice. Yes we are looking into near real time loads and I'll look into direct discovery.

Cheers, Paul