1 Reply Latest reply: Mar 13, 2017 4:08 AM by Marcus Sommer RSS

    Performance issue with query from Data Warehouse

    Heino Juho

      Dear All,


      I am working with millions of rows for the first time and wonder what is the most efficient approach.



      On the same dataset, a colleague created a script utilizing preceding load straight into the front-end qvw with the following idea:


      1. In the FROM-part of the SQL
        1. SQL Select the appropriate fields from two tables (both having millions of records)
        2. Inner Join the tables based on two keys (YearMonth and ID)
      2. In the SELECT -part of the SQL
        1. Calculate difference of 2 fields; Delta-field
        2. TRUNC (round) many fields
        3. Rename fields
      3. Preceding load -qlik script
        1. Create a concatenated key from the YearMonth and ID
        2. Use an if clause to create a Status-field based on the value of Delta-field



      I tried to use an approach with incremental load using separate Qlik-tables for each DW-table: this took a lot longer than my colleague's script.


      1. Create temp tables with normal preceding load for the two tables
        1. Where-clause to limit SQL Select to new data (MonthYear as great or greater than current month)
        2. Create a concatenated key from the YearMonth and ID in the qlik script -part of the preceding load
      2. Store these temp tables into qvd:s
      3. Load temp tables containing new data from qvds and concatenate them with the entire dataset (another load with a wider Where-clause) being loaded from two qvd:s

      *** at this point I have two tables and qvds with the whole dataset ***

         4. Create an inner join between the two tables

         5. Store this table in a qvd

         6. Load this table from qvd and

                     *calculate Delta-field based on the difference of 2 fields

                      *use an if clause to create a Status-field based on the values of Delta-field



      Sorry if this is ambiguously written, what I am basically asking is this:

      1. Is it recommended/faster to
        1. Do joins already in the SQL part?
        2. Calculate differences between fields already in the SQL part?
      2. What would be overall be an efficient strategy in this situation
        1. Two tables with millions of rows, joined via a concatenated key
        2. A difference needs to be calculated on a detailed level (YearMonth and ID) with two fields, each from a table
        • Re: Performance issue with query from Data Warehouse
          Marcus Sommer

          In general an incremental load with qvd's should be faster than sql-loadings against a database.


          But it will be depend on various things how much faster it could be and the efforts which are needed for it.

          For example, is the database quite fast and the limiting factors are mainly the database-driver and/or the network it could be that the inner join + the few transformations (status + truncating) are faster performed as the same rawdata without any transforming are transferred to qlik.


          In this case you couldn't get this step faster with qlik but you could load this joined table with an incremental approach - and I think this is the essential point here - by an incremental loading you need not only load the rawdata incrementally you must also load the transformation-results incremental and by this building a multi-staging load-approach.


          For your case it might be a bit expensive to do this but if load-performance is a bottleneck or you could use thes qvd's in other cases it might be worth.


          - Marcus