Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

Incremental load from QVD is not being optimised with single where clause

Is anyone able to spot the issue with my load script? Everything is loading as expected, but the Fact_Sales load from QVD is not optimised. I've read through other posts and they all refer to ensuring a single where clause is included in the load script. One of the loads I'm doing uses about 5 million rows of data and it is quite noticeable.

Note: I'm forcing an audit record of 1 in the incremental data load for testing purposes.

I've tried using a Where Exist, Where Not Exists.I have also tried using Concatenate instead of same table name with no luck.

LET vStore = 'Store1';

LIB CONNECT TO '$(vStore1DB)';

// Grab Max Audit

$(vStore)_MaxAudit:

LOAD MAXAUDIT

FROM [lib://QVD/$(vStore)_MaxAudit.QVD]

(qvd);

Let $(vStore)_Audit = peek('MAXAUDIT',-1,'$(vStore)_MaxAudit');

DROP TABLE $(vStore)_MaxAudit;

//Load incremental data

$(vStore)_FACT_SALES:

LOAD

DATEID&TIMEID&PRODUCTID&STOREID&REGISTERID&CLERKSHIFTID&PROMOTIONID&CUSTOMERID&TRANSACTIONNUMBER&SALESEQUENCE as PRIMEID,

DATEID,

TIMEID,

DATEID&'-'&TIMEID AS DATETIMEID,

'$(vStore)'&'-'&PRODUCTID AS PRODUCTID,

'$(vStore)'&'-'&STOREID AS STOREID,

'$(vStore)'&'-'&REGISTERID AS REGISTERID,

'$(vStore)'&'-'&CLERKSHIFTID AS TRANDATEID,

'$(vStore)'&'-'&CLERKSHIFTID AS CLERKSHIFTID,

'$(vStore)'&'-'&PROMOTIONID AS PROMOTIONID,

'$(vStore)'&'-'&CUSTOMERID AS CUSTOMERID,

'$(vStore)'&'-'&TRANSACTIONNUMBER AS TRANSACTIONNUMBER,

'$(vStore)'&'-'&SALESEQUENCE AS SALESEQUENCE,

AUDITID,

'$(vStore)'&'-'&EJHEADERID AS EJHEADERID,

'$(vStore)'&'-'&EJPLUOBJECTID AS EJPLUOBJECTID,

ISSECONDPRICESALE,

ISCUSTOMPRICESALE,

ISRETURNED,

QUANTITY;

SQL SELECT

DATEID,

TIMEID,

PRODUCTID,

STOREID,

REGISTERID,

CLERKSHIFTID,

PROMOTIONID,

CUSTOMERID,

TRANSACTIONNUMBER,

SALESEQUENCE,

AUDITID,

EJHEADERID,

EJPLUOBJECTID,

ISSECONDPRICESALE,

ISCUSTOMPRICESALE,

ISRETURNED,

QUANTITY

FROM "FACT_SALES"

WHERE AUDITID > 1//'$($(vStore)_Audit)'

ORDER BY AUDITID;

LET vNoOfRows = NoOfRows('$(vStore)_FACT_SALES');

IF  $(vNoOfRows) > 0 THEN

// Update Max AuditID from latest increment.

Let $(vStore)_Audit = peek('AUDITID',-1,'$(vStore)_FACT_SALES');

$(vStore)_MaxAudit:

LOAD * INLINE [

MAXAUDIT

$($(vStore)_Audit)

];

Store * from $(vStore)_MaxAudit into [lib://QVD/$(vStore)_MaxAudit.QVD];

DROP table $(vStore)_MaxAudit;

// Loading STORE_FACT_SALES Data from QVD

$(vStore)_FACT_SALES:

LOAD

PRIMEID,

DATEID,

TIMEID,

DATETIMEID,

PRODUCTID,

STOREID,

REGISTERID,

TRANDATEID,

CLERKSHIFTID,

PROMOTIONID,

CUSTOMERID,

TRANSACTIONNUMBER,

SALESEQUENCE,

AUDITID,

EJHEADERID,

EJPLUOBJECTID,

ISSECONDPRICESALE,

ISCUSTOMPRICESALE,

ISRETURNED,

QUANTITY

FROM [lib://QVD/$(vStore)_FACT_SALES.QVD]

(qvd)

WHERE NOT EXISTS (PRIMEID);

19 Replies
ToniKautto
Employee
Employee

Two things that come to mind when I look at this...

  1. In most cases you get best performance by doing a optimized QVD load. Then filter the data through a resident load.

    TmpData:

    LOAD *

    FROM MyData.qvd (QVD);

    Data:

    NoConcatenate LOAD *

    Resident TmpData

    WHERE Not Exists (MyField);

    DROP Table TmpData;

  2. QVD format is not the same in Qlik Sense and QlikView. Make sure that the QVD files you use in Qlik Sense has been generated in Qlik Sense, for best load performance.

Not applicable
Author

I can't seem to get my head around that style Toni. If I'm loading in the QVD as temp data, I can't use the where not exists because that corresponding value will already be loaded into memory.

If you refer to your example screenshot, prior to loading the QVD you would want to already have a table with a few values matching the same columns.

ToniKautto
Employee
Employee

Ah, yes that won't work since the Exists will look in the the existing field which already contains all values. My bad.

I am quite sure you can rename field during load form QVD and still get a optimized load. Makes it a bit more complicated with the Exists, but this should work.

TmpData: 

LOAD

     F1 AS TmpF1,

     F2 

FROM MyData.qvd (QVD); 

 

Data: 

LOAD

     TmpF1 AS F1,

     F2

Resident TmpData 

WHERE Not Exists (F1, TmpF1); 

 

DROP Table TmpData; 

Anonymous
Not applicable
Author

Toni,

For the Exists to keep the file optimized the column name must be the same.

Optimized

where Exists(Column)

Not Optimized

where Exists(ColumnName, Column)

ToniKautto
Employee
Employee

What I am saying is that you should always load the QVD optimized and then do your filtering and data manipulation on a resident load. In my example above the Exists is in the resident load, which does not have an optimized mode as such.

Do you mean that my latest suggestion does not speed up your reload, compared to the unoptimized QVD load? Is your QVD generated in Qlik Sense or QlikView?

Anonymous
Not applicable
Author

Sorry Toni, you are correct on the resident. 

This is basically what I stated above.

Not applicable
Author

Looks like this is loading the QVD in an optimised fashion now and I am getting expected results when updates to historical data occur.

I'll try this on the large dataset and do a comparison of speed. Do you know the performance difference between resident load and un-optimised QVD load?

Not applicable
Author

Updated Query with Optimised Loads:

//Load incremental data

$(vStore)_FACT_SALES:

LOAD

DATEID&TIMEID&PRODUCTID&STOREID&REGISTERID&CLERKSHIFTID&PROMOTIONID&CUSTOMERID&TRANSACTIONNUMBER&SALESEQUENCE as PRIMEID,

...

FROM "FACT_SALES"

WHERE AUDITID > '$($(vStore)_Audit)'

ORDER BY AUDITID;

TEMP:

NOCONCATENATE LOAD

  PRIMEID AS PRIMEID1,

...

FROM [lib://QVD/$(vStore)_FACT_SALES.QVD]

(qvd);

CONCATENATE($(vStore)_FACT_SALES)

LOAD

  PRIMEID1 AS PRIMEID,

...

RESIDENT TEMP

WHERE NOT EXISTS (PRIMEID, PRIMEID1);

DROP table TEMP;

! !

Store * from $(vStore)_FACT_SALES into [lib://QVD/$(vStore)_FACT_SALES.QVD];

ToniKautto
Employee
Employee

It is hard to predict the difference. There are actually scenarios where a unoptimized QVD load will be quicker, and that depends on the characteristics of the data stored in the QVD. An absolute majority of QVD files are quicker to load optimized and then manipulate over a resident load, and this is why that is the best practice of loading QVD.

In the unoptimized loads the QVD data has to be uncompressed before it can be manipulated. The load will remains optimized when you apply operations that do not require uncompressing the data, for example renaming a field. The in-memory processing of data is naturally more optimized in QlikView and therefore the resident processing is most of the time quicker.

Not applicable
Author

Turns out this is one of those cases. The unoptimized load of the QVD in my first implementation took 00:02:05 to complete, the second implementation using an optimized temp table then load from resident took 00:02:51 to complete.

I believe this time would be sub 1minute in if the load from QVD with a single not exists was able to use the "Optimized" method.