Is anyone able to spot the issue with my load script? Everything loads as expected, but the Fact_Sales load from QVD is not optimised. I've read through other posts, and they all refer to ensuring a single WHERE clause is included in the load. One of the loads I'm doing covers about 5 million rows of data, so the slowdown is quite noticeable.
Note: I'm forcing an audit record of 1 in the incremental data load for testing purposes.
I've tried using Where Exists and Where Not Exists. I have also tried using Concatenate instead of repeating the same table name, with no luck.
LET vStore = 'Store1';
LIB CONNECT TO '$(vStore1DB)';
// Grab Max Audit
$(vStore)_MaxAudit:
LOAD MAXAUDIT
FROM [lib://QVD/$(vStore)_MaxAudit.QVD]
(qvd);
Let $(vStore)_Audit = peek('MAXAUDIT',-1,'$(vStore)_MaxAudit');
DROP TABLE $(vStore)_MaxAudit;
//Load incremental data
$(vStore)_FACT_SALES:
LOAD
DATEID&TIMEID&PRODUCTID&STOREID&REGISTERID&CLERKSHIFTID&PROMOTIONID&CUSTOMERID&TRANSACTIONNUMBER&SALESEQUENCE as PRIMEID,
DATEID,
TIMEID,
DATEID&'-'&TIMEID AS DATETIMEID,
'$(vStore)'&'-'&PRODUCTID AS PRODUCTID,
'$(vStore)'&'-'&STOREID AS STOREID,
'$(vStore)'&'-'&REGISTERID AS REGISTERID,
'$(vStore)'&'-'&CLERKSHIFTID AS TRANDATEID,
'$(vStore)'&'-'&CLERKSHIFTID AS CLERKSHIFTID,
'$(vStore)'&'-'&PROMOTIONID AS PROMOTIONID,
'$(vStore)'&'-'&CUSTOMERID AS CUSTOMERID,
'$(vStore)'&'-'&TRANSACTIONNUMBER AS TRANSACTIONNUMBER,
'$(vStore)'&'-'&SALESEQUENCE AS SALESEQUENCE,
AUDITID,
'$(vStore)'&'-'&EJHEADERID AS EJHEADERID,
'$(vStore)'&'-'&EJPLUOBJECTID AS EJPLUOBJECTID,
ISSECONDPRICESALE,
ISCUSTOMPRICESALE,
ISRETURNED,
QUANTITY;
SQL SELECT
DATEID,
TIMEID,
PRODUCTID,
STOREID,
REGISTERID,
CLERKSHIFTID,
PROMOTIONID,
CUSTOMERID,
TRANSACTIONNUMBER,
SALESEQUENCE,
AUDITID,
EJHEADERID,
EJPLUOBJECTID,
ISSECONDPRICESALE,
ISCUSTOMPRICESALE,
ISRETURNED,
QUANTITY
FROM "FACT_SALES"
WHERE AUDITID > 1 // forced to 1 for testing; normally '$($(vStore)_Audit)'
ORDER BY AUDITID;
LET vNoOfRows = NoOfRows('$(vStore)_FACT_SALES');
IF $(vNoOfRows) > 0 THEN
// Update Max AuditID from latest increment.
Let $(vStore)_Audit = peek('AUDITID',-1,'$(vStore)_FACT_SALES');
$(vStore)_MaxAudit:
LOAD * INLINE [
MAXAUDIT
$($(vStore)_Audit)
];
Store * from $(vStore)_MaxAudit into [lib://QVD/$(vStore)_MaxAudit.QVD];
DROP table $(vStore)_MaxAudit;
// Loading STORE_FACT_SALES Data from QVD
$(vStore)_FACT_SALES:
LOAD
PRIMEID,
DATEID,
TIMEID,
DATETIMEID,
PRODUCTID,
STOREID,
REGISTERID,
TRANDATEID,
CLERKSHIFTID,
PROMOTIONID,
CUSTOMERID,
TRANSACTIONNUMBER,
SALESEQUENCE,
AUDITID,
EJHEADERID,
EJPLUOBJECTID,
ISSECONDPRICESALE,
ISCUSTOMPRICESALE,
ISRETURNED,
QUANTITY
FROM [lib://QVD/$(vStore)_FACT_SALES.QVD]
(qvd)
WHERE NOT EXISTS (PRIMEID);
END IF;
Two things that come to mind when I look at this...
TmpData:
LOAD *
FROM MyData.qvd (QVD);
Data:
NoConcatenate LOAD *
Resident TmpData
WHERE Not Exists (MyField);
DROP Table TmpData;
I can't seem to get my head around that style, Toni. If I'm loading the QVD in as temp data, I can't use Where Not Exists, because the corresponding values will already be loaded into memory.
If you refer to your example screenshot, prior to loading the QVD you would want to already have a table with a few values matching the same columns.
Ah, yes, that won't work, since the Exists will look in the existing field, which already contains all the values. My bad.
I am quite sure you can rename a field during a load from QVD and still get an optimized load. It makes the Exists a bit more complicated, but this should work.
TmpData:
LOAD
F1 AS TmpF1,
F2
FROM MyData.qvd (QVD);
Data:
LOAD
TmpF1 AS F1,
F2
Resident TmpData
WHERE Not Exists (F1, TmpF1);
DROP Table TmpData;
Toni,
For the Exists to keep the load optimized, the column name must be the same.
Optimized
where Exists(Column)
Not Optimized
where Exists(ColumnName, Column)
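To make the distinction concrete, here is a minimal sketch in load-script form (the field and file names are made up for illustration):

// Optimized: single-argument Exists on a field with the same name
Data:
LOAD KeyField,
Value
FROM Example.qvd (qvd)
WHERE Exists(KeyField);

// Not optimized: the two-argument form forces the rows to be unpacked
Data2:
LOAD KeyField AS NewKey,
Value
FROM Example.qvd (qvd)
WHERE Exists(KeyField, NewKey);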
What I am saying is that you should always load the QVD optimized and then do your filtering and data manipulation on a resident load. In my example above the Exists is in the resident load, which does not have an optimized mode as such.
Do you mean that my latest suggestion does not speed up your reload, compared to the unoptimized QVD load? Is your QVD generated in Qlik Sense or QlikView?
Sorry Toni, you are correct on the resident.
This is basically what I stated above.
Looks like this is loading the QVD in an optimised fashion now and I am getting expected results when updates to historical data occur.
I'll try this on the large dataset and do a comparison of speed. Do you know the performance difference between a resident load and an un-optimised QVD load?
Updated Query with Optimised Loads:
//Load incremental data
$(vStore)_FACT_SALES:
LOAD
DATEID&TIMEID&PRODUCTID&STOREID&REGISTERID&CLERKSHIFTID&PROMOTIONID&CUSTOMERID&TRANSACTIONNUMBER&SALESEQUENCE as PRIMEID,
...
FROM "FACT_SALES"
WHERE AUDITID > '$($(vStore)_Audit)'
ORDER BY AUDITID;
TEMP:
NOCONCATENATE LOAD
PRIMEID AS PRIMEID1,
...
FROM [lib://QVD/$(vStore)_FACT_SALES.QVD]
(qvd);
CONCATENATE($(vStore)_FACT_SALES)
LOAD
PRIMEID1 AS PRIMEID,
...
RESIDENT TEMP
WHERE NOT EXISTS (PRIMEID, PRIMEID1);
DROP table TEMP;
Store * from $(vStore)_FACT_SALES into [lib://QVD/$(vStore)_FACT_SALES.QVD];
It is hard to predict the difference. There are actually scenarios where an unoptimized QVD load will be quicker, depending on the characteristics of the data stored in the QVD. For the vast majority of QVD files, though, it is quicker to load optimized and then manipulate the data via a resident load, which is why that is the best practice for loading QVDs.
In an unoptimized load, the QVD data has to be uncompressed before it can be manipulated. The load remains optimized when you apply only operations that do not require uncompressing the data, for example renaming a field. The in-memory processing of data is naturally more optimized in QlikView, which is why resident processing is quicker most of the time.
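For example, a rename on its own does not require unpacking the rows, while any computed expression does. A hypothetical sketch (MyData.qvd stands in for any QVD):

// Stays optimized: fields are only renamed, no values are computed
Renamed:
LOAD F1 AS Key,
F2 AS Amount
FROM MyData.qvd (qvd);

// Not optimized: the expression requires uncompressing every row
Calculated:
LOAD F1,
F2 * 1.1 AS AdjustedAmount
FROM MyData.qvd (qvd);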
Turns out this is one of those cases. The unoptimized QVD load in my first implementation took 00:02:05 to complete; the second implementation, using an optimized load into a temp table followed by a resident load, took 00:02:51.
I believe this time would be sub one minute if the load from QVD with a single Not Exists were able to use the "Optimized" method.