Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hello all,
I am working on a project where I am trying to switch a bunch of Qliksense QVDs from full nightly reloads to change data capture reloads (CDC), where only new data (identified by a Primary Key) from our base table (in databricks) will be inserted/updated into a QVD where all the data is stored.
The current QVD stored via QlikSense feeds off of a base DLT (delta live table) table from databricks. However, we now want to implement CDC and not do a FULL reload every night to save money on clusters costs.
Only new data identified by a BOL_UPD_DT field from said base DLT in databricks will be loaded (added/concatenated) to the existent QVD. The BOL_UPD_DT field is a timestamp that allows me to identify if an existing row has been newly updated or newly inserted).
My goal is to use the BOL_UPD_DT and Primary Key (PK) from said base DLT table to scope out new data that will make it to the QVD. Both base table and QVD have the same named fields. My issue is - How do I carry out the following steps (via data load editor):
1. Select new data (CDC) from base table based on "LastExecTime" and "BeginningThisExecTime" of the QVD reload.
2. Load the data from the existing QVD.
3. Join the CDC data from the base table (in step 1) with the loaded data from QVD (in step 2) using their Primary Key (PK) and insert new data into the QVD if the data is new or update existing data in the QVD
How does one script this??????
Qlik-sense does a lot of things intuitively and seemingly doesn't give one refined process control per CDCs; so far for the 3 steps laid out above I the following pseudo script-wise:
===============
Let LastExecTime = ConvertToLocalTime(ReloadTime(), 'Place/TimeZone');
Let BeginningThisExecTime = ConvertToLocalTime(now(), 'Place/TimeZone');
Let LastExecTime = reloadtime();
// Loads data from QVD
LOAD
PK,
BOL_UPD_DT,
c,
d,
e
//Isolate Primary Key for join use using unqualify function
qualify *;
unqualify PK;
// Select newly updated or inserted data from base table
SELECT
PK,
BOL_UPD_DT,
c,
d,
e
FROM base DLT
WHERE BOL_UPD_DT >= #$(LastExecTime)#
AND BOL_UPD_DT < #$(BeginningThisExecTime)#;
=====================
//Next for new QVD inserts I tried:
Concatenate LOAD * FROM QVD
WHERE NOT EXISTS(PK);
// did not work. QVD data load editor did not even save/compile because mistake in the script.
I need some help here....
So I have figured it all out 🙂 Thanks for your initial reply:
Final Code:
Great! 🙂