Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hello all,
I am working on a project where I am trying to switch a bunch of Qliksense QVDs from full nightly reloads to change data capture reloads (CDC), where only new data (identified by a Primary Key) from our base table (in databricks) will be inserted/updated into a QVD where all the data is stored.
The current QVD stored via QlikSense feeds off of a base DLT (delta live table) table from databricks. However, we now want to implement CDC and not do a FULL reload every night to save money on clusters costs.
Only new data identified by a BOL_UPD_DT field from said base DLT in databricks will be loaded (added/concatenated) to the existent QVD. The BOL_UPD_DT field is a timestamp that allows me to identify if an existing row has been newly updated or newly inserted).
My goal is to use the BOL_UPD_DT and Primary Key (PK) from said base DLT table to scope out new data that will make it to the QVD. Both base table and QVD have the same named fields. My issue is - How do I carry out the following steps (via data load editor):
1. Select new data (CDC) from base table based on "LastExecTime" and "BeginningThisExecTime" of the QVD reload.
2. Load the data from the existing QVD.
3. Join the CDC data from the base table (in step 1) with the loaded data from QVD (in step 2) using their Primary Key (PK) and insert new data into the QVD if the data is new or update existing data in the QVD
How does one script this??????
Qlik-sense does a lot of things intuitively and seemingly doesn't give one refined process control per CDCs; so far for the 3 steps laid out above I the following pseudo script-wise:
===============
Let LastExecTime = ConvertToLocalTime(ReloadTime(), 'Place/TimeZone');
Let BeginningThisExecTime = ConvertToLocalTime(now(), 'Place/TimeZone');
Let LastExecTime = reloadtime();
// Loads data from QVD
LOAD
PK,
BOL_UPD_DT,
c,
d,
e
//Isolate Primary Key for join use using unqualify function
qualify *;
unqualify PK;
// Select newly updated or inserted data from base table
SELECT
PK,
BOL_UPD_DT,
c,
d,
e
FROM base DLT
WHERE BOL_UPD_DT >= #$(LastExecTime)#
AND BOL_UPD_DT < #$(BeginningThisExecTime)#;
=====================
//Next for new QVD inserts I tried:
Concatenate LOAD * FROM QVD
WHERE NOT EXISTS(PK);
// did not work. QVD data load editor did not even save/compile because mistake in the script.
I need some help here....
I recommend that you don't use ReloadTime(). It is better to first see what the last (max) record in the qvd is, and use that as base for next data fetch. At least I would do something like that.
You are using qualify * and that will make column names be prefixed with the table name, so when you later try to concatenate tables you will have a mess.
A simpler approach for you is probably to first load data from the databse, after that concatenate the data from qvd. Don't load your qvd data until you have fetched new data. (But you could load what you need from the qvd.
But it's hard to say what your exact problem is. You say that this is pseudoscript, but you also say it doesn't work when you run it. I'd like to see more of the actual script.
At the end you say that the script did not save because of mistake in the script, but the script is always saved no matter what errors you do. (But never compiled, ever.)
Are you using Qlik Sense? SaaS or on-prem? Or some other qlik product?
check this out for incremental loads, all 4 types: