Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hello all,
I am working on a project where I am trying to switch a bunch of Qliksense QVDs from full nightly reloads to change data capture reloads (CDC), where only new data (identified by a Primary Key) from our base table (in databricks) will be inserted/updated into a QVD where all the data is stored.
The current QVD stored via QlikSense feeds off of a base DLT (delta live table) table from databricks. However, we now want to implement CDC and not do a FULL reload every night to save money on clusters costs.
Only new data identified by a BOL_UPD_DT field from said base DLT in databricks will be loaded (added/concatenated) to the existent QVD. The BOL_UPD_DT field is a timestamp that allows me to identify if an existing row has been newly updated or newly inserted).
My goal is to use the BOL_UPD_DT and Primary Key (PK) from said base DLT table to scope out new data that will make it to the QVD. Both base table and QVD have the same named fields. My issue is - How do I carry out the following steps (via data load editor):
1. Select new data (CDC) from base table based on "LastExecTime" and "BeginningThisExecTime" of the QVD reload.
2. Load the data from the existing QVD.
3. Join the CDC data from the base table (in step 1) with the loaded data from QVD (in step 2) using their Primary Key (PK) and insert new data into the QVD if the data is new or update existing data in the QVD
How does one script this??????
Qlik-sense does a lot of things intuitively and seemingly doesn't give one refined process control per CDCs; so far for the 3 steps laid out above I the following pseudo script-wise:
===============
Let LastExecTime = ConvertToLocalTime(ReloadTime(), 'Place/TimeZone');
Let BeginningThisExecTime = ConvertToLocalTime(now(), 'Place/TimeZone');
Let LastExecTime = reloadtime();
// Loads data from QVD
LOAD
PK,
BOL_UPD_DT,
c,
d,
e
//Isolate Primary Key for join use using unqualify function
qualify *;
unqualify PK;
// Select newly updated or inserted data from base table
SELECT
PK,
BOL_UPD_DT,
c,
d,
e
FROM base DLT
WHERE BOL_UPD_DT >= #$(LastExecTime)#
AND BOL_UPD_DT < #$(BeginningThisExecTime)#;
=====================
//Next for new QVD inserts I tried:
Concatenate LOAD * FROM QVD
WHERE NOT EXISTS(PK);
// did not work. QVD data load editor did not even save/compile because mistake in the script.
I need some help here....
I recommend that you don't use ReloadTime(). It is better to first see what the last (max) record in the qvd is, and use that as base for next data fetch. At least I would do something like that.
You are using qualify * and that will make column names be prefixed with the table name, so when you later try to concatenate tables you will have a mess.
A simpler approach for you is probably to first load data from the databse, after that concatenate the data from qvd. Don't load your qvd data until you have fetched new data. (But you could load what you need from the qvd.
But it's hard to say what your exact problem is. You say that this is pseudoscript, but you also say it doesn't work when you run it. I'd like to see more of the actual script.
At the end you say that the script did not save because of mistake in the script, but the script is always saved no matter what errors you do. (But never compiled, ever.)
Are you using Qlik Sense? SaaS or on-prem? Or some other qlik product?
check this out for incremental loads, all 4 types:
Good afternoon and thank you for responding.
I simplified the use case. I am simply trying to use incremental loads to insert new data from Databricks tables to Qlik(sense) QVDs.
I have a sample table I created in Databricks called "qlik_cdc".
It consists of the following fields:
CAR_ID,
CAR_MODEL,
CAR_YEAR,
COLOR,
CDC_CREATE_DT,
CDC_UPDATE_DT
where CDC_UPDATE_DT is a date field that will be used to track new records.
I already loaded the initial data (just 5 sample records) from the "qlik-cdc" table in databricks into a QVD also called "qlik-cdc" QVD.
I now want to be able to: 1) load new data incrementally without having to reload the whole QVD. 2) Update existing data in the QVD if existing data in the "qlik-cdc" table in databricks has been updated. Basically "Case 3: Insert and Update (No Delete)" from this page: https://help.qlik.com/en-US/qlikview/May2024/Subsystems/Client/Content/QV_QlikView/QVD_Incremental.h...
So far here is what I have for incremental INSERTS only:
I've combed through that page for 2 months, Seems easy (Case 3: Insert and Update (No Delete)) to implement but there is a lot of pertinent info missing on that page.
What you do is basically: fetch new data (based on the timestamp you have), combine data with previously fetched data in existing qvd, store the qvd under the same name (it will overwrite the old qvd).
You can do it the other way around as well: load existing data from qvd, fetch new data and concatenate it into the table with old data, store the qvd with all data.
After testing, I agree with this statement: "I recommend that you don't use ReloadTime(). It is better to first see what the last (max) record in the qvd is, and use that as base for next data fetch. At least I would do something like that."
How do I grab the Max(CDC_UPDATE_DT) and set it to a variable i can use in the insert query?
LET lastReloadTime = Date(Max(CDC_UPDATE_DT); -- is not working
full code as of now:
====
LET lastReloadTime = Date(Max(CDC_UPDATE_DT);
This is where I am now:
=====
This finally worked without erroring out, just need to make sure it did what I wanted:
====================
Okay - i'm moving pretty quickly here. The incremental inserts work now,
I just need to figure out updates, how do I do updates for already existing records in the QVD if its source (in Databricks) has newly updated records (not new inserts, but new updates to existing).