Hi all, I am drafting an ETL structure for one of our apps and ran into a question I can't solve. Maybe you can share your knowledge and help me with it.
We have one pretty big app with data going back to 2010. Every night I would like to run the transformation only for the freshest data (rows modified during the last 24 hours), but I don't know what to do when historical data in one of the dimension tables has changed. Should I reload the whole fact table, or is there an easier way?
For example:
1. During the Extract stage I append the freshest sales data from the last 24 hours to the main Raw_QVD.
2. During the Transform stage I transform the freshest data from the last 24 hours by applying ApplyMaps and various joins (let's say I join info about the sales person), then append it to the historical Transformed_QVD (sketched below).
3. During the Load stage I load this data into the final .qvw application.
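To make step 2 concrete, this is roughly what the transform looks like today (field names, paths and the sales-person mapping are simplified placeholders):

// Mapping table for the sales-person info that gets joined in
SalesPersonMap:
MAPPING LOAD
    SalesPersonID,
    SalesPersonName
FROM SalesPerson.qvd (qvd);

// Transform only the rows modified in the last 24 hours
TransformedSales:
LOAD
    SalesID,
    SalesPersonID,
    Amount,
    ModifiedDate,
    ApplyMap('SalesPersonMap', SalesPersonID, 'Unknown') AS SalesPersonName
FROM Raw_QVD.qvd (qvd)
WHERE ModifiedDate >= Now() - 1;

// Keep the already-transformed history, except the rows replaced above
Concatenate (TransformedSales)
LOAD *
FROM Transformed_QVD.qvd (qvd)
WHERE NOT Exists(SalesID);

STORE TransformedSales INTO Transformed_QVD.qvd (qvd);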
What should I do if the data about one sales person has changed (e.g. his name was corrected)? Should I reload all the historical Transformed_QVD data, or is there another way?
You're looking for an Incremental load.
Basic Incremental Load - Qlik Sense - YouTube
https://www.quickintelligence.co.uk/qlikview-incremental-load/
Remember DUI - Delete, Update, Insert.
You have to follow the update and insert methodology.
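For example, something along these lines (table and field names are only examples, and this assumes a database connection is already open):

// Insert & update: pull only rows added or changed since the last run,
// then append the untouched history from the existing QVD.
LET vLastExecTime = Timestamp(Now() - 1, 'YYYY-MM-DD hh:mm:ss');

Sales:
LOAD SalesID, CustomerID, Amount, ModifiedDate;
SQL SELECT SalesID, CustomerID, Amount, ModifiedDate
FROM dbo.Sales
WHERE ModifiedDate >= '$(vLastExecTime)';   // new and updated rows

Concatenate (Sales)
LOAD *
FROM Sales.qvd (qvd)
WHERE NOT Exists(SalesID);                  // old rows not re-read above

STORE Sales INTO Sales.qvd (qvd);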
Thanks Vishnu. As I understand it, this covers the Extract step, and I already use this technique. The problem I am facing is what to do when, at the Transform step, one of your dimension tables has changed info.
Let me illustrate this problem again with this example:
1. Extract step. To your historical raw data you append only new rows or rows whose info has changed (for this step - incremental load).
2. Transform step. Instead of transforming all the raw data, it's smarter to transform only the data that was added during the previous step and append it to your historical transformed data.
The question arises in this step. What do you do when one of the dimension tables used in the transformation (LEFT JOIN) has changed, say an error was fixed and a customer's name was corrected? This change has to affect not only the new data but also the old data. How do you manage that? Do we have to re-apply the transformation to all the raw data?
As Vishnu said earlier, an incremental load is the solution to this. The approach will be as follows.
You have to load all the data since 2010 once in this approach. Another thing to keep in mind is that any changes made to historical data will not be picked up by Qlik, because it only takes the last day's data. To overcome this you can add another condition to the incremental reload (let's assume the QVD is named Sales.QVD): check whether Sales.QVD exists; if it does, load the last 24 hours of data, concatenate it with the QVD data and overwrite the old QVD; if the QVD is missing, load all the data.
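In script terms the check could look something like this (paths and field names are placeholders; FileSize() returns null when the file does not exist):

LET vQvd       = 'Sales.qvd';
LET vYesterday = Timestamp(Now() - 1, 'YYYY-MM-DD hh:mm:ss');

IF Not IsNull(FileSize('$(vQvd)')) THEN

    // QVD exists: read only the last 24 hours from the source
    Sales:
    LOAD SalesID, Amount, ModifiedDate;
    SQL SELECT SalesID, Amount, ModifiedDate
    FROM dbo.Sales
    WHERE ModifiedDate >= '$(vYesterday)';

    // add the existing history, then overwrite the old QVD below
    Concatenate (Sales)
    LOAD * FROM [$(vQvd)] (qvd)
    WHERE NOT Exists(SalesID);

ELSE

    // QVD missing: one-time full load of everything since 2010
    Sales:
    LOAD SalesID, Amount, ModifiedDate;
    SQL SELECT SalesID, Amount, ModifiedDate
    FROM dbo.Sales;

END IF

STORE Sales INTO [$(vQvd)] (qvd);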
Hi Amith,
Let's say I transform this Sales QVD data by applying a lot of joins, ApplyMaps and heavy calculations, then save all of it into a new Transformed QVD. The next day I want to run those heavy calculations only for the freshest data and append it to the Transformed QVD. I know how to accomplish this; the only problem I am facing is:
What should I do if data has changed (let's say an error was fixed and one of the customers' names got changed) in one of the other tables that is used for the transformation of the fact table (Transformed Sales QVD)?
In my understanding, before saving the QVD you are doing the joins and calculations and saving it as a fact QVD.
What I would suggest is an extraction application where you just extract all the tables and save each one as a QVD, and then a separate data model app where you do all the joins and calculations. This way you can control which table you want to reload completely without affecting the other tables, and so save reload time.
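As a rough sketch of the extraction app (connection, table and path names are placeholders):

// Extraction app: pull each source table 1:1 and store it as its own QVD,
// with no joins or calculations. Assumes a database connection is open.
FOR EACH vTable IN 'Sales', 'Customer', 'SalesPerson'

    [$(vTable)]:
    LOAD *;
    SQL SELECT * FROM dbo.$(vTable);

    STORE [$(vTable)] INTO [$(vTable).qvd] (qvd);
    DROP TABLE [$(vTable)];

NEXT vTable

The data model app then only reads these QVDs and does the joins and calculations, so you can fully reload one dimension QVD without re-extracting the big fact table.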
I am thinking about a three tier structure:
1. Extract (extracting fresh raw fact table data and concatenating it to older raw fact table data)
2. Transform (loading fresh raw fact table data and making transformation and concatenating it to older transformed fact table data)
3. Load (just loading all necessary data to final application)
My question concerns the 2nd step - Transform.
What should I do if data has changed (let's say an error was fixed and one of the customers' names got changed) in one of the other tables that is used for the transformation of the fact table?
Those decisions cannot be made by community members. IMHO they can only be made by people responsible for your business and its data.
For example, if we say: "reload everything and change history to reflect the current situation", then what will you do in case:
There are various other situations imaginable for which the ETL developer cannot decide what to do with the data. Maybe you can first make an inventory of all the changes that you should be able to handle in the (E)T(L) phase, and then discuss the impact of the various approaches on the business data with your end users. Since this is mostly a business decision, I would suggest letting them help decide what to do.