Andrius45
Contributor III

Stuck on QlikView ETL process

Hi all, I am drafting an ETL structure for one of our apps and have run into a question I can't solve. Maybe you guys can share your knowledge and help me with it.

We have one pretty big app with data going back to 2010. Every night I would like to run the transformation just for the freshest data (the rows that were modified during the last 24 hours), but I don't know what to do when historical data in one of the dimensional tables has changed. Should I reload the whole fact table, or is there an easier way?

For example:

1. During the Extract stage I add the freshest sales data from the last 24 hours to the main Raw_QVD (sketched in the script after this list).

2. During the Transform stage I modify the freshest data from the last 24 hours by applying ApplyMaps and various joins (let's say I join info about the sales person), then add this data to the historical Transformed_QVD.

3. During the Load stage I load this data into the final .qvw application.
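
For illustration, step 1 currently looks roughly like this in my load script (simplified - the source table, field names and date syntax are made up and depend on your database):

// Step 1 - Extract: pull only the rows modified during the last 24 hours
Sales:
SQL SELECT SaleID, SalesPersonID, Amount, ModifiedDate
FROM dbo.Sales
WHERE ModifiedDate >= GETDATE() - 1;

// Append the untouched history, skipping rows that were just re-extracted
Concatenate (Sales)
LOAD * FROM Raw_QVD.qvd (qvd)
WHERE NOT Exists(SaleID);

STORE Sales INTO Raw_QVD.qvd (qvd);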

What should I do if the data about one sales person has changed (e.g. his name changed)? Should I re-transform all the historical Transformed_QVD data, or is there another way?

7 Replies
vkish16161
Creator III

Andrius45
Contributor III
Author

Thanks Vishnu, as I understand it this covers the Extract step, and I am already using that technique. The problem I am facing is what to do when, at the Transform step, one of your dimension tables has changed info.


Let me illustrate the problem again with an example:

1. Extract step. To your historical raw data you append only new rows or rows whose info has changed (for this step - incremental load).

2. Transform step. Instead of transforming all the raw data, it's smart to transform only the data that was added during the previous step, and then append it to your historical transformed data (sketched in the script below).

The question arises in this step. What to do when one of the dimensional tables used in the transformation (LEFT JOIN) has changed - let's say an error was fixed and one customer's name got changed? This change has to impact not only the new data but also the old data. How do we manage it? Do we have to re-apply the transformation to all the raw data?
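
To show what I mean for step 2 in script terms (a simplified sketch - the table and field names are made up):

// Step 2 - Transform only the rows extracted during the last 24 hours
SalesPersonMap:
Mapping LOAD SalesPersonID, SalesPersonName
FROM SalesPerson_QVD.qvd (qvd);

Transformed:
LOAD *,
     ApplyMap('SalesPersonMap', SalesPersonID, 'Unknown') as SalesPersonName
FROM Raw_QVD.qvd (qvd)
WHERE ModifiedDate >= Today() - 1;

// Append the previously transformed history
Concatenate (Transformed)
LOAD * FROM Transformed_QVD.qvd (qvd)
WHERE NOT Exists(SaleID);

STORE Transformed INTO Transformed_QVD.qvd (qvd);

The ApplyMap only touches the fresh rows, so if a customer's or sales person's name was corrected, the old rows already sitting in Transformed_QVD still carry the old value - that is exactly my problem.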

amithmurali
Partner - Creator II

As Vishnu told earlier, incremental load is the solution to this. The approach will be as follows.

You have to load all the data since 2010 one time in this approach. Another thing to keep in mind is that any changes done to the historical data will not be picked up in Qlik, as it only takes the last day's data. To overcome this you can add another condition to the incremental reload (let's assume the QVD name is Sales.QVD): check whether Sales.QVD is present or not. If the QVD is present, load the last 24 hours of data, concatenate it with the QVD data and overwrite the old QVD; if the QVD is missing, load all the data.
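
In script terms, that check could look roughly like this (only a sketch - the key and date field names are assumptions, and the WHERE clause depends on your database):

IF NOT IsNull(QvdCreateTime('Sales.qvd')) THEN

    // QVD already exists: load only the last 24 hours and merge with history
    Sales:
    SQL SELECT * FROM Sales
    WHERE ModifiedDate >= GETDATE() - 1;

    Concatenate (Sales)
    LOAD * FROM Sales.qvd (qvd)
    WHERE NOT Exists(SaleID);

ELSE

    // QVD missing: one-time full load of everything since 2010
    Sales:
    SQL SELECT * FROM Sales;

END IF

STORE Sales INTO Sales.qvd (qvd);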

Andrius45
Contributor III
Author

Hi Amith,

     Let's say I am transforming this Sales QVD data by applying a lot of joins, ApplyMaps and difficult calculations, and then saving all of it into a new Transformed QVD. The next day I want to do these difficult calculations just for the freshest data and add it to the Transformed QVD. I know how to accomplish this; the only problem I am facing is:

     What should I do if data has changed (let's say an error was fixed and one customer's name got changed) in one of the other tables that is used in the transformation of the fact table (the Transformed Sales QVD)?

amithmurali
Partner - Creator II

In my understanding, before saving the QVD you're doing joins and calculations and saving it as a fact QVD.

What I would suggest is an extraction application where you just extract all the tables and save them as QVDs. Then you create another DataModel app where you do all the joins and calculations. This way you can control which table you want to load completely without affecting the other tables, thus saving reload time.
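
Something like this in the DataModel app (the table and field names are only examples):

// DataModel app: the fact table comes from the incrementally built QVD,
// while the small dimension tables are reloaded in full every night
Fact_Sales:
LOAD SaleID, SalesPersonID, Amount, SaleDate
FROM Raw_QVD.qvd (qvd);

// Full reload of the small table - a corrected sales person name
// is picked up here and reaches all history through the key
Dim_SalesPerson:
LOAD SalesPersonID, SalesPersonName
FROM SalesPerson_QVD.qvd (qvd);

Because the dimension stays as its own table linked by SalesPersonID instead of being joined into the fact, reloading just that small QVD is enough to bring the correction into the model.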

Andrius45
Contributor III
Author

I am thinking about a three-tier structure:

1. Extract (extracting fresh raw fact table data and concatenating it with the older raw fact table data)

2. Transform (loading the fresh raw fact table data, transforming it, and concatenating it with the older transformed fact table data)

3. Load (just loading all the necessary data into the final application)

My question arises from the 2nd step - Transform.

What should I do if data has changed (let's say an error was fixed and one customer's name got changed) in one of the other tables that is used in the transformation of the fact table?

Peter_Cammaert
Partner - Champion III

Those decisions cannot be made by community members. IMHO they can only be made by people responsible for your business and its data.

For example, if we say "reload everything and change history to reflect the current situation", then what will you do in cases like these:

  • A correction was made to a customer ID (instead of the customer name), and this ID ties into various places in your historical data, or conflicts with the same ID that was used for a different customer in your historical data.
  • A correction of a different kind was made: one Sales Executive left the company and was replaced by a new one. Are you going to turn over all historical sales to the new executive?
  • A correction was made so that a single Division now becomes two Divisions. How are you going to handle the split for historical data when related dimensions cannot be split or would cause conflicts?

There are various other situations imaginable where the ETL developer cannot decide what to do with the data. Maybe you can first make an inventory of all the changes you should be able to handle in the (E)T(L) phase, and then discuss the impact of the various approaches on the business data with your end-users. Since this is mostly a business decision, I would suggest letting them help decide what to do.