Badzreyes00
Contributor III

Incremental Load on Transactional Tables

Hi Everyone, 

In my dashboard development process, I split the work into 3 apps:

Raw Extracts - pull each table directly from the database and store it as a QVD

Transform - transformation of tables and metrics

Presentation - Dashboard front end development

I have a transactional table, abc_bill, that grows significantly every day, and it now takes me 6 hours to load the entire table.

Can you please guide me, or do you have an example of an incremental load? Basically, from now on I want to reload only new data for abc_bill where data_create_time > *current max(data_create_time)*

I have read some incremental load instructions, but I am not sure I understood them correctly; it seemed they only apply to .txt files.

Thank you so much in advance!!!


3 Replies
stevedark
Partner Ambassador/MVP

Hi @Badzreyes00 

Incremental loads can work on any data source, and in fact work much better on databases, as pulling rows based on a date is much quicker provided the indexing is correct.

Loading the max date from a very large table may itself be a slow process, so you might want to persist the time of the last load to a variable. You do need to be very careful that time differences between the Qlik server and the database server do not scupper you. Grabbing the max date from the database (using a SELECT Max) will be much quicker than parsing your old data, but be aware of the risk of a new record being posted between getting the max value and loading the data. If you have a unique ID I would always pull more data than strictly needed, so that it intentionally overlaps, and then use a WHERE EXISTS to remove the duplicates.
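
A minimal sketch of that pattern in load script - the lib:// path, the bill_id key and the variable names are my assumptions here, so adapt them to your own schema and connection:

// First run only: seed the high-water mark, e.g.
// LET vLastLoad = '1900-01-01 00:00:00';

// Pull everything since the last run; the exact timestamp literal
// syntax in the WHERE clause depends on your database.
Incremental:
SQL SELECT *
FROM abc_bill
WHERE data_create_time >= '$(vLastLoad)';

// Append the history, keeping only rows we have not just loaded,
// so the deliberate overlap does not create duplicates.
Concatenate (Incremental)
LOAD *
FROM [lib://QVDs/abc_bill.qvd] (qvd)
WHERE Not Exists(bill_id);

STORE Incremental INTO [lib://QVDs/abc_bill.qvd] (qvd);

// Persist the new high-water mark ready for the next run.
MaxTime:
LOAD Max(data_create_time) AS max_time RESIDENT Incremental;
LET vLastLoad = Peek('max_time', 0, 'MaxTime');
DROP TABLE MaxTime;

Using >= rather than > is what gives you the overlap, and the Not Exists on the QVD load is the WHERE EXISTS dedupe mentioned above.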

I've written a blog post on incremental load which should hopefully give you some useful pointers:

https://www.quickintelligence.co.uk/qlikview-incremental-load/

Hope that is useful.

Steve

Badzreyes00
Contributor III
Author

Hi Steve,

Thank you so much for your input, I really appreciate it, and you have a great blog there - I will explore all those videos some time. To clarify, the steps I need to do are (I have tried to sketch them in script after the list):

1. Edit the raw extractor abc_bill and create a variable that stores max(data_create_time)

2. Create a new load with a WHERE clause for data_create_time > $(VarMaxDate) and store the result in a QVD named abc_bill_$(VarMaxDate).qvd

3. Create a new load that reads the "base QVD", concatenates the increments by loading abc_bill_*.qvd, and stores the result as the main raw QVD file that I can then use in my transformation apps?
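
Something like the below is what I have in mind - I have used vMaxDate, an inc_ prefix and lib:// paths as placeholders:

// Step 1: grab the high-water mark from the existing base QVD
// (per your note, persisting this in a variable between runs would
// avoid rescanning a huge QVD every time)
HW:
LOAD Max(data_create_time) AS max_time
FROM [lib://QVDs/abc_bill_base.qvd] (qvd);
LET vMaxDate = Peek('max_time', 0, 'HW');
DROP TABLE HW;

// A filename-safe stamp, since a raw timestamp can contain ':' etc.
LET vStamp = Timestamp(Now(), 'YYYYMMDD_hhmmss');

// Step 2: pull only the new rows and store them in a dated QVD
// (timestamp literal syntax in the WHERE clause depends on the DB)
NewRows:
SQL SELECT *
FROM abc_bill
WHERE data_create_time > '$(vMaxDate)';
STORE NewRows INTO [lib://QVDs/abc_bill_inc_$(vStamp).qvd] (qvd);
DROP TABLE NewRows;

// Step 3: rebuild the main raw QVD from the base plus all increments
// (the inc_ prefix keeps the wildcard from matching the base file)
AllBills:
LOAD * FROM [lib://QVDs/abc_bill_base.qvd] (qvd);
Concatenate (AllBills)
LOAD * FROM [lib://QVDs/abc_bill_inc_*.qvd] (qvd);
STORE AllBills INTO [lib://QVDs/abc_bill.qvd] (qvd);

Is that roughly the right shape?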

Regards,

Badzreyes

stevedark
Partner Ambassador/MVP

Hi @Badzreyes00 

Thank you for the kind comments on my blog; I try to cover topics that people will find interesting.

If you have a transaction log style table (i.e. data is appended and date-stamped, with no back-posting) my preferred incremental approach is to create a QVD of all data before a specified date, with a suffix on the file name showing where the data is up to (e.g. _PreSep20).

Then create a process that loads just a single day's worth of data and stores it to a QVD with a date suffix for that day (e.g. _20200901). You could build a loop in the load to do today and also the past n days. Make sure that the dates in your 'up to' QVD do not overlap with your dailies.

You can then reload the current day's QVD multiple times through the day, storing it over the top of the previous run of that QVD; when you get into the next day a new QVD is started.

As you say, you can reload with a * in the file name, or with a FOR EACH vFile IN FileList loop.

You obviously need to ensure that you don't miss any transactions around midnight, so you might load the previous day until 6am the following day, or load the past n days so you catch any back-posted transactions.

The amount of data you have and how quickly you want the reload to run will dictate how many days of overlap you refresh.
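
Roughly, that daily pattern looks like this - the paths, the PreSep20 file name and the three-day overlap are just examples to tune to your own back-posting window:

// Refresh today plus the past few days, one QVD per day
LET vDays = 3;

FOR i = 0 TO $(vDays) - 1
  LET vDay  = Date(Today() - $(i), 'YYYYMMDD');
  LET vNext = Date(Today() - $(i) + 1, 'YYYYMMDD');

  // Date literal syntax in the WHERE clause depends on your database
  Daily:
  SQL SELECT *
  FROM abc_bill
  WHERE data_create_time >= '$(vDay)'
    AND data_create_time < '$(vNext)';

  STORE Daily INTO [lib://QVDs/abc_bill_$(vDay).qvd] (qvd);
  DROP TABLE Daily;
NEXT i

// Consolidate: the 'up to' QVD plus every daily file
AllBills:
LOAD * FROM [lib://QVDs/abc_bill_PreSep20.qvd] (qvd);

FOR EACH vFile IN FileList('lib://QVDs/abc_bill_2*.qvd')
  Concatenate (AllBills)
  LOAD * FROM [$(vFile)] (qvd);
NEXT vFile

The 2* in the mask stops the wildcard picking up the _PreSep20 file, as the dailies all start with the year.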

Hope that all makes sense?

Cheers,
Steve