incremental load with no date or timestamp field - Qlik Community

Report Inappropriate Content · ‎2012-04-16

Hello,

I have a few tables, which do not have the date/timestamp field to show when was the row modified. the tables are connected through other unique fields. What are the recommended ways to do incremental loads. If the unique fields are strings, then in my opinion, the incremental load will be very slow if we are searching for every new key in the old records to check whether its a new records or old one.

What are the recommended ways to do incremental load in this scenario

Thanks

Arif

Report Inappropriate Content · ‎2012-04-16

Hi,

I've had the same issue and unfortunately the only option I've come up with is to use where not exists(unique_field). It's not ideal, and as you say, not the fastest method. Have you tried looking for a log or history table that may hold a date/time stamp separately? Are there going to be updates to existing lines or are you just looking for new lines of data? If it's just new lines then you may have an incremental numerical ID field which you could use in place of the date/time stamp.

Sorry, not much help. If anyone else can supply an alternative method then I'm sure we're not the only one's looking for it!

Cheers,

Emma

Report Inappropriate Content · ‎2012-04-17

Hello Emma,

Thanks for your response. I was also not able to find any other solution that not exists. I have one question for it though. How do i store the unique values from qvd into a variable so that it can be used in

not exists(id1, id2) etc in sql select statement in qvd. I think storing the list of ids in a string is a bad idea, because there could be a limit to how much ids can a string variable store.or there is no limit to it? What are the possible ways of using list of ids from qvd in the not exists clause of select statement inside load script. What are the other possible options?

Thanks

Syed

Report Inappropriate Content · ‎2012-04-17

Hi,

How many fields need to be used to make a line of data unique?

If it is just one, your load statement would be something like:

Table:

Load

uniqueid,

data1,

data2

From

PreviousDownload.qvd;

Concatenate(Table)

Load

uniqueid,

data1,

data2

From

NewData.qvd // or from the database

where not exists(uniqueid);

If your data fields are likely to change rather than just be added, then I'm not sure this is going to work for you though, so make sure you do some thorough checking. For example, I used to run a file looking at purchase orders. To begin with we couldn't find a way other than doing a full download of the order table each week because there was no identifier to say when a line was updated. Even closed orders could be re-opened so we had nothing to work from. In the end, we managed to find a database expert who managed to identify a log table in the background that had a date/time stamp for changes to lines and we managed to do something with that. But it was quite complex.

Hope you find something that works!

Cheers,

Emma

Report Inappropriate Content · ‎2012-04-18

Hello Emma,

Thanks for your response. I have another table called timesheet(timesheet_id,payroll_id,processed_date,.....) which is connected with this payroll table (payroll_id,status,total_hours,......) through payroll id.

The timesheet table is incrementally loaded using processed_date. Now I want to load the payroll table which has only unique payroll_id but no modification date. Can we use the timesheet table to do incremental loading on payroll table?

Also, In the above method that you mentioned for loading, how can we use load and sql select to use the 'where not exists' clause because the sql is giving error on it

for example,

payroll:

load

payroll_id,

status,

total_hourse

from

payroll.qvd(qvd);

concatenate(payroll)

load

payroll_id,

status,

total_hours;

sql select id as payroll_id,

status,

total_hours

from dbo.payroll

where not exists(payroll_id);

Report Inappropriate Content · ‎2012-04-20

Hi,

So, just checking I'm getting this right, you want to load new information from the payroll table only where there has been an increase in hours - i.e. there have been timesheets processed since the last load?

If you can do this presumably that means you don't have to try and use the where not exists ... so I'll tackle this one first! 🙂

presumably your timesheet load is something like

varLastUploadDate = max(processed_date)

timesheet:

Load

timesheetid,

payrollid,

processed_date

From

dbo.timesheet

where processed_date > varLastUploadDate;

using the same variable you could do something like this:

payroll:

Load

p.payrollid,

p.status,

p.total_hours

From

dbo.payroll p

inner join dbo.timesheet t on p.payrollid = t.payrollid

where processed_date > varLastUploadDate;

concatenate(payroll)

Load

payrollid,

status,

total_hours

From

payroll.qvd (qvd)

where not exists(payrollid);

So essentially you're using the processed date of the timesheet to identify what payrollids have been updated.

Hope this helps!

Cheers,

Emma