Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hello,
I have a few tables, which do not have the date/timestamp field to show when was the row modified. the tables are connected through other unique fields. What are the recommended ways to do incremental loads. If the unique fields are strings, then in my opinion, the incremental load will be very slow if we are searching for every new key in the old records to check whether its a new records or old one.
What are the recommended ways to do incremental load in this scenario
Thanks
Arif
Hi,
I've had the same issue and unfortunately the only option I've come up with is to use where not exists(unique_field). It's not ideal, and as you say, not the fastest method. Have you tried looking for a log or history table that may hold a date/time stamp separately? Are there going to be updates to existing lines or are you just looking for new lines of data? If it's just new lines then you may have an incremental numerical ID field which you could use in place of the date/time stamp.
Sorry, not much help. If anyone else can supply an alternative method then I'm sure we're not the only one's looking for it!
Cheers,
Emma
Hello Emma,
Thanks for your response. I was also not able to find any other solution that not exists. I have one question for it though. How do i store the unique values from qvd into a variable so that it can be used in
not exists(id1, id2) etc in sql select statement in qvd. I think storing the list of ids in a string is a bad idea, because there could be a limit to how much ids can a string variable store.or there is no limit to it? What are the possible ways of using list of ids from qvd in the not exists clause of select statement inside load script. What are the other possible options?
Thanks
Syed
Hi,
How many fields need to be used to make a line of data unique?
If it is just one, your load statement would be something like:
Table:
Load
uniqueid,
data1,
data2
From
PreviousDownload.qvd;
Concatenate(Table)
Load
uniqueid,
data1,
data2
From
NewData.qvd // or from the database
where not exists(uniqueid);
If your data fields are likely to change rather than just be added, then I'm not sure this is going to work for you though, so make sure you do some thorough checking. For example, I used to run a file looking at purchase orders. To begin with we couldn't find a way other than doing a full download of the order table each week because there was no identifier to say when a line was updated. Even closed orders could be re-opened so we had nothing to work from. In the end, we managed to find a database expert who managed to identify a log table in the background that had a date/time stamp for changes to lines and we managed to do something with that. But it was quite complex.
Hope you find something that works!
Cheers,
Emma
Hello Emma,
Thanks for your response. I have another table called timesheet(timesheet_id,payroll_id,processed_date,.....) which is connected with this payroll table (payroll_id,status,total_hours,......) through payroll id.
The timesheet table is incrementally loaded using processed_date. Now I want to load the payroll table which has only unique payroll_id but no modification date. Can we use the timesheet table to do incremental loading on payroll table?
Also, In the above method that you mentioned for loading, how can we use load and sql select to use the 'where not exists' clause because the sql is giving error on it
for example,
payroll:
load
payroll_id,
status,
total_hourse
from
payroll.qvd(qvd);
concatenate(payroll)
load
payroll_id,
status,
total_hours;
sql select id as payroll_id,
status,
total_hours
from dbo.payroll
where not exists(payroll_id);
Hi,
So, just checking I'm getting this right, you want to load new information from the payroll table only where there has been an increase in hours - i.e. there have been timesheets processed since the last load?
If you can do this presumably that means you don't have to try and use the where not exists ... so I'll tackle this one first! 🙂
presumably your timesheet load is something like
varLastUploadDate = max(processed_date)
timesheet:
Load
timesheetid,
payrollid,
processed_date
From
dbo.timesheet
where processed_date > varLastUploadDate;
using the same variable you could do something like this:
payroll:
Load
p.payrollid,
p.status,
p.total_hours
From
dbo.payroll p
inner join dbo.timesheet t on p.payrollid = t.payrollid
where processed_date > varLastUploadDate;
concatenate(payroll)
Load
payrollid,
status,
total_hours
From
payroll.qvd (qvd)
where not exists(payrollid);
So essentially you're using the processed date of the timesheet to identify what payrollids have been updated.
Hope this helps!
Cheers,
Emma