lucasdavis500
Creator III

Incremental Update/Insert with External Data (NO DB Connection)

Hey all, I'm having trouble figuring out how to update data that is already loaded into QVDs and will continue to be loaded every month. The code I have written performs inserts of new data as I receive it, but it doesn't update old records, because the updated data and the new data arrive in different files.

The files I receive are Excel (xlsx) and come independently of each other, i.e. each table is a separate Excel file, and for each table there are ALSO separate files: one containing the INSERT (new) records and the other containing the UPDATED (old) records.

Currently, my process keys off a created-date field, i.e. when that row (observation) was 'created' by someone at my work. If the created date is greater than any created date already loaded in my QVD, I insert the row into the QVD, which ensures all new records are present. The issue lies with UPDATING the old records. Here is a sample of what the data looks like BEFORE and AFTER it's updated:

Data as of June 1st:

Pkey     DateCreated     DateResolved
1234     05/10/2016      05/26/2016
2345     05/15/2016      -
3456     05/31/2016      -

Data as of July 1st:

Pkey     DateCreated     DateResolved
1234     05/10/2016      05/26/2016
2345     05/15/2016      06/01/2016
3456     05/31/2016      06/15/2016

---- There would also be a June data file accompanying this on July 1st, in which all of the DateCreated values would be June dates.

As you can see, the data I receive at month end (June 1st, for example) is only partial, and after a one-month lag (i.e. July 1st) all of May's data should have been updated. The issue is that when the June 1st file is loaded, the max DateCreated for the table is May 31st, 2016. By the time I try to update May's data in July (since it's not all in one file), June's partial data has already been loaded as well. QlikView then fetches 0 lines, because none of the old records have a DateCreated greater than the max DateCreated that was inserted at the beginning of July for June.

One way I think I could resolve this is by manually changing the values of vLastExecTime and vExecTime, creating a new column that records the date the data was added, and then deleting the oldest record for each Pkey (a sketch of that idea follows my code below). But that would require manual intervention every time and could lead to human error, and I'm also not 100% sure how to accomplish it in QV. It makes sense in my head, and I believe the code below would work if this came from a DB and the update/insert all happened at once. Any thoughts? Below is a sample of the code (I stripped out a lot of the transformations, etc. to simplify it).

//CET Data

IF NOT ISNULL(QVDCreateTime('$(vQVDPath)CETData.qvd')) THEN

    LoadTime:
    LOAD MAX(DateCreated) AS DateCreated
    FROM $(vQVDPath)CETData.qvd (QVD);

    LET vLastExecTime = DATE(PEEK('DateCreated', 0, 'LoadTime'));

    DROP TABLE LoadTime;
END IF;

CET:
LOAD
    Pkey,
    State,
    AccountType,
    DisputeType,
    TaxID,
    Mode,
    AssignedTo,
    ResolvedBy,
    DateResolved,
    DATE(DateCreated) AS DateCreated,
    ProductLine,
    CreditDisputeType
FROM [$(vPathname)CET_$(vDataDate).xlsx]
(ooxml, embedded labels, table is Sheet1)
WHERE DateCreated >= $(vLastExecTime) AND DateCreated < $(vExecTime);

IF NOT ISNULL(QVDCreateTime('$(vQVDPath)CETData.qvd')) THEN

    CET_Final:
    CONCATENATE(CET)
    LOAD
        Pkey,
        State,
        AccountType,
        DisputeType,
        TaxID,
        Mode,
        AssignedTo,
        ResolvedBy,
        DateResolved,
        DATE(DateCreated) AS DateCreated,
        ProductLine,
        CreditDisputeType
    FROM $(vQVDPath)CETData.qvd (QVD)
    WHERE NOT(EXISTS(Pkey));
END IF;

IF NoOfRows(CET) > 0 THEN
    STORE CET_FINAL INTO $(vQVDPath)CETData.qvd (QVD);
    //DROP TABLE CET;
END IF;
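For what it's worth, here is a rough sketch of the "stamp each batch and keep only the newest row per Pkey" idea floated above. It is only an illustration under assumptions, not a tested fix: DateAdded is an invented field (the existing QVD would need it added once), Staging and DedupKey are invented names, and the dedupe relies on the ordered-load-plus-Exists() pattern.

// Stage the fresh file and the existing QVD together, stamping this batch
Staging:
LOAD
    Pkey,
    DateCreated,
    DateResolved,                              // plus the remaining fields
    Timestamp(Now()) AS DateAdded              // when this batch was pulled in
FROM [$(vPathname)CET_$(vDataDate).xlsx]
(ooxml, embedded labels, table is Sheet1);

CONCATENATE(Staging)
LOAD
    Pkey,
    DateCreated,
    DateResolved,
    DateAdded                                  // assumed to be present from earlier runs
FROM $(vQVDPath)CETData.qvd (QVD);

// Keep only the newest version of each Pkey: read newest-first and let the
// two-argument Exists() reject any Pkey already accepted in this load.
CET_Latest:
NOCONCATENATE
LOAD *, Pkey AS DedupKey
RESIDENT Staging
WHERE NOT Exists(DedupKey, Pkey)
ORDER BY DateAdded DESC;

DROP TABLE Staging;
DROP FIELD DedupKey;
STORE CET_Latest INTO $(vQVDPath)CETData.qvd (QVD);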


Accepted Solutions
vinieme12
Champion III

You are creating a date-modified field with the now() function! The actual modified date, in terms of the DATA, should ideally come from the front-end CRM tool/software and relate to each data row.

In that sense, what you are adding is not a modified date for the record but the date you refreshed it, which is more like a date-added flag.

I understand your logic; you do not want to refresh all of the data. But have you tried it yet? There will not be much difference in reload time.

That is why I recommend dropping the WHERE clause that compares against your date field.

Just try as below

CET:
LOAD
    Pkey,
    State,
    AccountType,
    DisputeType,
    TaxID,
    Mode,
    AssignedTo,
    ResolvedBy,
    DateResolved,
    DATE(DateCreated) AS DateCreated,
    ProductLine,
    CreditDisputeType
FROM [$(vPathname)CET_$(vDataDate).xlsx]
(ooxml, embedded labels, table is Sheet1);

IF NOT ISNULL(QVDCreateTime('$(vQVDPath)CETData.qvd')) THEN

    CET_Final:
    CONCATENATE(CET)
    LOAD
        Pkey,
        State,
        AccountType,
        DisputeType,
        TaxID,
        Mode,
        AssignedTo,
        ResolvedBy,
        DateResolved,
        DATE(DateCreated) AS DateCreated,
        ProductLine,
        CreditDisputeType
    FROM $(vQVDPath)CETData.qvd (QVD)
    WHERE NOT(EXISTS(Pkey));
END IF;

IF NoOfRows(CET) > 0 THEN
    STORE CET_FINAL INTO $(vQVDPath)CETData.qvd (QVD);
    //DROP TABLE CET;
END IF;

Vineeth Pujari
If a post helps to resolve your issue, please accept it as a Solution.


18 Replies
stabben23
Partner - Master

Hi Lucas,

your main problem here is that your Pkey won't change even if DateResolved changes. So when you do your WHERE NOT(EXISTS(Pkey)) it won't find any new transactions; they are already there in the QVD.

I think you need to do a full reload sometimes to solve this, maybe during the night, and during the day do an incremental reload to pick up new Pkey rows.

lucasdavis500
Creator III
Author

You're right, it doesn't change, and that's the problem. QV reads my Pkey, sees that it is already in my QVD, and skips it. I can't be the only person with this issue... Doing a full reload doesn't seem sustainable, especially since I can only pull this data with a few months' lag. Like I said, I can see creating a variable for execution time, i.e. when the data was loaded in, then comparing distinct Pkeys with each other and keeping the most "updated" Pkey.

beck_bakytbek
Master

Hi Lucas,

check this example: https://www.youtube.com/watch?v=BoEhqyy2xbM

I hope that resolves your issue.

beck

stabben23
Partner - Master

Lucas,

I have solved this before. I set up a partial load that only loads "today's" transactions. That was a small number of transactions, so I ran it every hour; each run reads all of today's transactions again, all of them.

So if a Pkey had changed during the day, it ended up in my small "today" QVD. After that small load, I concatenated it with the main QVD in another app. The today QVD must be loaded first when you concatenate, so the changed Pkeys are found first, before the WHERE NOT Exists part runs against the main QVD. Hope this helps!
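A rough sketch of that two-step setup, under assumptions: CETToday.qvd is an invented file name, and the "rows created today" filter stands in for whatever rule the source really needs.

// Step 1 (run frequently): rebuild a small QVD holding only today's transactions
Today:
LOAD
    Pkey,
    DateCreated,
    DateResolved                               // plus the remaining fields
FROM [$(vPathname)CET_$(vDataDate).xlsx]
(ooxml, embedded labels, table is Sheet1)
WHERE Floor(DateCreated) = Today();            // "today's" rows only; adjust to the real rule

STORE Today INTO $(vQVDPath)CETToday.qvd (QVD);
DROP TABLE Today;

// Step 2 (consolidation app): load the small QVD FIRST so changed Pkeys win,
// then append the archive minus those keys.
CET:
LOAD * FROM $(vQVDPath)CETToday.qvd (QVD);

CONCATENATE(CET)
LOAD * FROM $(vQVDPath)CETData.qvd (QVD)
WHERE NOT Exists(Pkey);

STORE CET INTO $(vQVDPath)CETData.qvd (QVD);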

stabben23
Partner - Master

I used the max date from the main QVD to know where to start my "today" transactions, like you do here:

LoadTime:
LOAD MAX(DateCreated) AS DateCreated
FROM $(vQVDPath)CETData.qvd (QVD);

LET vLastExecTime = DATE(PEEK('DateCreated', 0, 'LoadTime'));

DROP TABLE LoadTime;

Then only use >= vLastExecTime in your WHERE statement.
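In that spirit, the WHERE clause against the Excel source might look like the line below; the Date#() parse and the 'MM/DD/YYYY' format are assumptions and must match how vLastExecTime was formatted. Quoting the expanded variable keeps an unquoted 05/31/2016 from being read as arithmetic.

CET:
LOAD
    Pkey,
    DateCreated,
    DateResolved                               // plus the remaining fields
FROM [$(vPathname)CET_$(vDataDate).xlsx]
(ooxml, embedded labels, table is Sheet1)
WHERE DateCreated >= Date#('$(vLastExecTime)', 'MM/DD/YYYY');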

vinieme12
Champion III

You really need a MODIFIED_DATE field rather than Created_Date.

If you have a modified date, your problems with both new and modified records are solved!
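If such a field did exist, the usual insert-plus-update load would look roughly like this; ModifiedDate is the assumed new column (it is not in the current files), and the other names follow the thread's examples.

// New and changed rows only, selected on the (assumed) ModifiedDate column
CET:
LOAD
    Pkey,
    DateCreated,
    DateResolved,
    ModifiedDate                               // plus the remaining fields
FROM [$(vPathname)CET_$(vDataDate).xlsx]
(ooxml, embedded labels, table is Sheet1)
WHERE ModifiedDate >= Date#('$(vLastExecTime)', 'MM/DD/YYYY');

// History for untouched keys; old versions of updated Pkeys are skipped by Exists()
CONCATENATE(CET)
LOAD
    Pkey,
    DateCreated,
    DateResolved,
    ModifiedDate                               // would also need to exist in the archive QVD
FROM $(vQVDPath)CETData.qvd (QVD)
WHERE NOT Exists(Pkey);

STORE CET INTO $(vQVDPath)CETData.qvd (QVD);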

Vineeth Pujari
If a post helps to resolve your issue, please accept it as a Solution.
dirk_fischer
Creator II

Hi Lucas,

if your primary key does not change, you can load the file with the updated data before you reload the existing data. While loading the existing data, exclude all records whose keys are already in your table. This way you get the updated data and simply don't load the older versions of those records.

I hope this helps.

Regards

Dirk

lucasdavis500
Creator III
Author

Hi Vineeth, I have created a DateModified field in Excel (using the now() function) that I populate before I import these files. However, it does not seem to update the old data when I read in my updated data. Is that because of the WHERE NOT(EXISTS(Pkey)) used when concatenating the files to the QVD?

vinieme12
Champion III

Nope, a manually created field won't do; it has to be specific to each record. Reach out to your database team and ask if such a field is available.

I think you can drop the WHERE clause completely from your load script; there is no need to check the created date.

Perform the load in the following order:

LOAD *
FROM NEWDATA_or_UPDATEDDATA;

Concatenate
LOAD *
FROM ARCHIVE_QVD
WHERE NOT Exists(Pkey);

Vineeth Pujari
If a post helps to resolve your issue, please accept it as a Solution.