Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
lucasdavis500
Creator III
Creator III

Incremental Update/Insert with External Data (NO DB Connection)

Hey all, I'm having an issue understanding how I can do an update on data that is already loaded into QVD's and will continue to be loaded every month. The code I have written will perform inserts of new data that I receive, but it's not going to update old records (because the updated data/ new data are in different files) The files I receive are excel (xlsx) and come independent of eachother i.e. each table is a separate excel file, and within each table there are ALSO separate excel files, one being the INSERT (new) records and the other being the UPDATED (old) records. Currently, my process is looking at (for the most part) a field that is a created date, or when that row (observation) would have been 'created' by someone at my work. If the date created is larger than any date created observation already loaded in my QVD, I insert it into the QVD, thus ensuring all new records are present. However, the issue lies with UPDATING the old records. Here is a sample of what the data would look like BEFORE and AFTER it's updated:

data as of June 1st:

Pkey     DateCreated     DateResolved

1234      05/10/2016       05/26/2016

2345      05/15/2016              -

3456      05/31/2016              -

Data as of July 1st:

Pkey     DateCreated     DateResolved

1234      05/10/2016       05/26/2016

2345      05/15/2016       06/01/2016

3456      05/31/2016       06/15/2016

---- there would also be a June data file accompanying this on July 1st, where all of the DateCreated observations would consist of June dates..

As you can see, the data I receive at month end, June 1st for example, will only have partial data, and after a 1 month lag, i.e. July 1st, all of May's data (should) have been updated. The issue is, when this data is loaded in JUNE 1st, the MAX DataCreated for this table is MAY 31st, 2016. When I try to update (since it's not all in one file) in July, for May's data, June PARTIAL data has been loaded as well. This causes the issue, QV fetches 0 lines since all of the old data's DateCreated field isn't > the max DateCreated that was inserted at the beginning of July for JUNE. One way I think I could resolve this is by manually changing the values of vLastExecTime and vExecTime, then creating a new column that reflects the time date that the data was added in, then deleting the oldest record of the data by the Pkey, which would require manual intervention every time, and can lead to human error... and I'm also not 100% sure how I can accomplish this in QV. It makes sense in my head, and I believe the below code would work if this were from a DB and the update/insert all happened at one time.. Any thoughts on this? Below is a sample of the code ( I extracted a lot of the transformations, etc to simplify it.)

//CET Data

IF NOT ISNULL(QVDCreateTime('$(vQVDPath)CETData.qvd')) THEN

LoadTime:

LOAD MAX(DateCreated) AS DateCreated FROM $(vQVDPath)CETData.qvd (QVD) ;

LET vLastExecTime=DATE(PEEK('DateCreated',0,'LoadTime')) ;

DROP TABLE LoadTime ;
END IF ;


CET:
LOAD
Pkey,
State,
AccountType,
DisputeType,
TaxID,
Mode,
AssignedTo,
ResolvedBy,
DateResolved,
DATE(DateCreated) as DateCreated,
ProductLine,
CreditDisputeType

FROM

[$(vPathname)CET_$(vDataDate).xlsx]
(
ooxml, embedded labels, table is Sheet1)

WHERE DateCreated >= $(vLastExecTime) AND DateCreated < $(vExecTime);

IF NOT ISNULL(QVDCREATETIME('$(vQVDPath)CETData.qvd')) THEN

CET_Final:
CONCATENATE(CET)
LOAD
Pkey,
State,
AccountType,
DisputeType,
TaxID,
Mode,
AssignedTo,
ResolvedBy,
DateResolved,
DATE(DateCreated) as DateCreated,
ProductLine,
CreditDisputeType

FROM $(vQVDPath)CETData.qvd (QVD)
WHERE NOT(EXISTS (Pkey)) ;
END IF

IF NoOfRows(CET) > 0 THEN

STORE CET_FINAL INTO $(vQVDPath)CETData.qvd (QVD) ;

//DROP TABLE CET ;
END IF ;

18 Replies
lucasdavis500
Creator III
Creator III
Author

This cannot be true. You can manually create a DateModified field as long as it is greater than my previous DateModified for the same data, it will overwrite it. I'm doing a "full reload" of the previous month essentially, overwriting data that has changed and has not changed, but not a "full reload" in the sense that I'm not wiping my QVD. My QVD will still have previous months data, just a 1 month lag should get a full wipe and reload. Does this make sense?

vinieme12
Champion III
Champion III

You are creating a date modified field using the now function! the actual datemodifed in terms of DATA should ideally flow from the from end CRM tool / Software which relates to each data row

In this sense what you are adding is not a modified date for the record, but a date that you refreshed it which is like a dateadded flag.

I understand your logic; you do not want to refresh all of the data by doing this, but have you tried yet? There will not be much difference in reload time.

Which is why I recommend to drop your where clause comparing with your date field.

Just try as below

CET:
LOAD
Pkey,
State,
AccountType,
DisputeType,
TaxID,
Mode,
AssignedTo,
ResolvedBy,
DateResolved,
DATE(DateCreated) as DateCreated,
ProductLine,
CreditDisputeType

FROM

[$(vPathname)CET_$(vDataDate).xlsx]
(
ooxml, embedded labels, table is Sheet1);


IF NOT ISNULL(QVDCREATETIME('$(vQVDPath)CETData.qvd')) THEN

CET_Final:
CONCATENATE(CET)
LOAD
Pkey,
State,
AccountType,
DisputeType,
TaxID,
Mode,
AssignedTo,
ResolvedBy,
DateResolved,
DATE(DateCreated) as DateCreated,
ProductLine,
CreditDisputeType

FROM $(vQVDPath)CETData.qvd (QVD)
WHERE NOT(EXISTS (Pkey)) ;
END IF

IF NoOfRows(CET) > 0 THEN

STORE CET_FINAL INTO $(vQVDPath)CETData.qvd (QVD) ;

//DROP TABLE CET ;
END IF ;

Vineeth Pujari
If a post helps to resolve your issue, please accept it as a Solution.
lucasdavis500
Creator III
Creator III
Author

Vineeth, the artificial DateModified field isn't really an option in my case. I'll explain. The data I use comes from a Credit Bureau, I cannot force an outside source to create a "Date Modified" field. The data I get from the CB is updated every time a "resolution" has been determined regarding one of our customers. There is a time frame in which these updates must be made, therefore I know when I can pull and essentially get "all" of the stages of this customers lifecycle with the CB, but I only care about the most recent updates made. Our option is to pull from the CB multiple times a month, for the previous month, in order to capture the lifecycle.

I.E.

Customer        DateCr         DateRes

123                 9/15/2016           -                                         >>>>> ON 10/1

123                 9/15/2016           -                                         >>>>> ON 10/4

123                 9/15/2016      10/14/2016                             >>>>> ON 10/14

I don't think you understand my scenario fully. What I'm basically doing by adding my own DateModified field is essentially wiping everything from the last 30 days and reloading it.

Please explain the issues you see arising from me synthetically creating this DateModified so I can assess..

lucasdavis500
Creator III
Creator III
Author

I guess what I'm saying, is the only way to get "DateModified", a true date, would be if ALL THREE of the major CB agencies changed their reporting tool, which I highly doubt they would do for me, so I have to create one after the fact....

lucasdavis500
Creator III
Creator III
Author

The reason I cannot do a full reload is because this data is not stored for long periods of time, so unless I manually go into excel, append my data for a "historical" file, then keep reloading, this won't work. Right now, I wouldn't be able to pull January data for my report, I can only go back a few months.... I don't see how your code would "Update" old records of mine. The method you chose would not update a record that was added in my October 1st pull, but updated on my 10/10 pull. (Again, No DateModified field here, file would be the same as the first pull with maybe a value added into one of the fields that had no previous value) It would look at the Pkey and say "HEY, We've got one of those, we don't need a new one", unless I had some Date field that told QV, "Hey, I know we've got one of those, but he needs to be replaced because he's old". So I must tell QV to replace regardless if he was updated or not, because it's the "Latest" data....

vinieme12
Champion III
Champion III

Can you provide a sample in Excel that replicates your scenario?

Vineeth Pujari
If a post helps to resolve your issue, please accept it as a Solution.
rwunderlich
Partner Ambassador/MVP
Partner Ambassador/MVP

Like Dirk said, discard the date logic all together. 

1. Load the entire new monthly xlsx file.

2 Load the QVD with "where not exists(Pkey)".

Your original script as posted should work as-is if you just remove the WHERE clause from the load of the xlsx file.

-Rob

lucasdavis500
Creator III
Creator III
Author

Today, what Vineeth had said, and what you have said, just hit me. I understand why you say that I don't need the where clause...

rwunderlich
Partner Ambassador/MVP
Partner Ambassador/MVP

I had not noticed Vineeth's reply. It's quite correct, and I hope you mark it as such.

-Rob