Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hey all, I'm having an issue understanding how I can do an update on data that is already loaded into QVD's and will continue to be loaded every month. The code I have written will perform inserts of new data that I receive, but it's not going to update old records (because the updated data/ new data are in different files) The files I receive are excel (xlsx) and come independent of eachother i.e. each table is a separate excel file, and within each table there are ALSO separate excel files, one being the INSERT (new) records and the other being the UPDATED (old) records. Currently, my process is looking at (for the most part) a field that is a created date, or when that row (observation) would have been 'created' by someone at my work. If the date created is larger than any date created observation already loaded in my QVD, I insert it into the QVD, thus ensuring all new records are present. However, the issue lies with UPDATING the old records. Here is a sample of what the data would look like BEFORE and AFTER it's updated:
data as of June 1st:
Pkey DateCreated DateResolved
1234 05/10/2016 05/26/2016
2345 05/15/2016 -
3456 05/31/2016 -
Data as of July 1st:
Pkey DateCreated DateResolved
1234 05/10/2016 05/26/2016
2345 05/15/2016 06/01/2016
3456 05/31/2016 06/15/2016
---- there would also be a June data file accompanying this on July 1st, where all of the DateCreated observations would consist of June dates..
As you can see, the data I receive at month end, June 1st for example, will only have partial data, and after a 1 month lag, i.e. July 1st, all of May's data (should) have been updated. The issue is, when this data is loaded in JUNE 1st, the MAX DataCreated for this table is MAY 31st, 2016. When I try to update (since it's not all in one file) in July, for May's data, June PARTIAL data has been loaded as well. This causes the issue, QV fetches 0 lines since all of the old data's DateCreated field isn't > the max DateCreated that was inserted at the beginning of July for JUNE. One way I think I could resolve this is by manually changing the values of vLastExecTime and vExecTime, then creating a new column that reflects the time date that the data was added in, then deleting the oldest record of the data by the Pkey, which would require manual intervention every time, and can lead to human error... and I'm also not 100% sure how I can accomplish this in QV. It makes sense in my head, and I believe the below code would work if this were from a DB and the update/insert all happened at one time.. Any thoughts on this? Below is a sample of the code ( I extracted a lot of the transformations, etc to simplify it.)
//CET Data
IF NOT ISNULL(QVDCreateTime('$(vQVDPath)CETData.qvd')) THEN
LoadTime:
LOAD MAX(DateCreated) AS DateCreated FROM $(vQVDPath)CETData.qvd (QVD) ;
LET vLastExecTime=DATE(PEEK('DateCreated',0,'LoadTime')) ;
DROP TABLE LoadTime ;
END IF ;
CET:
LOAD
Pkey,
State,
AccountType,
DisputeType,
TaxID,
Mode,
AssignedTo,
ResolvedBy,
DateResolved,
DATE(DateCreated) as DateCreated,
ProductLine,
CreditDisputeType
FROM
[$(vPathname)CET_$(vDataDate).xlsx]
(ooxml, embedded labels, table is Sheet1)
WHERE DateCreated >= $(vLastExecTime) AND DateCreated < $(vExecTime);
IF NOT ISNULL(QVDCREATETIME('$(vQVDPath)CETData.qvd')) THEN
CET_Final:
CONCATENATE(CET)
LOAD
Pkey,
State,
AccountType,
DisputeType,
TaxID,
Mode,
AssignedTo,
ResolvedBy,
DateResolved,
DATE(DateCreated) as DateCreated,
ProductLine,
CreditDisputeType
FROM $(vQVDPath)CETData.qvd (QVD)
WHERE NOT(EXISTS (Pkey)) ;
END IF
IF NoOfRows(CET) > 0 THEN
STORE CET_FINAL INTO $(vQVDPath)CETData.qvd (QVD) ;
//DROP TABLE CET ;
END IF ;
This cannot be true. You can manually create a DateModified field as long as it is greater than my previous DateModified for the same data, it will overwrite it. I'm doing a "full reload" of the previous month essentially, overwriting data that has changed and has not changed, but not a "full reload" in the sense that I'm not wiping my QVD. My QVD will still have previous months data, just a 1 month lag should get a full wipe and reload. Does this make sense?
You are creating a date modified field using the now function! the actual datemodifed in terms of DATA should ideally flow from the from end CRM tool / Software which relates to each data row
In this sense what you are adding is not a modified date for the record, but a date that you refreshed it which is like a dateadded flag.
I understand your logic; you do not want to refresh all of the data by doing this, but have you tried yet? There will not be much difference in reload time.
Which is why I recommend to drop your where clause comparing with your date field.
Just try as below
CET:
LOAD
Pkey,
State,
AccountType,
DisputeType,
TaxID,
Mode,
AssignedTo,
ResolvedBy,
DateResolved,
DATE(DateCreated) as DateCreated,
ProductLine,
CreditDisputeType
FROM
[$(vPathname)CET_$(vDataDate).xlsx]
(ooxml, embedded labels, table is Sheet1);
IF NOT ISNULL(QVDCREATETIME('$(vQVDPath)CETData.qvd')) THEN
CET_Final:
CONCATENATE(CET)
LOAD
Pkey,
State,
AccountType,
DisputeType,
TaxID,
Mode,
AssignedTo,
ResolvedBy,
DateResolved,
DATE(DateCreated) as DateCreated,
ProductLine,
CreditDisputeType
FROM $(vQVDPath)CETData.qvd (QVD)
WHERE NOT(EXISTS (Pkey)) ;
END IF
IF NoOfRows(CET) > 0 THEN
STORE CET_FINAL INTO $(vQVDPath)CETData.qvd (QVD) ;
//DROP TABLE CET ;
END IF ;
Vineeth, the artificial DateModified field isn't really an option in my case. I'll explain. The data I use comes from a Credit Bureau, I cannot force an outside source to create a "Date Modified" field. The data I get from the CB is updated every time a "resolution" has been determined regarding one of our customers. There is a time frame in which these updates must be made, therefore I know when I can pull and essentially get "all" of the stages of this customers lifecycle with the CB, but I only care about the most recent updates made. Our option is to pull from the CB multiple times a month, for the previous month, in order to capture the lifecycle.
I.E.
Customer DateCr DateRes
123 9/15/2016 - >>>>> ON 10/1
123 9/15/2016 - >>>>> ON 10/4
123 9/15/2016 10/14/2016 >>>>> ON 10/14
I don't think you understand my scenario fully. What I'm basically doing by adding my own DateModified field is essentially wiping everything from the last 30 days and reloading it.
Please explain the issues you see arising from me synthetically creating this DateModified so I can assess..
I guess what I'm saying, is the only way to get "DateModified", a true date, would be if ALL THREE of the major CB agencies changed their reporting tool, which I highly doubt they would do for me, so I have to create one after the fact....
The reason I cannot do a full reload is because this data is not stored for long periods of time, so unless I manually go into excel, append my data for a "historical" file, then keep reloading, this won't work. Right now, I wouldn't be able to pull January data for my report, I can only go back a few months.... I don't see how your code would "Update" old records of mine. The method you chose would not update a record that was added in my October 1st pull, but updated on my 10/10 pull. (Again, No DateModified field here, file would be the same as the first pull with maybe a value added into one of the fields that had no previous value) It would look at the Pkey and say "HEY, We've got one of those, we don't need a new one", unless I had some Date field that told QV, "Hey, I know we've got one of those, but he needs to be replaced because he's old". So I must tell QV to replace regardless if he was updated or not, because it's the "Latest" data....
Can you provide a sample in Excel that replicates your scenario?
Like Dirk said, discard the date logic all together.
1. Load the entire new monthly xlsx file.
2 Load the QVD with "where not exists(Pkey)".
Your original script as posted should work as-is if you just remove the WHERE clause from the load of the xlsx file.
-Rob
Today, what Vineeth had said, and what you have said, just hit me. I understand why you say that I don't need the where clause...
I had not noticed Vineeth's reply. It's quite correct, and I hope you mark it as such.
-Rob