Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hey all, I'm having an issue understanding how I can do an update on data that is already loaded into QVD's and will continue to be loaded every month. The code I have written will perform inserts of new data that I receive, but it's not going to update old records (because the updated data/ new data are in different files) The files I receive are excel (xlsx) and come independent of eachother i.e. each table is a separate excel file, and within each table there are ALSO separate excel files, one being the INSERT (new) records and the other being the UPDATED (old) records. Currently, my process is looking at (for the most part) a field that is a created date, or when that row (observation) would have been 'created' by someone at my work. If the date created is larger than any date created observation already loaded in my QVD, I insert it into the QVD, thus ensuring all new records are present. However, the issue lies with UPDATING the old records. Here is a sample of what the data would look like BEFORE and AFTER it's updated:
data as of June 1st:
Pkey DateCreated DateResolved
1234 05/10/2016 05/26/2016
2345 05/15/2016 -
3456 05/31/2016 -
Data as of July 1st:
Pkey DateCreated DateResolved
1234 05/10/2016 05/26/2016
2345 05/15/2016 06/01/2016
3456 05/31/2016 06/15/2016
---- there would also be a June data file accompanying this on July 1st, where all of the DateCreated observations would consist of June dates..
As you can see, the data I receive at month end, June 1st for example, will only have partial data, and after a 1 month lag, i.e. July 1st, all of May's data (should) have been updated. The issue is, when this data is loaded in JUNE 1st, the MAX DataCreated for this table is MAY 31st, 2016. When I try to update (since it's not all in one file) in July, for May's data, June PARTIAL data has been loaded as well. This causes the issue, QV fetches 0 lines since all of the old data's DateCreated field isn't > the max DateCreated that was inserted at the beginning of July for JUNE. One way I think I could resolve this is by manually changing the values of vLastExecTime and vExecTime, then creating a new column that reflects the time date that the data was added in, then deleting the oldest record of the data by the Pkey, which would require manual intervention every time, and can lead to human error... and I'm also not 100% sure how I can accomplish this in QV. It makes sense in my head, and I believe the below code would work if this were from a DB and the update/insert all happened at one time.. Any thoughts on this? Below is a sample of the code ( I extracted a lot of the transformations, etc to simplify it.)
//CET Data
IF NOT ISNULL(QVDCreateTime('$(vQVDPath)CETData.qvd')) THEN
LoadTime:
LOAD MAX(DateCreated) AS DateCreated FROM $(vQVDPath)CETData.qvd (QVD) ;
LET vLastExecTime=DATE(PEEK('DateCreated',0,'LoadTime')) ;
DROP TABLE LoadTime ;
END IF ;
CET:
LOAD
Pkey,
State,
AccountType,
DisputeType,
TaxID,
Mode,
AssignedTo,
ResolvedBy,
DateResolved,
DATE(DateCreated) as DateCreated,
ProductLine,
CreditDisputeType
FROM
[$(vPathname)CET_$(vDataDate).xlsx]
(ooxml, embedded labels, table is Sheet1)
WHERE DateCreated >= $(vLastExecTime) AND DateCreated < $(vExecTime);
IF NOT ISNULL(QVDCREATETIME('$(vQVDPath)CETData.qvd')) THEN
CET_Final:
CONCATENATE(CET)
LOAD
Pkey,
State,
AccountType,
DisputeType,
TaxID,
Mode,
AssignedTo,
ResolvedBy,
DateResolved,
DATE(DateCreated) as DateCreated,
ProductLine,
CreditDisputeType
FROM $(vQVDPath)CETData.qvd (QVD)
WHERE NOT(EXISTS (Pkey)) ;
END IF
IF NoOfRows(CET) > 0 THEN
STORE CET_FINAL INTO $(vQVDPath)CETData.qvd (QVD) ;
//DROP TABLE CET ;
END IF ;
You are creating a date modified field using the now function! the actual datemodifed in terms of DATA should ideally flow from the from end CRM tool / Software which relates to each data row
In this sense what you are adding is not a modified date for the record, but a date that you refreshed it which is like a dateadded flag.
I understand your logic; you do not want to refresh all of the data by doing this, but have you tried yet? There will not be much difference in reload time.
Which is why I recommend to drop your where clause comparing with your date field.
Just try as below
CET:
LOAD
Pkey,
State,
AccountType,
DisputeType,
TaxID,
Mode,
AssignedTo,
ResolvedBy,
DateResolved,
DATE(DateCreated) as DateCreated,
ProductLine,
CreditDisputeType
FROM
[$(vPathname)CET_$(vDataDate).xlsx]
(ooxml, embedded labels, table is Sheet1);
IF NOT ISNULL(QVDCREATETIME('$(vQVDPath)CETData.qvd')) THEN
CET_Final:
CONCATENATE(CET)
LOAD
Pkey,
State,
AccountType,
DisputeType,
TaxID,
Mode,
AssignedTo,
ResolvedBy,
DateResolved,
DATE(DateCreated) as DateCreated,
ProductLine,
CreditDisputeType
FROM $(vQVDPath)CETData.qvd (QVD)
WHERE NOT(EXISTS (Pkey)) ;
END IF
IF NoOfRows(CET) > 0 THEN
STORE CET_FINAL INTO $(vQVDPath)CETData.qvd (QVD) ;
//DROP TABLE CET ;
END IF ;
Hi Lucas,
your main problem here is that your Pkey wont change even if the DateResolved changes. So when you do your Where Not(EXISTS(Pkey)) it wont find any new transactions, It already there in the qvd.
I'll think that you need to do a full reload sometimes to solve this, maybe during the night. And during the day do some incremental reload to find new Pkey rows.
You're right, it doesn't change, and that's the problem. QV reads my Pkey, sees that it is already in my QVD, and passes it. I can't be the only person with this issue....Doing a Full Reload doesn't seem like it is sustainable, especially since I can only pull this data with a few months lag. Like I said, I can see creating a variable for Execution Time, i.e. when the data was loaded in, and then comparing distinct Pkey's with eachother, and then keeping the most "updated" Pkey..
Hi Lucas,
check this example: https://www.youtube.com/watch?v=BoEhqyy2xbM
i hope that resolves your issue
beck
Lucas,
I have solve this earlier, I created a Partial load with function "create" where I only load "today" transactions. That was a small number of transactions so I load it every hour. Create works that it reads all transactions again, all of them.
so If I hade a Pkey that have change during the day it will end up in my small "today qvd". After the small Partial Load I concatenated it with the Main qvd in another app, the today qvd must be first loaded when you concatenate so it will find the changed pKey first before you do the Where Not Exists part with the main qvd. Hope this helps!
I used max date from the main qvd to know where to start my today transaction, like you do here.
LoadTime:
LOAD MAX(DateCreated) AS DateCreated FROM $(vQVDPath)CETData.qvd (QVD) ;
LET vLastExecTime=DATE(PEEK('DateCreated',0,'LoadTime')) ;
DROP TABLE LoadTime ;
Then only use >=vLastExecTime in your where statment
You really need a MODIFIED_DATE field rather than Created_Date.
If you have a modified date your problems for both new and modified records are solved!
Hi Lucas,
if your primary key is not changing, you can load the file with the updated data before you reload the existing data. While loading the existing data, exclude all datasets already in your table. By this way, you get the updated data and then you simply don't load the older data.
I hope, this helps.
Regards
Dirk
Hi Vineeth, I have created a DateModified field in excel (using the now() function) that I'm using before I import these files. However, it does not seem to be updating old data when I try to read in my updated data, is this because of the WHERE NOT (EXISTS(PKEY)) when trying to concatenate the files to the QVD?
Nope, a manually created field won't do it has to be specific for each record, reach out to your Database team and ask if this field is available.
I think you can drop the where clause completely from your load script there is no need to check the created date.
Perform load in the following order
LOAD *
FROM NEWDATA_or_UPDATEDDATA;
concatenate
LOAD *
FROM ARCHIVE_QVD
WHERE NOT EXIST(PKEY);