hi all,
I'm trying to find the quickest way to do an incremental load from a set of XML files. The file names include a date string such as 2015-09-16. Once a file is received and stored in the directory it never changes, and we receive a new file every night.
Sample file names:
2015-09-10-Data.xml
2015-09-11-Data.xml
2015-09-12-Data.xml
I thought about storing the data into a QVD and then loading the next file, but I'm not sure how to do that.
Any ideas?
Thanks,
Alec
I assume that you have a load statement that can load the XML files.
Then use something like this
// Get the last loaded date
T_MaxDate:
Load Num(Max(EffDate)) As MaxDate
From Store.qvd (qvd);
Let vMaxDate = Peek('MaxDate');
DROP Table T_MaxDate;
// Load historical data from storage qvd
Data:
LOAD *
From Store.qvd (qvd);
// Now loop over the source files to find those not yet loaded
For Each vFile in FileList('<path to qvd files>\*.xml')
  Let vBasename = SubField(SubField(vFile, '\', -1), ',', 1);
  Let vFileDate = Num(Date#(Left(vBaseName, 10), 'YYYY-MM-DD'));
  If vFileDate > vMaxDate Then
    Concatenate(Data)
    LOAD *,
      Date(vFileDate) As EffDate
    From [$(vFile )] (......);
  End If
Next
// Update the storage qvd
STORE Data Into Store.qvd (qvd);
Modify the storage qvd file name/path and FileList path and XML qualifiers to suit your environment.
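One caveat: on the very first run Store.qvd does not exist yet, so the Max(EffDate) load above would fail. A minimal guard could look like the sketch below; it assumes FileSize() returning null for a missing file, and the historical Data load and the final STORE path would need the same test.
Let vStoreSize = Alt(FileSize('Store.qvd'), 0); // 0 when the QVD does not exist yet
If $(vStoreSize) > 0 Then
  T_MaxDate:
  Load Num(Max(EffDate)) As MaxDate
  From Store.qvd (qvd);
  Let vMaxDate = Peek('MaxDate');
  DROP Table T_MaxDate;
Else
  Let vMaxDate = 0; // nothing loaded yet, so every file will qualify
End If
On a first run you would also label the first load Data: instead of using Concatenate(Data), since the Data table does not exist yet.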
Alec,
In your case, as you said, you are getting data every night.
Just to make sure: are you getting the old data as well each time, or only new data?
For example, today is 21st Sept; will tonight's file also contain data for 18th, 19th, or 20th Sept,
or only for the 21st?
It may be as simple as the code below:
Store the first XML file into a QVD, like:
FirstTable:
Load
*
From <FirstXml>;
Store FirstTable into ....Data.qvd(qvd);
Drop table FirstTable;
Final:
Load
*
From ....Data.qvd(qvd);
Concatenate // not needed if all fields are common (which is generally the case)
Load
*
From ...<DailyNew XML>; // the file name need not be hard-coded if you keep the new file in a separate folder, or always use the same name and overwrite it
Store Final into .....Data.qvd(qvd);
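The automation mentioned in the comment could be sketched like this, assuming the nightly file is named after today's date as in Alec's examples (paths and the XML format spec are placeholders; if the reload runs after midnight, use Today() - 1 instead):
Let vToday = Date(Today(), 'YYYY-MM-DD');
Final:
Load *
From ....Data.qvd (qvd);
Concatenate (Final)
Load *
From [$(vToday)-Data.xml];
Store Final into ....Data.qvd (qvd);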
Jonathan, I will have two issues here.
The load statement results in a single table, but it uses two joins.
The XML files have 14 tables each, and I am loading only three of them, like this:
Table1:
Load A,
B
from xml file table xyz;
Join (Table1)
Load A,
C
from xml file table ABC;
Join (Table1)
Load A,
L
from xml file table SS;
Store Table1 into AA.QVD;
Also, we receive the files every day at 8 pm, and each file carries today's date.
Finally, how do I handle the old files if I want to include them?
Alec,
It makes no sense if a file contains only today's data, because you may need the old data as well, and you would have to concatenate the new updated file on a daily basis...
Jonathan,
I have five files in total; the first one, for 2015-09-16, has already been loaded to create the initial QVD, and I am having some issues with the variables.
The other files are for 17, 18, 19, and 20 September.
vFile shows only one value: C:\2015-09-20-Data.xml.
vBaseName returns 2015-09-20-Data.xml.
vFileDate is not created at all.
Any idea what I am doing wrong?
I found the issue with the variable creation; it was a minor naming issue.
The only thing I need help with now is how to handle the join:
Table1:
Load A,
B
from xml file table xyz;
Join (Table1)
Load A,
C
from xml file table ABC;
Join (Table1)
Load A,
L
from xml file table SS;
Basename / BaseName -- oops
You can't do a join inside the loop. The join adds data fields, and after the first pass those fields already exist and become part of the join key. What you need to do is collect each part in a separate table and join them after the loop. For the same reason, you will also need to concatenate the storage QVD after the join.
Like this:
// Get the last loaded date
T_MaxDate:
Load Num(Max(EffDate)) As MaxDate
From Store.qvd (qvd);
Let vMaxDate = Peek('MaxDate');
DROP Table T_MaxDate;
// Now loop over the source files to find those not yet loaded
For Each vFile in FileList('<path to qvd files>\*.xml')
  Let vBasename = SubField(SubField(vFile, '\', -1), ',', 1);
  Let vFileDate = Num(Date#(Left(vBasename, 10), 'YYYY-MM-DD'));
  If vFileDate > vMaxDate Then
    Data:
    LOAD *,
      Date(vFileDate) As EffDate
    From [$(vFile )] (......table is 'xyz');
    Data2:
    LOAD *,
      Date(vFileDate) As EffDate
    From [$(vFile )] (......table is 'abc');
    Data3:
    LOAD *,
      Date(vFileDate) As EffDate
    From [$(vFile )] (......table is 'def');
  End If
Next
Left join (Data)
LOAD * Resident Data2;
DROP Table Data2;
Left join (Data)
LOAD * Resident Data3;
DROP Table Data3;
Concatenate(Data)
LOAD *
From Store.qvd (qvd);
STORE Data Into Store.qvd (qvd);
And this script will load any newer not-yet-loaded files, even if the script does not run for a few days for some reason.