Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
alec1982
Specialist II
Specialist II

Incremental Load

hi all,

Trying to find the quickest way to do incremental load from a set of xml files.. the files have the names include date string such as 2015-09-16 and so on. once the file is received and stored into the directory, it will never change and we receive a new file every night..

sample of files names:

2015-09-10-Data.xml

2015-09-11-Data.xml

2015-09-12-Data.xml

I thought about storing the data into qvd then load next file but not sure how to do that..

any idea?

Thxs,

Alec

1 Solution

Accepted Solutions
jonathandienst
Partner - Champion III
Partner - Champion III

I assume that you have a load statement that can load the XML files.

Then use something like this

// Get the last loaded date

T_MaxDate:

Load Num(Max(EffDate)) As MaxDate

From Store.qvd (qvd);

Let vMaxDate = Peek('MaxDate');

DROP Table T_MaxDate;

// Load historical data from storage qvd

Data:

LOAD *

From Store.qvd (qvd);

// Now loop over the source files to find those not yet loaded

For Each vFile in FileList('<path to qvd files>\*.xml')

  Let vBasename = SubField(SubField(vFile, '\', -1), ',', 1);

  Let vFileDate = Num(Date#(Left(vBaseName, 10), 'YYYY-MM-DD'));

  If vFileDate > vMaxDate Then

     Concatenate(Data)

     LOAD *,

     Date(vFileDate) As EffDate

     From [$(vFile )] (......);

   End If

Next

// Update the storage qvd

STORE Data Into Store.qvd (qvd);

Modify the storage qvd file name/path and FileList path and XML qualifiers to suit your environment.

Logic will get you from a to b. Imagination will take you everywhere. - A Einstein

View solution in original post

10 Replies
Anonymous
Not applicable

Alec,

In your case, as you said you are getting data on every night..

Just making sure, are you getting old data as well every time or its only updated data?

Like today is 21st Sept, so in today night file will you be having 18th Sep or 19 Sept or 20 Sept data?

or it will be of 21st sept only?

tresesco
MVP
MVP

May be as simple as below code:

Store the first xml file into qvd, like:

FirstTable:

Load

          *

From <FirstXml>;

Store FrstTable into ....Data.qvd(qvd);

Drop table FirstTable;

Final:

Load

          *

From  ....Data.qvd(qvd);

Concatenate                         // not needed if all fields are common (which is the case in general)

Load

          *

From  ...<DailyNew XML>;   // this could be automated(the name has not be hard coded) if you keep the new file in a separate folder or take in same name and overwrite

Store Final into .....Data.qvd(qvd);

jonathandienst
Partner - Champion III
Partner - Champion III

I assume that you have a load statement that can load the XML files.

Then use something like this

// Get the last loaded date

T_MaxDate:

Load Num(Max(EffDate)) As MaxDate

From Store.qvd (qvd);

Let vMaxDate = Peek('MaxDate');

DROP Table T_MaxDate;

// Load historical data from storage qvd

Data:

LOAD *

From Store.qvd (qvd);

// Now loop over the source files to find those not yet loaded

For Each vFile in FileList('<path to qvd files>\*.xml')

  Let vBasename = SubField(SubField(vFile, '\', -1), ',', 1);

  Let vFileDate = Num(Date#(Left(vBaseName, 10), 'YYYY-MM-DD'));

  If vFileDate > vMaxDate Then

     Concatenate(Data)

     LOAD *,

     Date(vFileDate) As EffDate

     From [$(vFile )] (......);

   End If

Next

// Update the storage qvd

STORE Data Into Store.qvd (qvd);

Modify the storage qvd file name/path and FileList path and XML qualifiers to suit your environment.

Logic will get you from a to b. Imagination will take you everywhere. - A Einstein
alec1982
Specialist II
Specialist II
Author

I will have two issues here..

The load statement result in a single table but it has two joins.

the xml files have 14 tables each and I am loading three of them only  such as

Table1:

Load A

B

from xml file table xyz;

Join (Table1)

Load A,

C

from xml file table ABC;

Join (Table1)

Load A,

L

from xml file table SS;

Store Table1 into AA.QVD;

also , we receive the files everyday at 8 pm and it has today's date.

finally, how do I handle the old files if I want to include them?

Anonymous
Not applicable

Alec,

Their is no logic if you are having today date's data only in a file beacuse you may required old data as well and on daily basis you need to concatenate the new updated file...

alec1982
Specialist II
Specialist II
Author

Jonathan,

while having total 5 files and the first one is on 2015-09-16 and it is already loaded to create the initial qvd, I am having some issues with the variables.

the other files are for the dates 17, 18, 19, 20 of September.

VFile shows only one value = C:\2015-09-20-Data.xml.

vBaseName returns 2015-09-20-Data.xml

vFileDate is not created at all.

Any idea what I am doing wrong?

alec1982
Specialist II
Specialist II
Author

I found the issue with the variable creation. it is minor naming issue...

the only thing I need help with now is how to handle the join

Table1:

Load A

B

from xml file table xyz;

Join (Table1)

Load A,

C

from xml file table ABC;

Join (Table1)

Load A,

L

from xml file table SS;

jonathandienst
Partner - Champion III
Partner - Champion III

Basename / BaseName -- oops

You can't do a join inside the loop. The join adds data fields and after the first pass the fields exist and they become part of the join key. What you need to do is collect each part in a separate table and join them after the loop.  You will also need to concatenate the storage QVD after the join for the same reason.

Like this:

// Get the last loaded date

T_MaxDate:

Load Num(Max(EffDate)) As MaxDate

From Store.qvd (qvd);

Let vMaxDate = Peek('MaxDate');

DROP Table T_MaxDate;

// Now loop over the source files to find those not yet loaded

For Each vFile in FileList('<path to qvd files>\*.xml')

  Let vBasename = SubField(SubField(vFile, '\', -1), ',', 1);

  Let vFileDate = Num(Date#(Left(vBasename, 10), 'YYYY-MM-DD'));

  If vFileDate > vMaxDate Then

    

  Data:

     LOAD *,

     Date(vFileDate) As EffDate

     From [$(vFile )] (......table is 'xyz');

  

  Data2:

     LOAD *,

     Date(vFileDate) As EffDate

     From [$(vFile )] (......table is 'abc');

  Data3:

     LOAD *,

     Date(vFileDate) As EffDate

     From [$(vFile )] (......table is 'def');

  

   End If

Next

Left join (Data)

LOAD * Resident Data2;

Left join (Data)

LOAD * Resident Data3;

Concatenate(Data)

LOAD *

From Store.qvd (qvd);

STORE Data Into Store.qvd (qvd);

Logic will get you from a to b. Imagination will take you everywhere. - A Einstein
jonathandienst
Partner - Champion III
Partner - Champion III

And this script will load any newer not-yet-loaded files, even if the script does not run for a few days for some reason.

Logic will get you from a to b. Imagination will take you everywhere. - A Einstein