Skip to main content
Announcements
NEW: Seamless Public Data Sharing with Qlik's New Anonymous Access Capability: TELL ME MORE!
cancel
Showing results for 
Search instead for 
Did you mean: 
terezagr
Partner - Creator III
Partner - Creator III

Incremental load?

I am trying to load incrementally data. Problem that I have with this approach is that 2 files can have data for two same dates.

E. g. File1 will have data from 22-7-2015 till 29-07-2013(inclusive), File 2 will have data for 29-07-2013 till 05-08-2015.

Sale will be then in both of these files for the date 29-07-2013 = 30.

With my following approach I am duplicating the data for 29-07-2015.

What I want to achieve is only to load data once if 2 files have same data for the same date.

My code is below:

QVDFile:

LOAD

     [Sale Date],

     Branch, 

     Sale,

     FileName() as FileName

FROM

[Path\6_R.csv]

(txt, codepage is 1252, embedded labels, delimiter is ',', msq, header is 2 lines)

Where Branch<>'Total';


STORE QVDFile INTO Path\QVDFile.QVD(QVD);

MaxDate:

LOAD Max([Sale Date]) as MaxDate

Resident QVDFile;

LET vMaxDate = Peek('MaxDate');

DROP Table MaxDate;

FOR each File in filelist ('Path\*.csv')

Concatenate (QVDFile)

New:

LOAD [Sale Date],

     Branch,

     Sale,

     '$(File)' as FileName

From

$(File)

(txt, codepage is 1252, embedded labels, delimiter is ',', msq, header is 2 lines)

Where Branch<>'Total'

and num([Sale Date]) > $(vMaxDate);


NEXT;


STORE QVDFile into Path\QVD.QVD(QVD);

1 Solution

Accepted Solutions
jonathandienst
Partner - Champion III
Partner - Champion III

You can track the loaded dates in a separate field and table to  void the Not(exists()) problem. See the added lines in bold:

QVDFile:

LOAD

    [Sale Date],

    Branch,

    Sale,

    FileName() as FileName

FROM

[Path\6_R.csv]

(txt, codepage is 1252, embedded labels, delimiter is ',', msq, header is 2 lines)

Where Branch<>'Total';

LoadedDate:

LOAD Distinct [Sale Date] As LoadedDate

Resident QVDFile;

MaxDate:

LOAD Max([Sale Date]) as MaxDate

Resident QVDFile;

LET vMaxDate = Peek('MaxDate');

DROP Table MaxDate;

FOR each File in filelist ('Path\*.csv')

  Concatenate (QVDFile)

  New:

  LOAD [Sale Date],

  Branch,

  Sale,

  '$(File)' as FileName

  From

  $(File)

  (txt, codepage is 1252, embedded labels, delimiter is ',', msq, header is 2 lines)

  Where

  Branch<>'Total'

  and num([Sale Date]) > $(vMaxDate)

  and not(Exists(LoadedDate, [Sale Date]))

  ;

  Concatenate (LoadedDate)

  LOAD Distinct [Sale Date] As LoadedDate

  Resident QVDFile;

NEXT

STORE QVDFile into Path\QVD.QVD(QVD);

DROP Table LoadedDate;

Logic will get you from a to b. Imagination will take you everywhere. - A Einstein

View solution in original post

7 Replies
richnorris
Creator II
Creator II

I think you just need to add 'AND WHERE NOT EXISTS([SALE DATE])' at the end of your load statement, so that if the date already exists in your table it isn't loaded again.

kuczynska
Creator III
Creator III

I was planning to suggest the same - if you have SaleID in your data set, you can use that too. I intend to use something like transaction/sale identifier in the incremental loads.

terezagr
Partner - Creator III
Partner - Creator III
Author

I have tried where not exists(), but it does not do what I would expect to do. If I add where not exists, this will load only unique records from the second load.

However in my files I can have data as follow:

[Sale Date]     Branch          Sale

29-07-2015     1                    2

29-07-2015     2                    3

jonathandienst
Partner - Champion III
Partner - Champion III

Is there any form of ID that identifies the sales in these source files? If there is only line per date, than that will do, but I expect that you have multiple lines per date in each file.

Or can we exclude a date in the second file if it was loaded with the first file?

Logic will get you from a to b. Imagination will take you everywhere. - A Einstein
terezagr
Partner - Creator III
Partner - Creator III
Author

You are right, I have multiple lines per date in each file. I can create a unique ID for sale, although when used with Not Exists, the result is that it gives me only records from the first load.

jonathandienst
Partner - Champion III
Partner - Champion III

You can track the loaded dates in a separate field and table to  void the Not(exists()) problem. See the added lines in bold:

QVDFile:

LOAD

    [Sale Date],

    Branch,

    Sale,

    FileName() as FileName

FROM

[Path\6_R.csv]

(txt, codepage is 1252, embedded labels, delimiter is ',', msq, header is 2 lines)

Where Branch<>'Total';

LoadedDate:

LOAD Distinct [Sale Date] As LoadedDate

Resident QVDFile;

MaxDate:

LOAD Max([Sale Date]) as MaxDate

Resident QVDFile;

LET vMaxDate = Peek('MaxDate');

DROP Table MaxDate;

FOR each File in filelist ('Path\*.csv')

  Concatenate (QVDFile)

  New:

  LOAD [Sale Date],

  Branch,

  Sale,

  '$(File)' as FileName

  From

  $(File)

  (txt, codepage is 1252, embedded labels, delimiter is ',', msq, header is 2 lines)

  Where

  Branch<>'Total'

  and num([Sale Date]) > $(vMaxDate)

  and not(Exists(LoadedDate, [Sale Date]))

  ;

  Concatenate (LoadedDate)

  LOAD Distinct [Sale Date] As LoadedDate

  Resident QVDFile;

NEXT

STORE QVDFile into Path\QVD.QVD(QVD);

DROP Table LoadedDate;

Logic will get you from a to b. Imagination will take you everywhere. - A Einstein
terezagr
Partner - Creator III
Partner - Creator III
Author

This works great! Brilliant, thank you Jonathan!