Solved: Incremental load? - Qlik Community

terezagr · ‎2015-08-19

I am trying to load incrementally data. Problem that I have with this approach is that 2 files can have data for two same dates.

E. g. File1 will have data from 22-7-2015 till 29-07-2013(inclusive), File 2 will have data for 29-07-2013 till 05-08-2015.

Sale will be then in both of these files for the date 29-07-2013 = 30.

With my following approach I am duplicating the data for 29-07-2015.

What I want to achieve is only to load data once if 2 files have same data for the same date.

My code is below:

QVDFile:

LOAD

[Sale Date],

Branch,

Sale,

FileName() as FileName

FROM

[Path\6_R.csv]

(txt, codepage is 1252, embedded labels, delimiter is ',', msq, header is 2 lines)

Where Branch<>'Total';

STORE QVDFile INTO Path\QVDFile.QVD(QVD);

MaxDate:

LOAD Max([Sale Date]) as MaxDate

Resident QVDFile;

LET vMaxDate = Peek('MaxDate');

DROP Table MaxDate;

FOR each File in filelist ('Path\*.csv')

Concatenate (QVDFile)

New:

LOAD [Sale Date],

Branch,

Sale,

'$(File)' as FileName

From

$(File)

(txt, codepage is 1252, embedded labels, delimiter is ',', msq, header is 2 lines)

Where Branch<>'Total'

and num([Sale Date]) > $(vMaxDate);

NEXT;

STORE QVDFile into Path\QVD.QVD(QVD);

jonathandienst · ‎2015-08-19

You can track the loaded dates in a separate field and table to void the Not(exists()) problem. See the added lines in bold:

QVDFile:

LOAD

[Sale Date],

Branch,

Sale,

FileName() as FileName

FROM

[Path\6_R.csv]

(txt, codepage is 1252, embedded labels, delimiter is ',', msq, header is 2 lines)

Where Branch<>'Total';

LoadedDate:

LOAD Distinct [Sale Date] As LoadedDate

Resident QVDFile;

MaxDate:

LOAD Max([Sale Date]) as MaxDate

Resident QVDFile;

LET vMaxDate = Peek('MaxDate');

DROP Table MaxDate;

FOR each File in filelist ('Path\*.csv')

Concatenate (QVDFile)

New:

LOAD [Sale Date],

Branch,

Sale,

'$(File)' as FileName

From

$(File)

(txt, codepage is 1252, embedded labels, delimiter is ',', msq, header is 2 lines)

Where

Branch<>'Total'

and num([Sale Date]) > $(vMaxDate)

and not(Exists(LoadedDate, [Sale Date]))

;

Concatenate (LoadedDate)

LOAD Distinct [Sale Date] As LoadedDate

Resident QVDFile;

DROP Table LoadedDate;

Logic will get you from a to b. Imagination will take you everywhere. - A Einstein

View solution in original post

richnorris · ‎2015-08-19

I think you just need to add 'AND WHERE NOT EXISTS([SALE DATE])' at the end of your load statement, so that if the date already exists in your table it isn't loaded again.

kuczynska · ‎2015-08-19

I was planning to suggest the same - if you have SaleID in your data set, you can use that too. I intend to use something like transaction/sale identifier in the incremental loads.

terezagr · ‎2015-08-19

I have tried where not exists(), but it does not do what I would expect to do. If I add where not exists, this will load only unique records from the second load.

However in my files I can have data as follow:

[Sale Date] Branch Sale

29-07-2015 1 2

29-07-2015 2 3

jonathandienst · ‎2015-08-19

Is there any form of ID that identifies the sales in these source files? If there is only line per date, than that will do, but I expect that you have multiple lines per date in each file.

Or can we exclude a date in the second file if it was loaded with the first file?

Logic will get you from a to b. Imagination will take you everywhere. - A Einstein

terezagr · ‎2015-08-19

You are right, I have multiple lines per date in each file. I can create a unique ID for sale, although when used with Not Exists, the result is that it gives me only records from the first load.

jonathandienst · ‎2015-08-19

You can track the loaded dates in a separate field and table to void the Not(exists()) problem. See the added lines in bold:

QVDFile:

LOAD

[Sale Date],

Branch,

Sale,

FileName() as FileName

FROM

[Path\6_R.csv]

(txt, codepage is 1252, embedded labels, delimiter is ',', msq, header is 2 lines)

Where Branch<>'Total';

LoadedDate:

LOAD Distinct [Sale Date] As LoadedDate

Resident QVDFile;

MaxDate:

LOAD Max([Sale Date]) as MaxDate

Resident QVDFile;

LET vMaxDate = Peek('MaxDate');

DROP Table MaxDate;

FOR each File in filelist ('Path\*.csv')

Concatenate (QVDFile)

New:

LOAD [Sale Date],

Branch,

Sale,

'$(File)' as FileName

From

$(File)

(txt, codepage is 1252, embedded labels, delimiter is ',', msq, header is 2 lines)

Where

Branch<>'Total'

and num([Sale Date]) > $(vMaxDate)

and not(Exists(LoadedDate, [Sale Date]))

;

Concatenate (LoadedDate)

LOAD Distinct [Sale Date] As LoadedDate

Resident QVDFile;

DROP Table LoadedDate;

Logic will get you from a to b. Imagination will take you everywhere. - A Einstein

terezagr · ‎2015-08-19

This works great! Brilliant, thank you Jonathan!

Incremental load?

Related Topics