Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
I am trying to load incrementally data. Problem that I have with this approach is that 2 files can have data for two same dates.
E. g. File1 will have data from 22-7-2015 till 29-07-2013(inclusive), File 2 will have data for 29-07-2013 till 05-08-2015.
Sale will be then in both of these files for the date 29-07-2013 = 30.
With my following approach I am duplicating the data for 29-07-2015.
What I want to achieve is only to load data once if 2 files have same data for the same date.
My code is below:
QVDFile:
LOAD
[Sale Date],
Branch,
Sale,
FileName() as FileName
FROM
[Path\6_R.csv]
(txt, codepage is 1252, embedded labels, delimiter is ',', msq, header is 2 lines)
Where Branch<>'Total';
STORE QVDFile INTO Path\QVDFile.QVD(QVD);
MaxDate:
LOAD Max([Sale Date]) as MaxDate
Resident QVDFile;
LET vMaxDate = Peek('MaxDate');
DROP Table MaxDate;
FOR each File in filelist ('Path\*.csv')
Concatenate (QVDFile)
New:
LOAD [Sale Date],
Branch,
Sale,
'$(File)' as FileName
From
$(File)
(txt, codepage is 1252, embedded labels, delimiter is ',', msq, header is 2 lines)
Where Branch<>'Total'
and num([Sale Date]) > $(vMaxDate);
NEXT;
STORE QVDFile into Path\QVD.QVD(QVD);
You can track the loaded dates in a separate field and table to void the Not(exists()) problem. See the added lines in bold:
QVDFile:
LOAD
[Sale Date],
Branch,
Sale,
FileName() as FileName
FROM
[Path\6_R.csv]
(txt, codepage is 1252, embedded labels, delimiter is ',', msq, header is 2 lines)
Where Branch<>'Total';
LoadedDate:
LOAD Distinct [Sale Date] As LoadedDate
Resident QVDFile;
MaxDate:
LOAD Max([Sale Date]) as MaxDate
Resident QVDFile;
LET vMaxDate = Peek('MaxDate');
DROP Table MaxDate;
FOR each File in filelist ('Path\*.csv')
Concatenate (QVDFile)
New:
LOAD [Sale Date],
Branch,
Sale,
'$(File)' as FileName
From
$(File)
(txt, codepage is 1252, embedded labels, delimiter is ',', msq, header is 2 lines)
Where
Branch<>'Total'
and num([Sale Date]) > $(vMaxDate)
and not(Exists(LoadedDate, [Sale Date]))
;
Concatenate (LoadedDate)
LOAD Distinct [Sale Date] As LoadedDate
Resident QVDFile;
NEXT
STORE QVDFile into Path\QVD.QVD(QVD);
DROP Table LoadedDate;
I think you just need to add 'AND WHERE NOT EXISTS([SALE DATE])' at the end of your load statement, so that if the date already exists in your table it isn't loaded again.
I was planning to suggest the same - if you have SaleID in your data set, you can use that too. I intend to use something like transaction/sale identifier in the incremental loads.
I have tried where not exists(), but it does not do what I would expect to do. If I add where not exists, this will load only unique records from the second load.
However in my files I can have data as follow:
[Sale Date] Branch Sale
29-07-2015 1 2
29-07-2015 2 3
Is there any form of ID that identifies the sales in these source files? If there is only line per date, than that will do, but I expect that you have multiple lines per date in each file.
Or can we exclude a date in the second file if it was loaded with the first file?
You are right, I have multiple lines per date in each file. I can create a unique ID for sale, although when used with Not Exists, the result is that it gives me only records from the first load.
You can track the loaded dates in a separate field and table to void the Not(exists()) problem. See the added lines in bold:
QVDFile:
LOAD
[Sale Date],
Branch,
Sale,
FileName() as FileName
FROM
[Path\6_R.csv]
(txt, codepage is 1252, embedded labels, delimiter is ',', msq, header is 2 lines)
Where Branch<>'Total';
LoadedDate:
LOAD Distinct [Sale Date] As LoadedDate
Resident QVDFile;
MaxDate:
LOAD Max([Sale Date]) as MaxDate
Resident QVDFile;
LET vMaxDate = Peek('MaxDate');
DROP Table MaxDate;
FOR each File in filelist ('Path\*.csv')
Concatenate (QVDFile)
New:
LOAD [Sale Date],
Branch,
Sale,
'$(File)' as FileName
From
$(File)
(txt, codepage is 1252, embedded labels, delimiter is ',', msq, header is 2 lines)
Where
Branch<>'Total'
and num([Sale Date]) > $(vMaxDate)
and not(Exists(LoadedDate, [Sale Date]))
;
Concatenate (LoadedDate)
LOAD Distinct [Sale Date] As LoadedDate
Resident QVDFile;
NEXT
STORE QVDFile into Path\QVD.QVD(QVD);
DROP Table LoadedDate;
This works great! Brilliant, thank you Jonathan!