Solved: Re: Load only new data from multiple csv - Qlik Community

Report Inappropriate Content · ‎2014-02-27

Hello,

I have a QV document that loads data from multiple csv documents ( From *.csv).

New files are being saved in the data folder quite frequently and the load is getting excessively long. The problem I can see is that each of the CSVs is being reloaded from scratch.

Is there a way to specify that only newly added csv files are to be loaded, with the existing data retained?

Regards
Marek

stevedark · ‎2014-02-27

Hi Marek,

I would create a QVD file for each CSV. You can do this by looping for each CSV file you have, and just creating QVD's where this doesn't already exist:

for each vFile in FileList('*.csv')

let vQVD = replace(vFile, '.csv', '.qvd');

if alt(FileSize('$(vQVD)'), 0) = 0 then

NextFile:

LOAD

*

FROM $(vFile)

([... CSV format in here ...])

;

STORE NextFile INTO $(vQVD) (qvd);

DROP TABLE NextFile;

end if

next

You can then load from *.qvd and this will go much much quicker than loading from the CSV files.

I've just typed the code straight into Community, so there may be slight syntax errors, by the idea behind it should be sound.

Hope that helps,

Steve

View solution in original post

MK_QSL · ‎2014-02-27

Use While Exists during load with some primary key...

Possibly date...

rustyfishbones · ‎2014-02-27

Hi Marek,

Try storing the old .csv files into a qvd file

It will reload much faster

Report Inappropriate Content · ‎2014-02-27

Marek,

As Alan said, see the QVD files.

You may also use the prefix BUFFER :

BUFFER LOAD ....

This will create the QVD from the existing data. Next time, the QVD will be read and only new lines will be read from csv.

The problem with this technique is that it does not detect the changed lines : you will always get old lines from QVD and new lines from csv.

It is why you certainly have to manage by yourself the QVD procedure.

Fabrice

stevedark · ‎2014-02-27

Hi Marek,

I would create a QVD file for each CSV. You can do this by looping for each CSV file you have, and just creating QVD's where this doesn't already exist:

for each vFile in FileList('*.csv')

let vQVD = replace(vFile, '.csv', '.qvd');

if alt(FileSize('$(vQVD)'), 0) = 0 then

NextFile:

LOAD

*

FROM $(vFile)

([... CSV format in here ...])

;

STORE NextFile INTO $(vQVD) (qvd);

DROP TABLE NextFile;

end if

next

You can then load from *.qvd and this will go much much quicker than loading from the CSV files.

I've just typed the code straight into Community, so there may be slight syntax errors, by the idea behind it should be sound.

Hope that helps,

Steve

stevedark · ‎2014-02-27

You may also want to take a look at this post on incremental loads:

http://www.quickintelligence.co.uk/qlikview-incremental-load/

marcus_malinow · ‎2014-02-27

Another solution might be to use the filename() statement when loading.

so

LOAD *,

filename() as CSVFileName

from [your CSV].

Store this into your QVD, then when you do a reload, load the distinct values of CSVFileName from your QVD and skip those CSV's. Then concatenate the data from your QVD, and resave it.

Marcus

Report Inappropriate Content · ‎2014-02-27

Steve,

Great solution: easy to understand, to implement and certainly very efficient.

I hope for Marek that the csv (name, number) are not versatile.

Fabrice

Report Inappropriate Content · ‎2014-02-28

Hi,

Thank you all for all the help, Steve's advice was exactly what I was after !

Regards

Marek

stevedark · ‎2014-02-28

Hi Marek,

Glad that this has solved your issue.

To respond to Marcus Malinow on his approach, this could be a bit quicker as the final load would be from a single QVD rather than many. What I like about my approach though (and why I have used a similar approach on a few occasions) is that if you know that a CSV has been modified you can simply delete the corresponding QVD and it will be recreated. Indeed you can delete all of the QVD's and the whole lot will be recreated from scratch.

It would be possible to put more logic into the load, eg. check the date on the CSV file and rebuild the corresponding QVD if the CSV had changed since the last load. The code above is a bare bones approach though.

When benchmarking the above solution I was amazed at how quick the FileSize function was - many files could be skipped with very little lag at all.

Steve