Qlik Community

QlikView App Development

Discussion Board for collaboration related to QlikView App Development.

Highlighted
Not applicable

Load only new data from multiple csv

Hello,

I have a QV document that loads data from multiple csv documents ( From *.csv).

New files are being saved in the data folder quite frequently and the load is getting excessively long. The problem I can see is that each of the CSVs is being reloaded from scratch.

Is there a way to specify that only newly added csv files are to be loaded, with the existing data retained?

Regards
Marek

1 Solution

Accepted Solutions
MVP & Luminary
MVP & Luminary

Re: Load only new data from multiple csv

Hi Marek,

I would create a QVD file for each CSV.  You can do this by looping for each CSV file you have, and just creating QVD's where this doesn't already exist:

for each vFile in FileList('*.csv')

     let vQVD = replace(vFile, '.csv', '.qvd');


     if alt(FileSize('$(vQVD)'), 0) = 0 then

          NextFile:

          LOAD

              *

          FROM $(vFile)

          ([... CSV format in here ...])

          ;


          STORE NextFile INTO $(vQVD) (qvd);


          DROP TABLE NextFile;

     end if

next

You can then load from *.qvd and this will go much much quicker than loading from the CSV files.

I've just typed the code straight into Community, so there may be slight syntax errors, by the idea behind it should be sound.

Hope that helps,

Steve

View solution in original post

16 Replies
MVP
MVP

Re: Load only new data from multiple csv

Use While Exists during load with some primary key...

Possibly date...

rustyfishbones
Honored Contributor II

Re: Load only new data from multiple csv

Hi Marek,

Try storing the old .csv files into a qvd file

It will reload much faster

Not applicable

Re: Load only new data from multiple csv

Marek,

As Alan said, see the QVD files.

You may also use the prefix BUFFER :

BUFFER LOAD ....

This will create the QVD from the existing data. Next time, the QVD will be read and only new lines will be read from csv.

The problem with this technique is that it does not detect the changed lines : you will always get old lines from QVD and new lines from csv.

It is why you certainly have to manage by yourself the QVD procedure.

Fabrice

MVP & Luminary
MVP & Luminary

Re: Load only new data from multiple csv

Hi Marek,

I would create a QVD file for each CSV.  You can do this by looping for each CSV file you have, and just creating QVD's where this doesn't already exist:

for each vFile in FileList('*.csv')

     let vQVD = replace(vFile, '.csv', '.qvd');


     if alt(FileSize('$(vQVD)'), 0) = 0 then

          NextFile:

          LOAD

              *

          FROM $(vFile)

          ([... CSV format in here ...])

          ;


          STORE NextFile INTO $(vQVD) (qvd);


          DROP TABLE NextFile;

     end if

next

You can then load from *.qvd and this will go much much quicker than loading from the CSV files.

I've just typed the code straight into Community, so there may be slight syntax errors, by the idea behind it should be sound.

Hope that helps,

Steve

View solution in original post

MVP & Luminary
MVP & Luminary

Re: Load only new data from multiple csv

You may also want to take a look at this post on incremental loads:

http://www.quickintelligence.co.uk/qlikview-incremental-load/

marcus_malinow
Valued Contributor III

Re: Load only new data from multiple csv

Another solution might be to use the filename() statement when loading.

so

LOAD *,

     filename() as CSVFileName

from [your CSV].

Store this into your QVD, then when you do a reload, load the distinct values of CSVFileName from your QVD and skip those CSV's. Then concatenate the data from your QVD, and resave it.

Marcus

Not applicable

Re: Load only new data from multiple csv

Steve,

Great solution: easy to understand, to implement and certainly very efficient.

I hope for Marek that the csv (name, number) are not versatile.

Fabrice

Not applicable

Re: Load only new data from multiple csv

Hi,

Thank you all for all the help, Steve's advice was exactly what I was after !

Regards

Marek

MVP & Luminary
MVP & Luminary

Re: Load only new data from multiple csv

Hi Marek,

Glad that this has solved your issue.

To respond to Marcus Malinow on his approach, this could be a bit quicker as the final load would be from a single QVD rather than many.  What I like about my approach though (and why I have used a similar approach on a few occasions) is that if you know that a CSV has been modified you can simply delete the corresponding QVD and it will be recreated.  Indeed you can delete all of the QVD's and the whole lot will be recreated from scratch.

It would be possible to put more logic into the load, eg. check the date on the CSV file and rebuild the corresponding QVD if the CSV had changed since the last load.  The code above is a bare bones approach though.

When benchmarking the above solution I was amazed at how quick the FileSize function was - many files could be skipped with very little lag at all.

Steve