
Load only new data from multiple csv

Hello,

I have a QV document that loads data from multiple CSV files (FROM *.csv).

New files are being saved in the data folder quite frequently and the load is getting excessively long. The problem I can see is that each of the CSVs is being reloaded from scratch.

Is there a way to specify that only newly added csv files are to be loaded, with the existing data retained?

Regards
Marek

16 Replies
stevedark
Partner Ambassador/MVP

I've decided to take my thoughts on this thread further with a post on my blog.  Here I explore different ways of looking at dealing with changes to source files and further optimisation of the load.

You can read the post here: http://www.quickintelligence.co.uk/convert-drop-folder-files-qvd/

- Steve

aj0031724
Partner - Creator

Dear Steve,

I was working with this logic for incremental loading of new files only.

However, if I keep adding a new QVD for each CSV file, it just continuously fills up the disk space.

Is there any way I can ensure that I load only the new files, without having to keep a QVD file along with each CSV file?

stevedark
Partner Ambassador/MVP

So, is it the case that you have a number of CSVs and you only want to load the latest?

If so, set up a loop like this:

let vFileName = '';
let vLatest = makedate(1984,1,1);

// Find the most recently modified CSV in the folder
for each vFile in FileList('$(filepath)*.csv')
    if FileTime('$(vFile)') > vLatest then
        let vLatest = FileTime('$(vFile)');
        let vFileName = vFile;
    end if
next

// Load only that file (brackets allow for spaces in the path)
LOAD
    *
FROM [$(vFileName)]
[... file definition here ...]
;

Hope that helps.

Steve

aj0031724
Partner - Creator

Dear Steve,

Thanks.

I tried this logic, but it always picks only the last file (a single file), never more than that.

Can you please help?

stevedark
Partner Ambassador/MVP

Hi Avinash,

I thought that was what you were requiring, sorry.  To load from all files without going via QVD, simply move the CSV load inside the loop and lose all the date checking:

for each vFile in FileList('$(filepath)*.csv')
    // Load the file for this pass of the loop
    LOAD
        *
    FROM [$(vFile)]
    [... file definition here ...]
    ;
next

You will find though that the solution in my blog post (above), where you create QVDs for new files and load data from QVD for old files will be much much quicker over time.

If you have the data in QVD you can purge the CSV, and the QVD files should be considerably smaller than the original CSVs (unless the CSVs are very small as QVDs have a slight overhead in the header).  That purge can be done with a CMD statement in the load, but I would recommend a manual clean up process rather than making your QV app need system rights.

Hope that helps,

Steve
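
For illustration, the "QVD for new files, QVD load for old files" idea described above might be sketched roughly as follows. This is not the code from the blog post, just a minimal sketch under assumptions: the folder paths, the table names and the CSV format specification are all hypothetical, and Windows-style paths as returned by FileList in QlikView are assumed.

```qlik
// Hypothetical folders; adjust to your environment.
SET vCsvPath = 'C:\Data\Drop\';
SET vQvdPath = 'C:\Data\QVD\';

// Pass 1: convert every CSV that does not yet have a matching QVD.
FOR EACH vFile IN FileList('$(vCsvPath)*.csv')

    // e.g. C:\Data\Drop\sales_01.csv -> C:\Data\QVD\sales_01.csv.qvd
    LET vQvdFile = '$(vQvdPath)' & SubField('$(vFile)', '\', -1) & '.qvd';

    // FileSize returns NULL for a missing file; Alt turns that into 0.
    IF Alt(FileSize('$(vQvdFile)'), 0) = 0 THEN

        Temp_File:
        NoConcatenate
        LOAD *
        FROM [$(vFile)]
        (txt, utf8, embedded labels, delimiter is ',', msq); // assumed CSV layout

        STORE Temp_File INTO [$(vQvdFile)] (qvd);
        DROP TABLE Temp_File;

    END IF

NEXT vFile

// Pass 2: read everything back from QVD, which is much faster than CSV.
// Files with identical field lists concatenate into one table automatically.
Data:
LOAD *
FROM [$(vQvdPath)*.qvd] (qvd);
```

On each reload only the CSVs without a QVD are parsed, so the slow text load happens once per file. The sketch does not notice a CSV that changes after its QVD was created; comparing FileTime of the two files would be one way to handle that.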

aj0031724
Partner - Creator

Dear Steve,

Apologies if I did not make myself clear enough.

What I want to achieve is:

a) Load only new files from the same folder. Do not load files which have already been loaded.

If today there are 10 files in the folder which are already loaded, and 5 new files are added now, then I just need to read from these 5 new files.

stevedark
Partner Ambassador/MVP

Sounds like you need a PARTIAL LOAD (partial reload).  I tend to avoid them as things can go a bit wrong if you need to reboot.  You should be able to Google for information though.

You will need the partial reload syntax (the ADD or REPLACE prefix on the LOAD statement), and the code I posted, to just get the new stuff.  You will need to persist the date of the last loaded file, so that you bring in all files since that date, rather than all of them.

Hope that helps.

Steve
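
As a rough illustration of "persist the date of the last loaded file", here is a minimal sketch that swaps the partial reload for a single history QVD: the data loaded so far lives in one file, and the newest file time is read back from it. The paths, the field name SourceFileSecs and the CSV format specification are assumptions made for the example, and all CSVs are assumed to share the same columns.

```qlik
// Hypothetical locations; adjust to your environment.
SET vCsvPath    = 'C:\Data\Drop\';
SET vHistoryQvd = 'C:\Data\History.qvd';

// Newest file time already loaded, in whole seconds (0 = nothing loaded yet).
LET vLastLoaded = 0;

// Bring back the rows stored by earlier reloads, if any.
IF Alt(FileSize('$(vHistoryQvd)'), 0) > 0 THEN

    Data:
    LOAD * FROM [$(vHistoryQvd)] (qvd);

    MaxTime:
    LOAD Max(SourceFileSecs) AS MaxFileSecs RESIDENT Data;

    LET vLastLoaded = Alt(Peek('MaxFileSecs', 0, 'MaxTime'), 0);
    DROP TABLE MaxTime;

END IF

// Read only the CSVs modified after the newest file already loaded.
FOR EACH vFile IN FileList('$(vCsvPath)*.csv')

    IF Round(FileTime('$(vFile)') * 86400) > vLastLoaded THEN

        // Same field list as the history rows, so this concatenates onto Data.
        Data:
        LOAD *,
             Round(FileTime() * 86400) AS SourceFileSecs
        FROM [$(vFile)]
        (txt, utf8, embedded labels, delimiter is ',', msq); // assumed CSV layout

    END IF

NEXT vFile

// Persist old and new rows together for the next reload.
IF NoOfRows('Data') > 0 THEN
    STORE Data INTO [$(vHistoryQvd)] (qvd);
END IF
```

This keeps one QVD rather than one per CSV, so the CSVs can be purged once loaded. The file time is stored as whole seconds so the comparison does not depend on floating-point rounding. Two caveats: a file copied into the folder with an older modification time than the newest loaded file will be skipped, and a file edited after it was loaded will be read again and its rows duplicated.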