Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
nandhaadjame200
Contributor II
Contributor II

Incremental multiple csv files load

Hi There

Please help to achieve the below,

I need to load multiple .CSV files on Incremental basis and folder location is same but file name will differ by date suffix and based on the new or updated record in csv files should updated into base qvd (Insert New record, Update existing record). Next time if a new file comes it should be iterated.

Please help to provide the script.

Thanks

Nandhakumar

1 Solution

Accepted Solutions
dplr-rn
Partner - Master III
Partner - Master III

Done. See attached script

Basically switched around below section and added a not exists.Logic - you load the new files and then you load the qvd but dont include any updated records

FOR EACH File in FieldValueList('NewFile')

        MyTable:

        LOAD *,Date(Right(Left(FileBaseName(),37),10),'MM/DD/YYYY') as "UpdateDate"

FROM [$(File)] (txt, codepage is 28591, embedded labels, delimiter is ',', msq);

    next;

   

   // Concatenate(MyTable) this is not needed as it will auto concatenate

    LOAD *

FROM [LIB://Conn\MyTable.qvd] (qvd)

    Where not exists(ContactID);

View solution in original post

23 Replies
juraj_misina
Luminary Alumni
Luminary Alumni

There are quite a few unclear things in your description:

- will all processed files remain in the folder?

- is there any reliable record identification in those files?

- do you only need to replace individual updated records or you need to replace whole contents of a file (e.g. file abc-20181025.csv will be replaced by abc-20181026.csv)?

Juraj

dplr-rn
Partner - Master III
Partner - Master III

As mentioned by Juraj

there are a lot of unknowns here.

Assuming that the csv files all remain in same folder and that you want to do incremental (updates and possibly deletes)

2 steps.

Step 1 - Set up the csv file load (i.e. load only the new files

Basically for the first load (based on a flag) load the file names and data into a qvds

Subsequent loads

- get list of files names

- load old list of file names from qvd

- create another table with latest filenames only

- load data from qvd

- loop through new file names and load which gets autoconcatenated

- drop unnecessary tables (havent done this in the app to prove it works)

- save qvds

see attached script


Step 2 - implement incremental load (update and/or) delete on the latest loads. check below

https://www.analyticsvidhya.com/blog/2014/09/qlikview-incremental-load/

nandhaadjame200
Contributor II
Contributor II
Author

Hi Juraj

First, Thanks for your reply. Let Me explain you further.

- will all processed files remain in the folder?

Answer: No, It will be moved to Archive folder

- is there any reliable record identification in those files?

Answer: ContactID & Email address will be unique identification in each files

- do you only need to replace individual updated records or you need to replace whole contents of a file (e.g. file abc-20181025.csv will be replaced by abc-20181026.csv)?

Answer:

1. For a specific file, i need to update & Delete the specific records only from the new files not the entire qvd and other records should remains there.

2.Also in another qvd file i should keep updating the new records only and no delete required there.

I hope i made things clear and i ready to explain further if required.

Thanks again

Nandhakumar

nandhaadjame200
Contributor II
Contributor II
Author

Hi Dilip

Thanks for your reply

I have replied to Juraj with more details and can you please help with the scenario which i mentioned above.

Thanks

Nandhakumar

dplr-rn
Partner - Master III
Partner - Master III

How are you identifying if a record has changed and needs updation?

nandhaadjame200
Contributor II
Contributor II
Author

Hi Dilip

No identification will be there inside the file but if the record is updated in source and then file with the updated records will be place in the specific folder. We have to consider the file as updates and concatenate the same in QVD.

I hope my answer is clear. Please let me know if any clarifications.

Thanks

Nandhakumar

dplr-rn
Partner - Master III
Partner - Master III

My question was regarding individual records do you have a updated/changed date or flag or something similar.

if yes look at link I shared earlier on incremental load.

If not you may need to use hash functions to check if a row has changed. I.e. always create a hash of the columns concantenated together use that as a check if row has changed

Thank You

Dilip Ranjith

Sent via mobile

dplr-rn
Partner - Master III
Partner - Master III

Expanding on above

Create a hash key from all the fields (or at least all non-key fields). You can check for each row identifiers/keys if the calculated hash has changed. If the hash value is different at least one field has a new value. And you need to update

See below for some additional details

www.qlikfix.com/2014/03/11/hash-functions-collisions/

juraj_misina
Luminary Alumni
Luminary Alumni

Hi,

this one sounds tricky. You say there's no update indication on the record, so I assume this:

A file xyz_20181026.csv contains following records:

ContactID, Email, Value

1, a@a.a, 10

2, a@a.a, 20

3, b@b.b, 30

4, c@c.c, 40

The next day, record 2 gets updated and record 4 gets deleted, so you'll get a new version of the file xyz_20181027.csv with following content:

ContactID, Email, Value

1, a@a.a, 10

2, d@d.d, 25

3, b@b.b, 30

IF my assumption is correct, then you need something like this (not tested, but you'll get the idea)

NewData:

LOAD

FileBaseName() as SourceFileFull,

SubField(FileBaseName(), '_', 1) as SourceFileKey, //just a file name without date info

SubField(FileBaseName(), '_', 1)&'|'&ContactID&'|'&Email as RecordKey,

*

From [lib://SourceDir/*.csv]

(txt, embedded labels, delimiter is ;);

If(FileSize('lib://SourceDir/HistoryQVD.qvd')>0) then

Concatenate(NewData)

LOAD

*

From [lib://SourceDir/HistoryQVD.qvd](qvd)

Where

Not(Exists(SourceFileKey)

;

EndIf

Store NewData into [lib://SourceDir/HistoryQVD.qvd](qvd);

If only updated records are exported in the file again, then your increment key would be RecordKey field (so you'd use it in he Exists function instead of SourceFileKey field). But, in such case there's no way to find out wich records were deleted.

Hope this helps.

Juraj