Solved: Re: Incremental multiple csv files load - Qlik Community

nandhaadjame200 · ‎2018-10-26

Hi There

Please help to achieve the below,

I need to load multiple .CSV files on Incremental basis and folder location is same but file name will differ by date suffix and based on the new or updated record in csv files should updated into base qvd (Insert New record, Update existing record). Next time if a new file comes it should be iterated.

Please help to provide the script.

Thanks

Nandhakumar

dplr-rn · ‎2018-10-31

Done. See attached script

Basically switched around below section and added a not exists.Logic - you load the new files and then you load the qvd but dont include any updated records

FOR EACH File in FieldValueList('NewFile')

MyTable:

LOAD *,Date(Right(Left(FileBaseName(),37),10),'MM/DD/YYYY') as "UpdateDate"

FROM [$(File)] (txt, codepage is 28591, embedded labels, delimiter is ',', msq);

next;

// Concatenate(MyTable) this is not needed as it will auto concatenate

LOAD *

FROM [LIB://Conn\MyTable.qvd] (qvd)

Where not exists(ContactID);

View solution in original post

juraj_misina · ‎2018-10-26

There are quite a few unclear things in your description:

- will all processed files remain in the folder?

- is there any reliable record identification in those files?

- do you only need to replace individual updated records or you need to replace whole contents of a file (e.g. file abc-20181025.csv will be replaced by abc-20181026.csv)?

Juraj

dplr-rn · ‎2018-10-26

As mentioned by Juraj

there are a lot of unknowns here.

Assuming that the csv files all remain in same folder and that you want to do incremental (updates and possibly deletes)

2 steps.

Step 1 - Set up the csv file load (i.e. load only the new files

Basically for the first load (based on a flag) load the file names and data into a qvds

Subsequent loads

- get list of files names

- load old list of file names from qvd

- create another table with latest filenames only

- load data from qvd

- loop through new file names and load which gets autoconcatenated

- drop unnecessary tables (havent done this in the app to prove it works)

- save qvds

see attached script

Step 2 - implement incremental load (update and/or) delete on the latest loads. check below

https://www.analyticsvidhya.com/blog/2014/09/qlikview-incremental-load/

nandhaadjame200 · ‎2018-10-28

Hi Juraj

First, Thanks for your reply. Let Me explain you further.

- will all processed files remain in the folder?

Answer: No, It will be moved to Archive folder

- is there any reliable record identification in those files?

Answer: ContactID & Email address will be unique identification in each files

- do you only need to replace individual updated records or you need to replace whole contents of a file (e.g. file abc-20181025.csv will be replaced by abc-20181026.csv)?

Answer:

1. For a specific file, i need to update & Delete the specific records only from the new files not the entire qvd and other records should remains there.

2.Also in another qvd file i should keep updating the new records only and no delete required there.

I hope i made things clear and i ready to explain further if required.

Thanks again

Nandhakumar

nandhaadjame200 · ‎2018-10-28

Hi Dilip

Thanks for your reply

I have replied to Juraj with more details and can you please help with the scenario which i mentioned above.

Thanks

Nandhakumar

dplr-rn · ‎2018-10-29

How are you identifying if a record has changed and needs updation?

nandhaadjame200 · ‎2018-10-29

Hi Dilip

No identification will be there inside the file but if the record is updated in source and then file with the updated records will be place in the specific folder. We have to consider the file as updates and concatenate the same in QVD.

I hope my answer is clear. Please let me know if any clarifications.

Thanks

Nandhakumar

dplr-rn · ‎2018-10-29

My question was regarding individual records do you have a updated/changed date or flag or something similar.

if yes look at link I shared earlier on incremental load.

If not you may need to use hash functions to check if a row has changed. I.e. always create a hash of the columns concantenated together use that as a check if row has changed

Thank You

Dilip Ranjith

Sent via mobile

dplr-rn · ‎2018-10-29

Expanding on above

Create a hash key from all the fields (or at least all non-key fields). You can check for each row identifiers/keys if the calculated hash has changed. If the hash value is different at least one field has a new value. And you need to update

See below for some additional details

www.qlikfix.com/2014/03/11/hash-functions-collisions/

juraj_misina · ‎2018-10-29

Hi,

this one sounds tricky. You say there's no update indication on the record, so I assume this:

A file xyz_20181026.csv contains following records:

ContactID, Email, Value

1, a@a.a, 10

2, a@a.a, 20

3, b@b.b, 30

4, c@c.c, 40

The next day, record 2 gets updated and record 4 gets deleted, so you'll get a new version of the file xyz_20181027.csv with following content:

ContactID, Email, Value

1, a@a.a, 10

2, d@d.d, 25

3, b@b.b, 30

IF my assumption is correct, then you need something like this (not tested, but you'll get the idea)

NewData:
LOAD
FileBaseName() as SourceFileFull,
SubField(FileBaseName(), '_', 1) as SourceFileKey, //just a file name without date info
SubField(FileBaseName(), '_', 1)&'|'&ContactID&'|'&Email as RecordKey,
*
From [lib://SourceDir/*.csv]
(txt, embedded labels, delimiter is ;);

If(FileSize('lib://SourceDir/HistoryQVD.qvd')>0) then
Concatenate(NewData)
LOAD
*
From [lib://SourceDir/HistoryQVD.qvd](qvd)
Where
Not(Exists(SourceFileKey)
;
EndIf

Store NewData into [lib://SourceDir/HistoryQVD.qvd](qvd);

If only updated records are exported in the file again, then your increment key would be RecordKey field (so you'd use it in he Exists function instead of SourceFileKey field). But, in such case there's no way to find out wich records were deleted.

Hope this helps.

Juraj