Hi There
Please help me achieve the following:
I need to load multiple .csv files on an incremental basis. The folder location is always the same, but the file names differ by a date suffix. Based on the new or updated records in the csv files, the base QVD should be updated (insert new records, update existing records). The next time a new file arrives, the process should iterate again.
Please help me with the script.
Thanks
Nandhakumar
Done. See attached script
Basically, I switched around the section below and added a Not Exists. The logic: you load the new files first, then you load the QVD but exclude any records that were updated.
FOR EACH File in FieldValueList('NewFile')
  MyTable:
  LOAD *,
       Date(Right(Left(FileBaseName(), 37), 10), 'MM/DD/YYYY') as "UpdateDate"
  FROM [$(File)] (txt, codepage is 28591, embedded labels, delimiter is ',', msq);
NEXT;

// Concatenate(MyTable) is not needed here, as the loads auto-concatenate

LOAD *
FROM [LIB://Conn\MyTable.qvd] (qvd)
Where Not Exists(ContactID);
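The loop above assumes a field called NewFile that already holds the files still to be processed. As a rough, untested sketch of how that field could be built (the lib path, file mask, and ProcessedFiles QVD name are my assumptions, not from the attached script):

```qlik
// 1) Load the list of files processed in earlier runs, if a history exists
IF FileSize('lib://Conn/ProcessedFiles.qvd') > 0 THEN
  ProcessedFiles:
  LOAD ProcessedFile
  FROM [lib://Conn/ProcessedFiles.qvd] (qvd);
END IF

// 2) Keep only file names not seen before
FOR EACH vFile in FileList('lib://Conn/MyTable_*.csv')
  NewFiles:
  LOAD '$(vFile)' as NewFile
  AutoGenerate 1
  Where Not Exists(ProcessedFile, '$(vFile)');
NEXT

// 3) Remember today's files for the next run
STORE NewFiles INTO [lib://Conn/ProcessedFiles.qvd] (qvd);
```

On the very first run the ProcessedFiles table does not exist yet, so you may want a flag (as Dilip mentions below) to skip the Exists filter on the initial load.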
There are quite a few unclear things in your description:
- will all processed files remain in the folder?
- is there any reliable record identification in those files?
- do you only need to replace individual updated records or you need to replace whole contents of a file (e.g. file abc-20181025.csv will be replaced by abc-20181026.csv)?
Juraj
As mentioned by Juraj
there are a lot of unknowns here.
Assuming that the csv files all remain in the same folder and that you want to do incremental loads (updates and possibly deletes), there are 2 steps.
Step 1 - Set up the csv file load (i.e. load only the new files)
Basically, for the first load (based on a flag), load the file names and data into QVDs.
Subsequent loads:
- get list of files names
- load old list of file names from qvd
- create another table with latest filenames only
- load data from qvd
- loop through the new file names and load them (the loads get auto-concatenated)
- drop unnecessary tables (I haven't done this in the attached app, so you can verify it works)
- save qvds
see attached script
Step 2 - implement incremental load (update and/or delete) on the latest loads. Check the link below:
https://www.analyticsvidhya.com/blog/2014/09/qlikview-incremental-load/
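The insert-and-update pattern from that link can be sketched roughly as follows (not tested; the table, field, and path names are my assumptions for illustration):

```qlik
// Incremental update: new/changed rows win, old rows are kept via Not Exists
Data:
LOAD ContactID, Email, Value, UpdateDate
FROM [lib://Conn/Delta_*.csv] (txt, embedded labels, delimiter is ',');

// Append historical rows whose key was NOT re-delivered in the new files
Concatenate (Data)
LOAD ContactID, Email, Value, UpdateDate
FROM [lib://Conn/Base.qvd] (qvd)
Where Not Exists(ContactID);

// Persist the merged result as the new base
STORE Data INTO [lib://Conn/Base.qvd] (qvd);
```

Because the new files are loaded first, any ContactID present in them already "exists", so the old version of that record is dropped from the historical load; unchanged records are carried over untouched.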
Hi Juraj
First, thanks for your reply. Let me explain further.
- will all processed files remain in the folder?
Answer: No, they will be moved to an Archive folder.
- is there any reliable record identification in those files?
Answer: ContactID & Email address will be the unique identification in each file.
- do you only need to replace individual updated records or you need to replace whole contents of a file (e.g. file abc-20181025.csv will be replaced by abc-20181026.csv)?
Answer:
1. For a specific file, I need to update & delete only the specific records from the new files, not the entire QVD; the other records should remain there.
2. Also, in another QVD file, I should keep adding only the new records; no delete is required there.
I hope I have made things clear, and I am ready to explain further if required.
Thanks again
Nandhakumar
Hi Dilip
Thanks for your reply
I have replied to Juraj with more details. Can you please help with the scenario I mentioned above?
Thanks
Nandhakumar
How are you identifying whether a record has changed and needs updating?
Hi Dilip
There will be no identification inside the file, but if a record is updated in the source, a file containing the updated records will be placed in the specific folder. We have to treat that file as updates and concatenate it into the QVD.
I hope my answer is clear. Please let me know if any clarification is needed.
Thanks
Nandhakumar
My question was about individual records: do you have an updated/changed date, a flag, or something similar?
If yes, look at the link I shared earlier on incremental load.
If not, you may need to use hash functions to check whether a row has changed, i.e. always create a hash of the columns concatenated together and use that to check if the row has changed.
Thank You
Dilip Ranjith
Expanding on the above:
Create a hash key from all the fields (or at least all non-key fields). For each row's identifier/key you can then check whether the calculated hash has changed. If the hash value is different, at least one field has a new value and you need to update.
See below for some additional details
www.qlikfix.com/2014/03/11/hash-functions-collisions/
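A minimal sketch of that hash comparison, using Qlik's built-in Hash128() function (not tested; the field names and paths are assumptions based on the sample data in this thread):

```qlik
// Hash all non-key fields of the incoming rows
NewData:
LOAD
  ContactID,
  Email,
  Value,
  Hash128(Email & '|' & Value) as RowHash  // hash of the non-key fields
FROM [lib://SourceDir/xyz_*.csv] (txt, embedded labels, delimiter is ',');

// A row counts as changed when its ContactID is already in the history
// QVD but the stored RowHash for it differs from the newly computed one.
// Loading the history with a matching Hash128() expression lets you
// compare the two via Exists(RowHash).
```

Note the collision caveat in the qlikfix.com link above: a hash match is overwhelmingly likely, but not mathematically guaranteed, to mean "unchanged".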
Hi,
this one sounds tricky. You say there's no update indication on the record, so I assume this:
A file xyz_20181026.csv contains following records:
ContactID, Email, Value
1, a@a.a, 10
2, a@a.a, 20
3, b@b.b, 30
4, c@c.c, 40
The next day, record 2 gets updated and record 4 gets deleted, so you'll get a new version of the file xyz_20181027.csv with following content:
ContactID, Email, Value
1, a@a.a, 10
2, d@d.d, 25
3, b@b.b, 30
If my assumption is correct, then you need something like this (not tested, but you'll get the idea):
NewData:
LOAD
  FileBaseName() as SourceFileFull,
  SubField(FileBaseName(), '_', 1) as SourceFileKey, // just the file name without date info
  SubField(FileBaseName(), '_', 1) & '|' & ContactID & '|' & Email as RecordKey,
  *
FROM [lib://SourceDir/*.csv]
(txt, embedded labels, delimiter is ',');

If FileSize('lib://SourceDir/HistoryQVD.qvd') > 0 then
  Concatenate (NewData)
  LOAD *
  FROM [lib://SourceDir/HistoryQVD.qvd] (qvd)
  Where Not Exists(SourceFileKey);
End If

STORE NewData INTO [lib://SourceDir/HistoryQVD.qvd] (qvd);
If only updated records are exported in the file again, then your increment key would be the RecordKey field (so you'd use it in the Exists function instead of the SourceFileKey field). But in that case there's no way to find out which records were deleted.
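That record-level variant only changes the Where clause of the historical load; a sketch under the same (untested) assumptions as the script above:

```qlik
// Record-level increment: drop only historical rows whose RecordKey
// was re-delivered in the new files, instead of dropping whole files
Concatenate (NewData)
LOAD *
FROM [lib://SourceDir/HistoryQVD.qvd] (qvd)
Where Not Exists(RecordKey);
```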
Hope this helps.
Juraj