I am getting a CSV file placed into a folder each night from our HR system. It contains employee data, and each day's file has the same layout; most of the data in today's file will be the same as what came in yesterday's. There can be some new records, and some fields can change.
Currently the load script just appends today's records to what was already there, so each day my QVD file grows with mostly redundant data.
I need to find the best way, in the script, to load only the new records and updated fields each day instead of appending all of the records.
If you need more information or if I didn't explain this clearly, please let me know.
Is there a unique identifier that can help in identifying the existing records? If there is one, you can load only the data that doesn't already exist by using a Where Not Exists() clause. The logic you need is an incremental load. Check these links to see how it will work for you:
https://www.resultdata.com/three-types-of-qlikview-incremental-loads/
Incremental Load in QlikView – Part1 – Learn QlikView
https://www.analyticsvidhya.com/blog/2014/09/qlikview-incremental-load/
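As a rough illustration of the insert-only variant described above, a minimal sketch (the field and file names EmpID, todays_file.csv, and Employees.qvd are hypothetical placeholders, not from this thread):

```
// Load today's records first so their keys are registered for Exists()
Employees:
LOAD EmpID,
     FirstName,
     LastName
FROM [lib://HR/todays_file.csv] (txt, utf8, embedded labels, delimiter is ',');

// Append only the historical records whose EmpID was not in today's file
Concatenate (Employees)
LOAD EmpID,
     FirstName,
     LastName
FROM [lib://HR/Employees.qvd] (qvd)
Where Not Exists(EmpID);

// Overwrite the QVD with the merged result
STORE Employees INTO [lib://HR/Employees.qvd] (qvd);
```

Because today's rows are loaded first, an updated record wins over its older copy in the QVD, so this handles updates as well as pure inserts.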
Hi,
For doing this you need the following:
- A date field that identifies when a record was modified (something like a Modified/Updated Date field)
- A primary key to identify each record
- An understanding of the data pattern
- To know whether records are ever deleted, or only inserted and updated
You can find sample scripts and examples in the QlikView help file. Check for this topic; you will get a very good explanation of all the scenarios.
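Where such a modified-date field does exist, the insert-and-update variant of the incremental load can be sketched like this (the names EmpID, ModifiedDate, Employees_source.csv, and the vLastReload variable are assumptions for illustration):

```
// High-water mark from the previous run (in practice, read this from the QVD)
Let vLastReload = '2024-01-01 00:00:00';

// Load only rows inserted or changed since the last reload
Employees:
LOAD EmpID,
     ModifiedDate,
     Salary
FROM [lib://HR/Employees_source.csv] (txt, utf8, embedded labels, delimiter is ',')
Where ModifiedDate >= '$(vLastReload)';

// Bring back the unchanged historical rows; updated rows are excluded
// because their EmpID already exists in the table loaded above
Concatenate (Employees)
LOAD EmpID,
     ModifiedDate,
     Salary
FROM [lib://HR/Employees.qvd] (qvd)
Where Not Exists(EmpID);

STORE Employees INTO [lib://HR/Employees.qvd] (qvd);
```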
Regards,
Jagan.
I don't have direct access to the SQL data and there is no modify date.
The back end system is generating a csv file each day. The file name includes the date created.
So currently the task is set up to reload every CSV file in the folder. There are currently 280 of them, and eventually there will be over 1,000.
There is a unique key (employee id).
Every new CSV file contains new hires as well as a record for each employee that was in the prior day's file. Many of those records are identical; some might have updates. With our current process there is no real way to tell except to compare every field and update if a change is found.
Is there a good way to do this when you can't access the data directly and are dealing with a daily csv file?
Well, assuming that the new file carries all of the current information for each employee ID it contains, I would drop any existing occurrence of those employee IDs and keep the new rows:
NewFile:
LOAD empID,
...
FROM NewEmpId;

// Keep only the historical rows whose empID is not in today's file;
// today's rows (loaded first) replace any older versions
Concatenate (NewFile)
LOAD empID,
...
FROM ExistingQVD (qvd)
Where Not Exists(empID);

// Persist the merged result back to the QVD
STORE NewFile INTO ExistingQVD (qvd);
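Since the QVD holds the full history once this pattern is in place, only the newest CSV needs to be read each night rather than all 280+ files. One way to pick it is to loop over the folder and keep the file whose name sorts last; this sketch assumes a naming pattern like Employees_YYYYMMDD.csv (the path and pattern are assumptions, not from this thread):

```
// Find the most recent file by the date embedded in its name;
// with a YYYYMMDD pattern, lexical comparison matches date order
Let vLatestFile = '';
For Each vFile in FileList('lib://HR/Employees_*.csv')
    If '$(vFile)' > '$(vLatestFile)' Then
        Let vLatestFile = '$(vFile)';
    End If
Next vFile

// Load only that file, then merge with the existing QVD as shown above
NewFile:
LOAD empID
FROM [$(vLatestFile)] (txt, utf8, embedded labels, delimiter is ',');
```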