I am getting a CSV file placed into a folder each night from our HR system. The files contain employee data, and each day's file has the same layout. Most of the data in today's file is the same as what came in yesterday, but there can be some new records, and some fields may have changed.
Currently the load script is set up to just append today's records to what was already there, so each day my QVD file is growing with mostly redundant data.
I need to find out the best way, in the script, to load only the new records and updated fields each day instead of appending all of the records.
If you need more information or if I didn't explain this clearly please let me know.
Is there a unique identifier that can help in identifying the existing records? If there is one, you can load only the data that doesn't already exist using a Where Not Exists() clause. The logic you need is an incremental load... Check these links to see how it will work for you
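A minimal sketch of that pattern, assuming the key field is called EmployeeID and the file names are placeholders (adjust to match your actual feed):

```qlik
// Load the history accumulated so far (an optimized QVD load).
Employees:
LOAD *
FROM Employees.qvd (qvd);

// Append from today's file only the rows whose key has not been seen yet.
Concatenate (Employees)
LOAD *
FROM Employees_20240101.csv (txt, utf8, embedded labels, delimiter is ',')
Where Not Exists(EmployeeID);

// Persist the merged result back to the QVD for tomorrow's reload.
STORE Employees INTO Employees.qvd (qvd);
```

Note that this only picks up brand-new keys; rows whose EmployeeID already exists are skipped even if their other fields changed, so updates need separate handling.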
I don't have direct access to the SQL data and there is no modify date.
The back end system is generating a csv file each day. The file name includes the date created.
So currently the task is set up to reload every CSV file in the folder. There are currently 280 of them, and eventually there will be over 1000.
There is a unique key (employee id).
Every new CSV file contains new hires as well as a record for each employee that was in the prior day's file. Many of the records are identical, and some might have updates. With our current process there is no real way to tell except to compare every field and update when a change is found.
Is there a good way to do this when you can't access the source data directly and are dealing with a daily CSV file?
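One common approach when each daily file is a full snapshot but carries no modified date: load today's file first so its values win, then append from the historical QVD only the keys that are not in today's file. This picks up updates without comparing fields, because every current employee's row comes from the newest file. A sketch under stated assumptions: the key is EmployeeID, and the feed follows a date-stamped naming pattern like Employees_YYYYMMDD.csv (both taken from this thread; adjust to your real names):

```qlik
// Build today's file name from the assumed date-stamped pattern.
LET vToday = Date(Today(), 'YYYYMMDD');

// 1) Load today's snapshot first - these rows carry the latest field values,
//    so changed fields are captured automatically.
Employees:
LOAD *
FROM [Employees_$(vToday).csv] (txt, utf8, embedded labels, delimiter is ',');

// 2) Append from the historical QVD only employees missing from today's file
//    (e.g. leavers who have dropped out of the feed), so history is kept.
Concatenate (Employees)
LOAD *
FROM Employees.qvd (qvd)
Where Not Exists(EmployeeID);

// 3) Overwrite the QVD with the refreshed, deduplicated result.
STORE Employees INTO Employees.qvd (qvd);
```

Because only the latest CSV is read each night and the rest comes from the QVD, reload time stays roughly flat no matter how many files accumulate in the folder.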