Qlik Community

New to Qlik Sense

Discussion board where members can get started with Qlik Sense.

rittermd
Honored Contributor

Load Question

I am getting a csv file placed into a folder each night from our HR system.  They contain employee data and each days file will have the same layout and most of the data in today's file will be the same as what came in yesterday.  There can be some new records and there can be changes to some of the fields. 

Currently the load script is setup to just add todays records to what was already in there.  So each day my qvd file is growing with mostly redundant data.

I need to find out what the best process is in the script to only load new records and updated fields each day instead of adding all of the records.

If you need more information or if I didn't explain this clearly please let me know.

4 Replies
MVP
MVP

Re: Load Question

Is there a unique identifier which can help in identifying the existing records? If there is one, you can only load data which doesn't exist using Where not Exist statement. The logic you need is incremental load... Check these links to see how ti will work for you

https://www.resultdata.com/three-types-of-qlikview-incremental-loads/

Incremental Load in QlikView – Part1 – Learn QlikView

https://www.analyticsvidhya.com/blog/2014/09/qlikview-incremental-load/

Highlighted
MVP
MVP

Re: Load Question

Hi,

For doing this you need the following

- Any date field to identify that the record is modified (something like Modified/Updated Date field)

- Need a Primary key to identify each record

- Need to identify the data pattern

     -Are we deleting any records, or just updating and inserting new records.

You can check for the sample script and examples in Qlikview help file.

Check for this topic

Using QVD Files for Incremental Load

You will get very good explanation for all the scenarios.

Regards,

Jagan.

rittermd
Honored Contributor

Re: Load Question

I don't have direct access to the SQL data and there is no modify date.

The back end system is generating a csv file each day.  The file name includes the date created. 

So currently the task is setup to reload every csv file in the folder. There are currently 280 of them and eventually there will be over 1000.

There is a unique key (employee id).

Every new csv file contains new hires as well as has a record for each employee that was in the prior days csv file.  Many of them are identical.  Some might have updates.  No real way to tell with the process we have currently except to compare every field and update if a change is found. 

Is there a good way to do this when you can't access the data directly and are dealing with a daily csv file?

MVP
MVP

Re: Load Question

Well assuming that the new file's employee id will have all the updated information I would delete any occurrence of existing employee id

NewFile:

LOAD empID,

           ...

FROM NewEmpId;

ExistingData:

LOAD empID,

           ...

FROM ExistingQVD

Where not Exists(empID);

Concatenate (ExistingData)

LOAD *

Resident NewFile;