Skip to main content
Announcements
Qlik Connect 2024! Seize endless possibilities! LEARN MORE
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

How to create a QVD which appends itself from a CSV file?

Hello,

For the project I am working on I will be receiving a monthly data dump in CSV format which I will be using in QlikView for analytics. I need to keep a running master file which contains all of the data from each month, so each new file needs to be appended to the bottom of this one.

Currently I created a SAS process to do this, but in order to maximize efficiency as well as reduce dependency on other tools (SAS and it's licenses is neither cheap nor user friendly) I want to start doing this in QV itself. Currently I am not using QVD's but I wish to start doing so.

So my question is what would be the process for creating this master table from a monthly data feed?

Important factors:

1. The monthly feed will always have the same name. We always move the new file into the folder where the previous one is, and overwrite it, then run the SAS process. This way the process always consumes the same file but with different data (so that the files path remains static). We also create a back-up of the monthly file since next month it would be overwritten again.

2. Sometimes we may receive a data file half way through the month because it has modified records for the same time period. So part of the process also would need to check the QVD to see if there are already records from the same month as the import file. If there are, then DELETE those ones currently in the QVD and THEN append the data with the new file which has the new data for that month.

I am thinking it would be something a long the lines of:

1. load the flat file from csv, and perform the required transformations (we are adding a few calculated columns)

2. load the QVD itself and do something like 'WHERE Month NOT IN(SELECT Month FROM imported_csv)"

3. UNION SELECT * FROM Imported_CSV

Does that sound right at all?

2 Replies
Gysbert_Wassenaar

That looks pretty much correct. QV will automatically append loads that contain exactly the same fields into the first table so a union select isn't necessary (though explicitly specifying concatenate(TableX) to make sure the data is appended to the right table is a good idea. The qvd load would be something like

load * from myqvd.qvd (qvd)

where not exists(Month);

See here for some links about incremental reloading and this blog post: http://www.quickintelligence.co.uk/qlikview-incremental-load/


talk is cheap, supply exceeds demand
Not applicable
Author

Hey G,

I am still having trouble with this. I am first loading the csv file with NEW data, and right below that doing a concatenate load of the QVD with the "WHERE NOT EXISTS (Month)" line you recommended. This runs properly and in the load details looks fine. (it says that it loads 900+ records from the csv, and another 900+ from the QVD) however when I create a statistics box and have it count the rows based on an ID field, the count is equal that of only the data from the QVD. The data loaded from the CSV is no where to be found in the table. Does this make sense?

I will try to clarify via example:

For testing purposes, I duplicated my data and changed the date field. So the QVD holds 905 rows stamped with the date 7/1/2013, the CSV has 905 rows stamped with the date 8/1/2013.

Basically, I need to load the csv, and then only load the rows of the QVD which have a *different* date than the csv file (in this case, it would be all of them, I will test that functionality once I can even get the concatenate to work), concatenate the data, and then insert ALL of it into the same QVD.

Or if there is a better way of doing it, let me know. easiest way to put my requirement is:

I need to maintain a QVD which will store historical data, *however* there may be times where I will receive an updated file for a previous time period, when this is the case, all of the data which is already in the QVD for that time period needs to be DELETED and only the new data for that time period should be kept. I am not sure how else to explain this so I hope it's clear.

Let me know if you can help.