Let me 1st explain what setup I have in mind.
Each day we receive new files containing data.
I have a qvw which contains this massive amount of data up to a certain date(Binary.qvw). Obviously reloading this model each day to only capture the new data is going to have to go look at all those files again and just won't do... (a single file contains about 250 000 lines and we have 4 years' worth)
So I have a second qvw(Incremental.qvw) which loads the 1st as binary, then has a incremental load to only load the newly placed files and concatenate it to the binary table loaded from the 1st qvw.
Here is where I am currently stuck... I then wish to save the Incremental.qvw which thanks to the incremental load now has the new data included, AS THE ORIGINAL Binary.qvw to overwrite it. So the next day the Binary.qvw will already have the previous day's data and only the Incremental.qvw would need to be run daily. But nowhere could I find a script command to store the qvw as: so that I could write it in at the end of my script.
I am sure I am missing some key theory here... Yes I could store the Binary.qvw as a qvd but the amount of data makes it difficult when I need to load it again (The qvd is about 15GB)...
Another thing that I thought I could maybe do was the following...
In my reload chain, I 1st run script of Binary.qvw, but in it's script I now load Incremental.qvw as a binary load containing the new data (Which was captured the previous day)! Next I run script for Incremental.qvw and 1st load Binary.qvw(Now containing the data up to the previous day) and then run the incremental load as usual to capture the new files and save it as itself. Basically a loop but I would need to run both qvw scripts...
Is there a better way to go about this? Is my logic flawed?
Or could this actually be the most resource friendly way to handle this massive amount of data?
Any advice or insights you can share would be greatly appreciated!
Please can anyone advise? I could not find similar posts and would really like to know if it could possibly save our servers a lot of reload time...
I'm not if it's possible and practically to use a combination of binary-loads to get a stable incremental load-process. I assume rather not and I haven't seen or heard yet any examples of them which indicates it would be quite uncommon.
The common way to create an incremenatl load-process is to use qvd's which if they are loaded optimized nearly so fast than a binary-load (especially if the binary is only used for one table) because it's almost the same - the data will be transferred directly into the RAM and the small overhead of to be processed header respectively meta-data will be even with a 15 GB file quite small - the important point is to load the qvd's OPTIMIZED (at least the bigger one).
Here you will find various examples of incremental loadings and the exists-function which will be often needed to ensure the optimized load: Advanced topics for creating a qlik datamodel
Thanks Marcus for your reply.
I have actually gone and set up what I described and it's been running stable and correctly so far... My hopes were that someone may have also tried something similar and could comment on their own experiences... I am a bit swamped with other projects at the moment but when I'm able I will pit the qvd incremental load and this one against one another and see how they fair, time and resource wise... So i can confirm that it works, just not sure how well in comparison to the standard incremental load.
My guess though would be that even though the reloading time for the binaries will be a bit faster, it would still have to open, reload and save 2 qvw's. Where with the standard qvd incremental load the table can be dropped after the qvd has been stored so opening and saving the qvw used for the incremental load will be significantly faster. The scrips finishes running only after the qvd was successfully stored correct?
I didn't believe that binary-loading will have an advantage against qvd-loading from an incremental point of view by single tables. To refresh an whole application it might be useful to load binary the historical data and concatenate the current data to them maybe in a scenario where an application will be refreshed several times a day but you will need further steps to store and append these data to the historical data maybe within a nightly update-window.
I would recommend the QVD route as Marcus has said, but have you tried binary loading itself. Might not be so friendly on a large file though.
You could also have a changing binary load command such as this if you have varying loads to make, just need to store/manipulate the filename in file.txt ...
I would indeed need to append the data, that's why I thought of this "binary loop" method as a way to save the data without utilizing qvd's. I hope in the future that there will be a script command to store the qvw to overwrite the historical data. It would then work the same as a qvd. But for now the standard qvd method seems the only logical option. Tnx again Marcus for your input.
Interesting thank you sir for the info! I am certain it will come in handy for some of the models I am working with...
The problem with binary loading is that the new days data is not appended... Or do you mean that I append the data using qvd's and then load the qvw used as a binary?
I've played about with using a self binary load in the past as a way to store & distribute data in a secured manner, which QVDs don't. I was only using small datasets, though.
Maybe the best explanation is to make a demo yourself ...
1) Create a new qlikview document with this script ...
Load rowno() as RowID autogenerate 1;
... and save it as Source.qvw.
2) Reload it and save. Every time this is done it reads the existing data & adds a row to the Data table.
So you are binary loading the historical data, then the script adds the incremental data and when it is finished you then have a new file which is now all historical data.
Whether this is faster than a QVD routine, I don't know, and you would still need a routine to back up your data in case it gets corrupted or you need to roll back to a point in time. It's also a bit messy if the data structure changes.
Thank you flipside
This looks like a promising idea to test! However I must admit I fail to understand the Autogenerate function... I have tried reading all I can about it on the community etc. If I understand you correctly the Autogenerate 1 adds 1 row each time it is executed. What I don't understand is why?
So say my script looks like this currently
My files are named as such that the last 6 digits are the date the file was created.
For Each vFile in FileList('Source\*.CSV') //Test to check for new files and load if appropriate
If Right(SubField(vFile, '.', 1), 6) > vLastReload Then
Concatenate ("Binary Table")
(fix, codepage is 1252);
Let vLastReload = Right(SubField(vFile, '.', 1), 6);//Sets vLastReload to latest last 6 digits of last loaded file
Where now would I insert your Load rowno() as RowID autogenerate 1;?
Also is there a script command to save it as Source.qvw?
Looking forward to your reply