Is there any way to tell QV to parse results from the database continuously, instead of loading everything into RAM and then processing it?
Problem:
I load lots of data from a big table and save it into a QVD file. I generate quite a lot of fields, but do no groupings or anything, so each line should be able to be processed individually.
(Loading approx 50 million lines with 60 fields.)
But when looking at the processes, I first see the qvconnect64.exe process slowly grow until it uses over 50 GB of RAM; then qv.exe starts to increase in memory up to 8 GB.
So my assumption is that QV first loads all data from the database and stores it uncompressed in memory, and only after all 50 million lines are fetched does it start to create the generated fields and compress into its own format. This seems like a big waste: a lot of RAM is needed that is only used for a very short time.
Is there any way to instruct the QlikView load script to process the data from the database line by line, and thereby use much less memory during the load phase?
My script for loading and saving as QVD is in this format:
events:
LOAD "field1",
"field2",
"field3",
Date(DayStart(timestamp1),'YYYY-MM-DD') as at_date, //some generated fields ...
Date(MonthStart(timestamp1), 'YYYY-MMM') as at_year_month,
…
…
…
"field60";
SQL SELECT "field1",
"field2",
"field3",
"timestamp1",
…
…
…
"field60"
FROM some_table
;
STORE events INTO events.qvd;
Best Regards
Andreas
Andreas, there is no setting in QlikView to read the data line by line. However, my recommendation is to try the various incremental load techniques available in QlikView, which let you append the current data to a history QVD. You can then retrieve the data from the history QVD for your development. Does that sound like a solution aligned with your requirements?
Any method in particular you can recommend?
Are we talking about generating a sequence of smaller QVDs based on ranges of a timestamp?
Is there any way of doing that in one script without having to duplicate the load expression many times, thus creating a maintenance nightmare?
Hi Andreas
As suggested by Manoj, I also recommend you use an incremental load to reduce the RAM usage. He is not talking about generating a sequence of smaller QVDs; he is talking about loading the changed data from the database and the unchanged data from the QVD. Please look at the topic "Using QVD files for incremental load" in the QlikView Reference Manual. It will help you get started with incremental loads.
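In broad strokes it works like this (a rough sketch, assuming an insert-only table; the vLastReload variable is illustrative and would hold the high-water mark of the previous reload, and the exact WHERE syntax depends on your database):

// fetch only the rows added since the last reload
events:
LOAD "field1",
"timestamp1",
Date(DayStart(timestamp1),'YYYY-MM-DD') as at_date; // ...plus the other fields
SQL SELECT "field1",
"timestamp1"
FROM some_table
WHERE "timestamp1" > '$(vLastReload)';

// append the already-processed history straight from the QVD;
// with no transformations this is an optimized QVD load
CONCATENATE (events)
LOAD * FROM events.qvd (qvd);

STORE events INTO events.qvd;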
If you have any more questions, please feel free to post them.
Thanks,
Attitude
Hi Attitude, thanks for your reply.
I am familiar with Incremental Load and I'm using it.
But sometimes you need to do a full reload of all data, because you have changed or added something in the database. And it is in this case, when I must fetch all 50 million lines, that I run into problems with memory usage.
Andreas, I understand that dynamic data changes force a 'purge and reload' in your environment. Incremental reload is not the right solution there, given the time the incremental logic needs to validate the records during retrieval. There is also no straightforward solution from my end without knowing the environment. However, it is worth benchmarking your hardware against the data volume changes, considering the 50m records on a daily / weekly basis. The initial data extraction from the database will continue to tax hardware resources given the data volume, but applying QlikView best practices when transforming the data and building an optimized data model will reduce the application / server performance overhead. Please let me know if this helps.
I ended up making a loop and creating smaller QVD files, one per year, and dropping the table after each save to disk.
By doing so I never used more than a third of the previous amount of RAM (since I split it over three years).
It works, but it feels like a hack.
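For reference, here is a minimal sketch of the loop (the year values and the EXTRACT syntax in the WHERE clause are illustrative and database-dependent):

FOR EACH vYear IN 2011, 2012, 2013
events:
LOAD "field1",
Date(DayStart(timestamp1),'YYYY-MM-DD') as at_date; // ...plus the other generated fields
SQL SELECT "field1",
"timestamp1"
FROM some_table
WHERE EXTRACT(YEAR FROM "timestamp1") = $(vYear);

STORE events INTO [events_$(vYear).qvd];
DROP TABLE events; // free the RAM before fetching the next year
NEXT vYear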
Hi Andreas!
It is HIGHLY recommended not to do any calculations in the read process. In your example, you could create the calculated fields (like at_date and at_year_month) in a later staging step instead.
But if you want to do it while loading, the way to make the source do the work is to put these functions in the SQL SELECT part of the statement and drop the preceding LOAD.
Try removing the LOAD part and measuring time and memory, then put the calculations on the SQL side and measure again.
Please, tell us the results!!
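To illustrate, the SQL-only variant could look like this; TO_CHAR and TRUNC are Oracle-style assumptions, so substitute your own database's date functions:

events:
SQL SELECT "field1",
"field2",
"field3",
TO_CHAR(TRUNC("timestamp1"), 'YYYY-MM-DD') AS at_date,
TO_CHAR(TRUNC("timestamp1", 'MM'), 'YYYY-Mon') AS at_year_month,
…
"field60"
FROM some_table;

STORE events INTO events.qvd;

With no preceding LOAD, QlikView just stores whatever the driver returns, so the formatting work happens on the database side.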