Tips for a better way to reload lots of QVD data - Qlik Community

peter_turner · ‎2009-09-01

Hello Everyone,

I am loading 2 million+ records from 100+ QVD files (300mb), but this takes awhile.

At present, each time new data is available, I grab it and save it to a QVD (which is fine), then load * QVD's.(which is where the problem is)

My question is, as the main 'data' table is already in ram, what's the best way to incrementally load/add/partial reload/join just the new qvd file?

At present I'm using something like...
NewData:
load new data from DB where timestamps etc etc.
save NewData to qvd123456.qvd
drop NewData.

AllData:
load * .qvd

My project is reloaded by the QV9 server, 32bit.
Would my main 'Data' table be stored in the QVW (which is 120mb), and what's the best way to add the new data to the start of the 'Data' table without reloading the whole lot every time.

Peter.

johnw · ‎2009-09-01

So your 100+ QVD files are not different tables, but merely new records to add to the same table? Each time you create a new QVD, you just want to append the data without having to reload all the old data?

I'm not aware of anything in the script that would allow you to keep tables already loaded. I believe the first thing a reload does is drop everything. So I suspect your only way to avoid loading the entire table is to push the data into RAM using the new real time loading APIs. Unfortunately, I haven't played with those, so I can't help you there other than saying that it should be possible.

That said, I'm not sure why there would be a problem with load speed. An optimized QVD load of two million rows of data might take ten or twenty seconds on our system, which isn't exactly high end. Is that too high, or are you seeing much longer load times in your system? Maybe the problem is that you have 100+ QVD files to load from rather than just appending to the end of the existing QVD file each time. Have you looked into incremental loading? Or maybe you aren't getting an optimized QVD load, so it's taking much longer than it should for that reason? Plenty of possibilities for why a script might run slowly, and plenty of ways to make it run faster.

peter_turner · ‎2009-09-01

Your correct the QVD files are the for the same 'Data' table, each file containing a new time period of data.
The reason i went for lots of smaller qvd's was because the data is re-loaded every 10mins (hopefully), and i didn't want the same qvd file on the disk being read and re-written over and over.
Its also useful as i can accumulate data into 'day' qvd files.

The existing load takes around 2 mins to load all files, i think its doing an optimized QVD load as it the debug log says Data_Table < qvd1234.qvd (qvd optimised), and the script isn't doing any calculations of the file.

I'm very curious as to what is being stored in my qvw file, it grows from 200kb to 120mb, and if that's some data i can re-use then great.

Peter.

peter_turner · ‎2009-09-01

I'm just having a think about this, what does everyone think....

What if i have a 'Fresh data load' project that's only job is to get the latest data from my DB and save as a QVD file, together with a little list of what files its just made.
Should be a quick job to perform (data preparation)

On the main project QVW, i set the server to keep this pre loaded in ram (as the QVW does contain all data i have), i then trigger a partial reload to only add new QVD's to my main 'Data' table, then save the QVW again (containing all data).
This way i don't have to dump and reload my entire dataset every time i do a refresh.

I still have calculations i need todo on the entire dataset at every re-fresh, so i'm not sure the partial reload feature will also re-do my calculations?, I've never used that feature before.

Any thoughts?

Peter.

disqr_rm · ‎2009-09-01

2 -3 million records shouldn't be more than 20 seconds from QVD. Are you sure all the field names are same in all QVDs and no circular key problems? You are not doing any conversion (functions) while loading, as you are doing load *, so I do not see any reason why it would take long time to load.

Just to try: see if you can make use of partial reloads using ADD or REPLACE in the LOAD statements, and just load the newdata*.qvd files when reload is done - may be controlled by a variable, so you can do full reload when needed.

Report Inappropriate Content · ‎2009-09-16

I have a similar but not identical question to this:

I have a report that drops into a location every day. I want to implement a process that appends the new data to an existing QVD file that is accumulating daily charge information.

From what I read here, there isn't a LOAD method that will append in some way?

I'm imagining a two- or three-step process that might accomplish this, but if there's a simple way I'd love to hear it. Thanks!

Report Inappropriate Content · ‎2009-09-16

Just giving a thought:

Agreed that you don't want to read and rewrite 300mb data every 10 minutes.

But you can opt for archiving everything into a single qvd every 1 day.

In that way you have to load only the fully archived QVD+other QVDs of the day.

Thus QlikView will be relieved of associating data from 100s of qvds everytime i hope.

rbecher · ‎2009-09-17

Hi Peter,

for my opinion storing data in qvd files is only useful if you want to use them in multiple different layouts (qvw files). If this is not your approach it is very easy to do an initial load into a qvw file and then do daily partial reloads into the qvw file with an ADD LOAD statement to add all new qvd files per day. And the benefit is all old data are still loaded into the qvw file and (after open it) into the RAM. You can also use this loaded data (stored in the qvw file) in other layouts by BINARY LOAD.

- Ralf

Astrato.io Head of R&D

Oleg_Troyansky · ‎2009-09-17

In my opinion, any solution that could help combining hunderds of QVDs into one, would help speeding up the load. One of the suggestions was to performa a daily consolidation, it sounds like a great compromise to me.

Regarding the possibility of running a partial reload - it's possible, only if no additional calculations are required. Since you mentioned that you do need to perform calculations at the end, you'll have to structure your reload process in 3 steps:

1. Read fresh data from the database and store into a QVD.

2. Open the "intermediate" QVW with "raw" data and run a partial reload to append the new data from QVD into the QVW.

3. BINARY load the "intermediate" QVW into the final QVW and perform all the final calculations there.

Separately from the load process above, create a nightly process that could read all the existing QVDs and combine them into a single QVD.

This sounds like the most optimal process for your frequent reloads...

cheers,

Oleg Troyansky
Call me for expert support, performance tuning, advanced training and more!

peter_turner · ‎2009-09-17

Thanks to everyone for the suggestions.

My requirement changed a little, but I'm going to try a few suggestions from here.
New Spec:
20 tables
20x365=7300 QVD's over a year
~6 Mill rows per year (system total)
64bit server with tons of ram
Required to retain each QVD per table per day

The approach I was thinking of using turns out to be the same as Oleg's suggestion...
1, Make a 'New Data' project than gets the new SQL data and stores to QVD
Also any mapping and processing i can do on the new data.
(no gui just pure script)

2, A 'Combine' project that does a partial reload to add new data to the QVW.
(no gui just pure script)
= one nice QVW with the data in QV format

3, Presentation project which does a binary load of the 'Combine' project
This is then web served in Ajax to the 20 users.

Last question on this idea...
Is it possible to chain these events together in QV9 server?
So that each step would then start the next step.
I could do this as a batch file, but want to keep things in QV if possible.

Thanks,
Peter.