Hello!
Recently I made a very nice prototype with QlikView without any prior experience. We loaded 2 months of data based on 2 kinds of 'fact' tables: one table with about 86,000 rows (requestid, age, class, customer type, request year, request month, etc.) and another table with 6 million rows (requestid, type of question and a score from 0 to 10). The result is very good for these 2 months.
Now my customer would like to go a bit further by loading 34 months. I am not sure how many rows that is, but the 2 files together are 6 GB. I tried to load it, but I did not succeed. Now I want to handle this step by step by first loading 1 month out of these giant files.
My first load is:
Load * From file1 Where Year = 2010 and Month = 3;
The result is what I want: only 39,000 rows loaded.
My second load was:
Load * From file2;
But I changed it to:
Inner Join (Table1) Load * From file2;
What I expected was that about 3 million records would be loaded, but instead I had to stop the process at record 20 million.
I only want to load the data of the second file where Year=2010 and Month=2. Something like:
Inner Join (Table1) Load * From file2 Where Table1.Year=2010 and Table1.Month=2;
I think the ETL process could be better, but we have no time left for this prototype. The prototype is meant to support a decision about QlikView: can it handle enormous amounts of data?
Hopefully somebody has some advice! Thank you!
Patricia
Hi Patricia,
a) As you already said, I would first load the data (slices) into qvd files, perhaps monthly or even yearly, according to your resources, but with no joins or anything else. From then on you can save a lot of time by loading optimized out of your qvd files; it helps you shorten test time.
b) As far as I know, QV first reads all the rows into memory and only then starts the "joining". So you need another way, e.g. using "where exists" (see the sketch below), or, as mentioned before, smaller slices.
c) This is not too much data for QV; all you need is enough memory (see also other threads).
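For b), a rough sketch of the "where exists" variant (assuming requestid is the key field shared by both files):

Table1:
Load * From file1 Where Year = 2010 and Month = 3;

Table2:
Load * From file2 Where Exists(requestid);

With "where exists" QV drops the unwanted rows of file2 already while reading, instead of loading everything into memory first and joining afterwards. The two tables are then linked automatically through requestid, so no inner join is needed at all.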
Good luck with your prototype (and don't hesitate to ask for more details)
Roland
Hi Roland!
Thank you for your answers. I had hoped I would not have to use the qvd files, because I did not understand them 😉
But I will take your advice and try to load the data into qvd files first. Can you help me understand and set this up a bit?
What I do not understand: I think I must create a qvd for each year (correct?), but year is not in the 2nd file.
Table1:
Load * From File1 Where Year=2010;
Store Table1 Into Table1.qvd;

Table2:
Load * From File2 Where ???
And loading the data into qvd files: is that a 'once in a lifetime' action for historical data? Because if I put this into the script, it will run every time I start the script.
A lot of questions, but maybe you can help me get started....
Thanks!
Patricia
OK, OK,
What you can do is this: with one qvw script (let's call it the qvd-generator) you create all the necessary qvd files. Because you can store only one table in one qvd file, you will have something like this:
Table1:
Load * From File1 Where Year=2010;
Store Table1 Into Table1.qvd;
Drop Table Table1; // we want to keep it only in the qvd

Table2:
Load * From File2; // (maybe with an appropriate filter)
Store Table2 Into Table2.qvd;
Drop Table Table2;

// next file, next table, next portion of a table, .....
Then you will have a set of qvd files, each containing a whole table or a part of one, and a qvd-generator containing nothing (and I mean nothing) but a script like the one above. The next step is to create your QV data itself: in a new qvw script you load the data you need only from the qvd files, perhaps, for testing purposes, not all qvds at once. So there will be (at least) two load scripts: your qvd-generator and your database-generator, in which you treat a qvd file like a normal input table. Check which functions you can use without losing the optimized load from qvd.
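As a sketch, the database-generator could then look like this (same file names as in the example above; the where exists filter is only an illustration):

Table1:
Load * From Table1.qvd (qvd);

Table2:
Load * From Table2.qvd (qvd) Where Exists(requestid);

A load from qvd stays "optimized" only as long as you do no transformations and use at most a single where exists(...) on one field; anything more and QV falls back to a normal (much slower) load.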
There are some advantages: you can save a lot of load time, at least during development and testing. Second, if you can split your data, e.g. into years, you need to load former years only once from the original data source (I assume the old data won't change any more). And if you are in a competition, you can show the fastest load ever seen 😉 Joining and so on is independent of the type of the data source (database, text, Excel, qvd, .....); it depends only on the data structure itself. Just ask if you need more help!
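And to your question whether filling the qvds is a 'once in a lifetime' action: you can guard the generator so it skips slices that already exist, something like this (a sketch, the file name is just an example):

// build the qvd for a former year only if it does not exist yet
If IsNull(FileTime('Table1_2009.qvd')) Then
Table1:
Load * From File1 Where Year = 2009;
Store Table1 Into Table1_2009.qvd;
Drop Table Table1;
End If

That way the big source files are only touched for the periods that still need to be generated.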
Nevertheless, when you bring all your data together, you need the appropriate resources, especially enough memory.
HTH a bit
Roland
Hi Roland!
Thank you for the detailed explanation. I succeeded in making a qvd (but it contains all of the data). It took about 2 or 3 hours.
Loading the qvd is still impossible, but I think I know the problem... memory. Maybe tomorrow I will have a solution for that.
I'll keep you updated in case I need some help.
Again, thanks for your help so far.
Patricia
Hi Patricia,
glad to help you. I am looking forward to the next details.
RR
The memory is still a problem, but I succeeded in loading 9 months... with the qvd files!