Solved: Creating our QVD's - Qlik Community

hopkinsc · ‎2011-03-16

Hi,

We are having problems getting our QVD's created as our data has got too big. I have been asked to try and find the best way to load in our data and save to QVD's.

We load in our data everyday at around 1am which previously was taking arond 5 hrs to complete. The problem we are getting now is that it fails everynight as the server runs out of memory.

The process of creating our QVD's is to first load in the days worth of data then save each table as a TMPQVD.

then we combine the TMPQVD with the FULL QVD

I have been thinking of maybe trying the QUALIFY statement to stop the relations between the fields, which i think would speed up the process - Is this right??

But how then would i save my FULL QVD's unqualified?

Can anyone help?

rwunderlich · ‎2011-03-20


hopkinsc wrote:Should i replace both LOAD DISTINCT * and CONCATENATE LOAD DISTINCT * statements?

If I understand your script correctly, the first LOAD is from the tmp qvd that was created from the csv file. If you can only load the csv once, then you shouldn't need the DISTINCT. However, I'm guessing the tmp qvd is a small volume so the DISTINCT shouldn't make much difference one way or the other.

Now as to updating the master qvd in the second load. Assume a primary key field named "PK" exists or you created one as a composite of several fields. Then your second load should look like this:

SET cFileName =$(vcRetailQVD)StoreProduct.QVD;
IF QvdCreateTime('$(cFileName)') >= 0 THEN
StoreProduct:
CONCATENATE LOAD *
FROM $(cFileName) (qvd)
WHERE NOT EXISTS(PK)
;
ENDIF

The CONCATENATE is only required if you can have differenct field names between the tmp qvd and the master qvd, such as when adding new fields. If the field names will always be the same, leave off the CONCTENATE.

View solution in original post

suniljain · ‎2011-03-16

Can You post your increametal load script ?.

hopkinsc · ‎2011-03-16

Hi Sunil, thanks for your reply. Attached is our load script...

hopkinsc · ‎2011-03-16

.

hopkinsc · ‎2011-03-16

any ideas ?

hopkinsc · ‎2011-03-16

anyone have any ideas?

johnw · ‎2011-03-16

The qualify wouldn't help even if you could unqualify later. So far as I know, QlikView only works out the relationships between tables at the very end of the script. This is where you see synthetic keys being created, for instance.

Creating temp QVDs and combining them into a full QVD using only QVD loads is also, so far as I know, the most memory-efficient way to do what you're doing.

If there's any way to reduce the size of your fields and how many fields are being loaded, that might help.

If there are text values that could be reduced to codes with the text in a separate QVD, that would help, though it would also be annoying to deal with when loading in the QVD data.

You could perhaps just never build a full QVD. Perhaps the user applications could load from MyData*.qvd (QVD) or something along those lines.

You could split it into a couple QVDs, like with half the fields in one and half in the other, or half the rows in one and half in the other. Either way, you wouldn't have to have the full data set in memory at once.

But I honestly wouldn't do any of that. The BEST answer, I think, is to throw RAM at the problem. Don't waste expensive man hours making a more complicated and error-prone process when cheap hardware can solve the problem faster and better.

rwunderlich · ‎2011-03-17

I'm not sure about your memory problem, but I'll bet that

LOAD DISTINCT *

against your full QVD is a performance killer. I understand you are trying to avoid creating duplicates if a tmp file comes in more than once. It would be better to build a Primary Key field for each file which would be a combination of fields, the set of fields may vary for each file. For example, it looks like

StoreId & '|' & TransId as PK

may work for severalof the files. Then you can use a WHERE NOT EXISTS(PK) in the full qvd load insead of distinct.

hopkinsc · ‎2011-03-17

Thanks both for your replies...

John, unfortunately more RAM isn't an option at the moment as we are already on the maximum capacity for our server. A new server has been passed for next years budget, but obviously that doesn't solve our problem in the meantime.

Rob, The reason we used LOAD DISTINCT * was because that would allow the QVD's to be optimised. Would it still be optimised if we used WHERE NOT EXISTS (PK)?

Thanks

rwunderlich · ‎2011-03-17


hopkinsc wrote:The reason we used LOAD DISTINCT * was because that would allow the QVD's to be optimised. Would it still be optimised if we used WHERE NOT EXISTS (PK)?

Yes, EXISTS(PK) will be optimized.