14 Replies Latest reply: Mar 21, 2011 1:49 PM by Rob Wunderlich

    Creating our QVD's

    Chris Hopkins

      Hi,

      We are having problems getting our QVDs created because our data has grown too big. I have been asked to find the best way to load in our data and save it to QVDs.

      We load in our data every day at around 1am, which previously took around 5 hours to complete. The problem we are getting now is that it fails every night because the server runs out of memory.

      The process for creating our QVDs is to first load in the day's worth of data, then save each table as a TMP QVD.

      Then we combine the TMP QVD with the full QVD.

      I have been thinking of trying the QUALIFY statement to stop the relations between the fields, which I think would speed up the process. Is this right?

      But how would I then save my full QVDs unqualified?

      Can anyone help?

        • Creating our QVD's
          s j

          Can you post your incremental load script?

            • Creating our QVD's
              Chris Hopkins

              Hi Sunil, thanks for your reply. Attached is our load script...

                • Creating our QVD's
                  Chris Hopkins

                  .

                    • Creating our QVD's
                      Chris Hopkins

                      Any ideas?

                        • Creating our QVD's
                          Chris Hopkins

                          Anyone have any ideas?

                            • Creating our QVD's
                              John Witherspoon

                              The QUALIFY wouldn't help even if you could unqualify later. So far as I know, QlikView only works out the relationships between tables at the very end of the script; this is where you see synthetic keys being created, for instance.

                              Creating temp QVDs and combining them into a full QVD using only QVD loads is also, so far as I know, the most memory-efficient way to do what you're doing.

                              If there's any way to reduce the size of your fields and how many fields are being loaded, that might help.

                              If there are text values that could be reduced to codes with the text in a separate QVD, that would help, though it would also be annoying to deal with when loading in the QVD data.

                              You could perhaps just never build a full QVD. Perhaps the user applications could load from MyData*.qvd (QVD) or something along those lines.

                              You could split it into a couple of QVDs, with half the fields in one and half in the other, or half the rows in one and half in the other. Either way, you wouldn't have to have the full data set in memory at once.

                              But I honestly wouldn't do any of that. The BEST answer, I think, is to throw RAM at the problem. Don't waste expensive man hours making a more complicated and error-prone process when cheap hardware can solve the problem faster and better.
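
                              To make the wildcard idea concrete, here is a rough sketch (MyData*.qvd is just a placeholder pattern; substitute your own file naming). The user applications would skip the full QVD entirely and pick up all the daily QVDs in one pass:

                              // Sketch only: load every daily QVD matching the pattern.
                              // Each file is read as an optimized QVD load since no
                              // transformation is applied.
                              MyData:
                              LOAD *
                              FROM MyData*.qvd (qvd);

                              The trade-off is that reload time in the user apps grows with the number of daily files, but the server never has to hold the merged table in memory to build a full QVD.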

                    • Creating our QVD's
                      Rob Wunderlich

                      I'm not sure about your memory problem, but I'll bet that

                      LOAD DISTINCT *

                      against your full QVD is a performance killer. I understand you are trying to avoid creating duplicates if a tmp file comes in more than once. It would be better to build a Primary Key field for each file which would be a combination of fields, the set of fields may vary for each file. For example, it looks like

                      StoreId & '|' & TransId as PK

                      may work for several of the files. Then you can use a WHERE NOT EXISTS(PK) in the full QVD load instead of DISTINCT.
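
                      As a sketch only (field names borrowed from the thread's script; the right key fields may vary per table), the PK could be built when the tmp QVD is loaded:

                      // Build a composite primary key from the tmp file.
                      // Adding the expression makes this load non-optimized,
                      // but the tmp file is small so the cost is negligible.
                      TmpStoreProduct:
                      LOAD *,
                           StoreId & '|' & TransId as PK
                      FROM $(vcRetailDailyAPIQVD)TMPstoreproduct.QVD (qvd);

                      The point of paying that cost on the small file is that the big full-QVD load can then use a single-field EXISTS test and stay optimized.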

                        • Creating our QVD's
                          Chris Hopkins

                          Thanks both for your replies...

                          John, unfortunately more RAM isn't an option at the moment as we are already at the maximum capacity for our server. A new server has been approved in next year's budget, but obviously that doesn't solve our problem in the meantime.

                          Rob, the reason we used LOAD DISTINCT * was that it would allow the QVD loads to be optimised. Would they still be optimised if we used WHERE NOT EXISTS(PK)?

                          Thanks

                          • Creating our QVD's
                            Chris Hopkins

                            Hi Rob, I have tried your suggestion but the script keeps failing when it reaches the WHERE statement.

                            Can you tell me where the WHERE statement should go in the script and what it should replace, please?

                            Should I replace both the LOAD DISTINCT * and CONCATENATE LOAD DISTINCT * statements?

                            Here is my script..

                            SET cFileName =$(vcRetailDailyAPIQVD)TMPstoreproduct.QVD;
                            IF QvdCreateTime('$(cFileName)') >= 0 THEN
                            StoreProduct:
                            LOAD DISTINCT *
                            FROM $(cFileName) (qvd);


                            SET cFileName =$(vcRetailQVD)StoreProduct.QVD;
                            IF QvdCreateTime('$(cFileName)') >= 0 THEN
                            StoreProduct:
                            CONCATENATE LOAD DISTINCT *
                            FROM $(cFileName) (qvd);
                            ENDIF

                            IF NoOfRows('StoreProduct') > 0 THEN
                            STORE * FROM StoreProduct INTO $(vcRetailQVD)StoreProduct.QVD;
                            DROP TABLE StoreProduct;
                            ENDIF
                            ENDIF

                             

                            Thanks very much.

                              • Creating our QVD's
                                Rob Wunderlich

                                 


                                hopkinsc wrote: Should I replace both the LOAD DISTINCT * and CONCATENATE LOAD DISTINCT * statements?


                                If I understand your script correctly, the first LOAD is from the tmp QVD that was created from the CSV file. If you only load the CSV once, then you shouldn't need the DISTINCT. However, I'm guessing the tmp QVD is a small volume, so the DISTINCT shouldn't make much difference one way or the other.

                                Now, as to updating the master QVD in the second load: assume a primary key field named "PK" exists, or that you created one as a composite of several fields. Then your second load should look like this:

                                SET cFileName =$(vcRetailQVD)StoreProduct.QVD;
                                IF QvdCreateTime('$(cFileName)') >= 0 THEN
                                StoreProduct:
                                CONCATENATE LOAD *
                                FROM $(cFileName) (qvd)
                                WHERE NOT EXISTS(PK)
                                ;
                                ENDIF

                                The CONCATENATE is only required if you can have different field names between the tmp QVD and the master QVD, such as when adding new fields. If the field names will always be the same, leave off the CONCATENATE.
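
                                Putting the pieces together, the whole incremental step might look like the sketch below. It assumes the key is StoreId & '|' & TransId (which may not be right for every table), and that the master QVD already contains a PK field; on the very first run you would need to add PK to the master QVD once.

                                // 1. Load today's rows from the tmp QVD and build the key
                                //    (non-optimized, but the tmp file is small).
                                StoreProduct:
                                LOAD *,
                                     StoreId & '|' & TransId as PK
                                FROM $(vcRetailDailyAPIQVD)TMPstoreproduct.QVD (qvd);

                                // 2. Append only the master rows whose key was not already
                                //    loaded above. A single-field EXISTS test keeps this
                                //    QVD load optimized.
                                CONCATENATE LOAD *
                                FROM $(vcRetailQVD)StoreProduct.QVD (qvd)
                                WHERE NOT EXISTS(PK);

                                // 3. Write the merged table back out and free the memory.
                                STORE StoreProduct INTO $(vcRetailQVD)StoreProduct.QVD;
                                DROP TABLE StoreProduct;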