5 Replies Latest reply: Jun 1, 2015 6:49 AM by Massimo Grossi

    how to 'batch' load for 192mio lines table / with group by

      Hello,

       

      I am looking to do a group by (4 dimensions) on a relatively large table of 192mio lines. Computer says no.

       

      So I thought to break it up into chunks of 10mio lines each, using RowNo(). However, it takes more than 10 hours to complete that command.

       

      Is there a simple way, to 'batch' every 10mio lines, repeat the process after finishing the batch until all 192mio lines are processed in the group by statement?

       

      Trial and error shows that 10mio lines can be handled. The script looks like:

       

      Left Join (MainTable)
      Load
          ID,
          KeyOne,
          KeyTwo,
          KeyThree,
          sum(A*B) as NewVar
      Resident MainTable
      Group by ID, KeyOne, KeyTwo, KeyThree;

       

      Another solution may be to create a MainTableNew and 'batch' with a concatenate. The 'group by' adds a NewVar to each line, so in total 192mio lines will be delivered.
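      A rough sketch of that batching idea (the RecNo() windowing, the simplified format spec, and the final note about duplicates are assumptions about how it could work, not something tested on the full table):

```
// Aggregate MainTable in 10mio-line batches; identical field
// lists make each batch auto-concatenate into MainTableNew.
Let vBatchSize = 10000000;
Let vRows = NoOfRows('MainTable');

For vStart = 0 to $(vRows) - 1 step $(vBatchSize)

    Batch:
    NoConcatenate   // prevent auto-concatenation back into MainTable
    Load ID, KeyOne, KeyTwo, KeyThree, A, B
    Resident MainTable
    Where RecNo() > $(vStart) and RecNo() <= $(vStart) + $(vBatchSize);

    MainTableNew:
    Load ID, KeyOne, KeyTwo, KeyThree, sum(A*B) as NewVar
    Resident Batch
    Group by ID, KeyOne, KeyTwo, KeyThree;

    Drop Table Batch;
Next

// Caveat: if a group spans two batches it appears twice in
// MainTableNew; a second sum() pass would merge such duplicates.
```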

       

      Looking forward to your suggestions.

       

      Kind regards, Ed.

        • Re: how to 'batch' load for 192mio lines table / with group by
          Bill Markham

          If you split the incoming data into discrete chunks, group by the individual chunks and concatenate them together, you will get a different result than doing a single group by over all the data. Would this cause an issue?
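          For an additive aggregate like sum() there is a way to reconcile the two (a sketch, assuming the per-chunk results were concatenated into one table called ChunkResults): re-aggregating the partial sums gives the same answer as a single group by.

```
// Second pass: merge partial sums from groups that were split
// across chunks. sum() is additive, so summing the per-chunk
// sums equals one group by over all the data.
Final:
NoConcatenate
Load
    ID, KeyOne, KeyTwo, KeyThree,
    sum(NewVar) as NewVar
Resident ChunkResults
Group by ID, KeyOne, KeyTwo, KeyThree;

Drop Table ChunkResults;
```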

           

          When "Computer says no" does it give an error message ?

            • Re: how to 'batch' load for 192mio lines table / with group by

              Hi Bill,

               

              Thanks for responding.

               

              a. I don't think (but am not sure) that it makes a difference, since the group by also includes the most granular key, meaning it will calculate the sum for each individual line, with the values of all four dimensions in it.

               

              b. The laptop just stops. 8 GB of memory is not enough, and the disk is also too small. A bigger PC might be something to consider, but I need to be certain it will perform to my needs. Looking into it.

               

              Thanks again!

               

              Kind regards,

               

               

              Ed Borsboom

               

               

               

               


                • Re: how to 'batch' load for 192mio lines table / with group by
                  Stefan Wühl

                  If you calculate the sum(A*B) on each line, why do you need the GROUP BY and the JOIN, then?

                  Both will consume a lot of memory.

                   

                  You should be able to calculate the product A*B in your original MainTable already.
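                  In script terms, that suggestion is just a row-level calculation at load time (the field list and source file name are assumptions):

```
// Compute the product on each row while loading; no JOIN and
// no GROUP BY, so memory use stays flat over the 192mio lines.
MainTable:
Load
    ID, KeyOne, KeyTwo, KeyThree,
    A, B,
    A * B as NewVar
From [source.qvd] (qvd);
```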

                  • Re: how to 'batch' load for 192mio lines table / with group by
                    Marcus Sommer

                    I think bill.markham is right, and it will make a big difference how you handle such a group by load. I would probably take this way:

                     

                    - load and store as qvd 10M chunks from your source without transformations and then delete the table

                    - load and store as qvd each chunk with your group by and then delete the table

                    - load all group-by qvd's and concatenate them
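                    A sketch of those three steps in script form (the file names, the RecNo() windows, the simplified text format spec, and the chunk count of 20 are assumptions):

```
Let vChunkSize = 10000000;  // 10M lines per chunk
Let vChunks = 20;           // 20 x 10M covers 192mio lines

// 1) Store the source as 10M-line QVD chunks, no transformations.
For vChunk = 0 to $(vChunks) - 1
    Raw:
    NoConcatenate
    Load ID, KeyOne, KeyTwo, KeyThree, A, B
    From [source.csv] (txt)
    Where RecNo() >  $(vChunk) * $(vChunkSize)
      and RecNo() <= ($(vChunk) + 1) * $(vChunkSize);
    Store Raw into [raw_$(vChunk).qvd] (qvd);
    Drop Table Raw;
Next

// 2) Group by each chunk and store the aggregate, then drop it.
For vChunk = 0 to $(vChunks) - 1
    Agg:
    NoConcatenate
    Load ID, KeyOne, KeyTwo, KeyThree, sum(A*B) as NewVar
    From [raw_$(vChunk).qvd] (qvd)
    Group by ID, KeyOne, KeyTwo, KeyThree;
    Store Agg into [agg_$(vChunk).qvd] (qvd);
    Drop Table Agg;
Next

// 3) Load all group-by QVDs; identical fields auto-concatenate.
For vChunk = 0 to $(vChunks) - 1
    Result:
    Load * From [agg_$(vChunk).qvd] (qvd);
Next
```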

                     

                    These are good prerequisites for incremental loads: Incremental Load

                     

                    Maybe there are further possibilities, for example flagging your "A" and "B" fields if they are NULL or 0, or there may be other criteria you could filter out with a where-clause. Also, do you need every key / every field in this table? Maybe you could move some fields/keys to other tables, or use other (numeric) keys, or ...

                     

                    By the way: a laptop with 8 GB is not really a well suited tool for this amount of data.

                     

                    - Marcus

                • Re: how to 'batch' load for 192mio lines table / with group by
                  Massimo Grossi

                  can you split the group by using KeyOne values (one KeyOne value or a group of KeyOne values)? this shouldn't change the result of the group by

                   

                  I think something like

                   

                  loop

                       create a table with some KeyOne values

                       read from qvd where exists KeyOne values and group by

                       store in qvd

                  end loop

                   

                  at the end you have some qvd with the group by result
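                  That loop might look like the following in script form (the QVD name, the slice size of 1000 KeyOne values, and the field-renaming trick that makes Where Exists see only the current slice are all assumptions):

```
// Distinct KeyOne values, kept under a different field name so
// that Exists(KeyOne) only matches the current slice's values.
Keys:
Load Distinct KeyOne as KeyOne_All From [main.qvd] (qvd);

Let vSliceSize = 1000;
Let vSlices = Ceil(NoOfRows('Keys') / $(vSliceSize));

For vSlice = 0 to $(vSlices) - 1

    // create a table with some KeyOne values
    SliceKeys:
    Load KeyOne_All as KeyOne
    Resident Keys
    Where RecNo() >  $(vSlice) * $(vSliceSize)
      and RecNo() <= ($(vSlice) + 1) * $(vSliceSize);

    // read from qvd where exists KeyOne values and group by
    Agg:
    NoConcatenate
    Load ID, KeyOne, KeyTwo, KeyThree, sum(A*B) as NewVar
    From [main.qvd] (qvd)
    Where Exists(KeyOne)
    Group by ID, KeyOne, KeyTwo, KeyThree;

    // store in qvd
    Store Agg into [agg_slice_$(vSlice).qvd] (qvd);

    // dropping both tables clears the KeyOne values before
    // the next slice's Exists() check
    Drop Tables Agg, SliceKeys;
Next

Drop Table Keys;
```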