7 Replies Latest reply: Sep 11, 2017 8:39 AM by Andrei-Dragos Delta RSS

    Cut reload time

    Martin Bacul�k

      Hi guys,

       

      I just want to collect some of your ideas how to solve one simple transformation.

       

      Source Data:

      IDDateAttribute1Attribute2Attribute3

      1

      01-01-2017activeNULLNULL
      101-01-2017NULL1NULL
      101-01-2017NULLNULL8
      102-01-2017activeNULLNULL
      102-01-2017NULL7NULL
      102-01-2017NULLNULL100
      201-01-2017activeNULLNULL
      201-01-2017NULL2NULL
      201-01-2017NULLNULL9
      202-01-2017inactiveNULLNULL
      202-01-2017NULL20NULL
      202-01-2017NULLNULL50

       

      Output should be:

      IDDateAttribute1Attribute2Attribute3

      1

      01-01-2017active18
      102-01-2017active7100
      201-01-2017active29
      202-01-2017inactive2050

       

      Transformation is done for millions of rows, so actually I am trying to find the option for shortest reload time.

       

      Thank you very much.

       

      BR

      Martin

        • Re: Cut reload time
          Bill Markham

          Hard to say without knowing what is happening - Could you share the load script and the reload log.

            • Re: Cut reload time
              Martin Bacul�k

              Hi Bill,

               

              doing just mapping load for each attribute where not isnull(Attribute)and then applying it into next table on distinct combination of ID and Date.

               

              MAP_attribute1:

              mapping load

              Autonumber(ID&'│'&Date),

              attribute1

              Resident XXX

              where not isnull(Attribute1);

               

              Load DISTINCT

              Autonumber(ID&'│'&Date),

              ApplyMap(...) as AttrbuteValue

              Resident XXX;

               

              ...

               

              Thanks.

               

              Martin

            • Re: Cut reload time
              Varun Prakash Paulraj

              Load Null as 0 and Active as 1 and Inactive as 2. The search will become faster

              • Re: Cut reload time
                Sam Grounds

                Hi Martin,

                 

                If the data doesn't change after the fact, I would suggest an 'incremental' reload. This will speed it up massively.

                 

                An example for your case, load the data where the date is greater than a certain date and concatenate old data already loaded into a QVD.

                 

                Eg.

                LET vDate = Date(Today()-1,'DD-MM-YYYY');

                 

                Data:

                LOAD * FROM dataSource

                WHERE Date >= '$(vDate)'

                 

                CONCATENATE (Data) LOAD * FROM C:\DataLocation\Data.qvd (qvd)

                WHERE Date < '$(vDate)'; //may or may not need this WHERE statement depending on how your load works, if there is no date crossover from the stored data and the new data then you won't.

                 

                STORE Data Into C:\DataLocation\Data.qvd (qvd);


                ...This way you won't be transforming old data each time you reload, it will simply add to the old set.


                Hope this helps.


                Sam

                • Re: Cut reload time
                  Andrei-Dragos Delta

                  Hello,

                   

                  Don;t use string for values, it's bad for the memory...

                   

                  I made it like this:

                   

                  LOAD ID, Date,

                       only(if(Attribute1='active',-1,if( Attribute1='inactive',0))) as Attr1,

                       only(Attribute2) as Attribute2,

                       only(Attribute3) as Attribute3

                  FROM

                  [.\Book1.xlsx]

                  (ooxml, embedded labels, table is Sheet1)

                  Group by ID, Date;

                   

                  Group by also consumes resources but it depends on how much data you have.

                   

                  Hope it helps,

                  • Re: Cut reload time
                    Martin Bacul�k

                    Hi Andrei,

                     

                    thanks for advice. Sure, I am using INT instead of CHAR, that was just an example of data set. Group by looks like the best option here.

                     

                    Thank you.

                    Martin