3 Replies Latest reply: Oct 1, 2015 10:15 AM by Daniel Rozental

    Turning Unoptimized Loads into Optimized Loads

    Daniel Rozental

      I wanted to share a couple of tests I ran while trying to make loads as fast as possible, and hopefully get some feedback or other tips.

       

      Sorry for the long post, but I believe it's worth reading through.

       

      Test 1)

       

      When concatenating two tables that don't have the same number of fields, the load is still optimized if the second table has all the fields of the first one plus some extra fields. If it's done the other way around (the second table has only a subset of the first table's fields), the second load is not optimized.

       

      e.g. Both loads are optimized

       

      Table:
      LOAD               // Optimized
           A,
           B,
           C
      FROM Table1.qvd (qvd);
      
      CONCATENATE(Table) // Optimized
      LOAD
           A,
           B,
           C,
           D,
           E
      FROM Table2.qvd (qvd);
      
      
      

       

      e.g. 2nd Load isn't optimized

       

      Table:
      LOAD               // Optimized
           A,
           B,
           C,
           D,
           E
      FROM Table2.qvd (qvd);
      
      CONCATENATE(Table) // Not optimized
      LOAD
           A,
           B,
           C
      FROM Table1.qvd (qvd);
      
      

       

      Test 2)

       

       

      The second table has some fields in common with the first but is missing others, and each table has 50 million rows. The 2nd load will not be optimized. However, loading the second table in optimized mode, adding the missing fields, storing it, and loading it again optimized is faster than concatenating the tables directly.

       

      e.g. 2nd load isn't optimized (in this example it took about 1 min to load; the PC is a Core i5 x64 with 4 GB running Windows 7).

       

      R00:
      LOAD ShipperID,       // Optimized
           OrderDate, 
           CustomerID, 
           UnitPrice, 
           sales, 
           COS
      FROM
      R00_1.QVD
      (qvd);
      
      Concatenate(R00)      // Not Optimized
      LOAD ShipperID, 
           CustomerID, 
           Discount, 
           ProductID, 
           Quantity, 
           UnitPrice
      FROM
      R00_2.QVD
      (qvd);
      

       

      If you instead load the 2nd table on its own (without concatenating it), add the missing fields, store it to a QVD, and then load that QVD to concatenate it, the final load is read optimized and the whole thing is faster (in my example it took 50% of the time).

       

       

      R00:
      LOAD ShipperID,                // Optimized
           OrderDate, 
           CustomerID, 
           UnitPrice, 
           sales, 
           COS
      FROM
      R00_1.QVD
      (qvd); 
      
      R000_Aux:
      LOAD ShipperID,                // Optimized
           CustomerID, 
           Discount, 
           ProductID, 
           Quantity, 
           UnitPrice
      FROM
      R00_2.QVD
      (qvd);
      
      concatenate(R000_Aux)          // Not Optimized. 0 records are added
      LOAD null() as ShipperID, 
           null() as OrderDate, 
           null() as CustomerID, 
           null() as UnitPrice, 
           null() as sales, 
           null() as COS
      autogenerate(0);
      
      store R000_Aux into R000_Aux.QVD;
      drop table R000_Aux;
      
      concatenate(R00)                //This load will now be optimized!
      LOAD ShipperID, 
           OrderDate, 
           CustomerID, 
           UnitPrice, 
           sales, 
           Discount, 
           ProductID, 
           Quantity, 
           COS
      FROM
      R000_Aux.QVD
      (qvd);
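
      If you want to compare the two approaches on your own data, one option is to time each variant with script variables. This is only a minimal sketch: it reuses R00_1.QVD from above, and the variable names vStart and vElapsed are made up for illustration.

      LET vStart = Now(1);            // timestamp at the moment this statement runs
      
      R00:
      LOAD ShipperID,                 // the load being timed
           OrderDate, 
           CustomerID, 
           UnitPrice, 
           sales, 
           COS
      FROM
      R00_1.QVD
      (qvd);
      
      LET vElapsed = Interval(Now(1) - vStart, 'hh:mm:ss');
      TRACE Load took $(vElapsed);    // written to the script progress window and log

      Wrap the same two LET statements around each variant and compare the traced times. As far as I know, the script progress window also marks optimized reads with "(qvd optimized)" next to the table name, which is a quick way to confirm which loads are actually optimized.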