Turning Unoptimized Loads into Optimized Loads

    I wanted to share a couple of test I did on trying to get loads to be as fast as possible and hopefully get some feedback or some other tips.

     

    Sorry for the long post I believe it's worth reading through.

     

    Test 1)

     

    When concatenating two tables that don't have the same number of fields, if the second table has the same fields than the first one and then some extra fields the load will still be optimized, if done the other way around it will not be optimized.

     

    e.g. All Loads are optimized

     

    Table:
    LOAD               // Optimized
         A,
         B,
         C
    FROM Table1.qvd (qvd);
    
    CONCATENATE(Table) // Optimized
    LOAD
         A,
         B,
         C,
         D,
         E
    FROM Table2.qvd (qvd);
    
    
    

     

    e.g. 2nd Load isn't optimized

     

    Table:
    LOAD               // Optimized
         A,
         B,
         C,
         D,
         E
    FROM Table2.qvd (qvd);
    
    CONCATENATE(Table) // Not optimized
    LOAD
         A,
         B,
         C
    FROM Table1.qvd (qvd);
    
    

     

    Test 2)

     

     

    Second table has some fields in common with the first but is missing some, each table has 50 Million rows, 2nd load will not be optimized but loading the table in optimized mode, adding the missing fields, storing it and loading it again optimized will be faster than just concatenating the tables straight up.

     

    e.g. 2nd load isnt optimized (in this example it took about 1 min to load. PC is Core i5 x64 4 GB running Windows 7).

     

    R00:
    LOAD ShipperID,       // Optimized
         OrderDate,
         CustomerID,
         UnitPrice,
         sales,
         COS
    FROM
    R00_1.QVD
    (qvd);
    
    Concatenate(R00)      // Not Optimized
    LOAD ShipperID,
         CustomerID,
         Discount,
         ProductID,
         Quantity,
         UnitPrice
    FROM
    R00_2.QVD
    (qvd);
    

     

    If you load the 2nd table without concatenating it, add the missing fields store it and load it again to concatenate it while reading it optimized it will be faster (In my example took 50% of the time).

     

     

    R00:
    LOAD ShipperID,                // Optimized
         OrderDate,
         CustomerID,
         UnitPrice,
         sales,
         COS
    FROM
    R00_1.QVD
    (qvd);
    
    R000_Aux:
    LOAD ShipperID,                // Optimized
         CustomerID,
         Discount,
         ProductID,
         Quantity,
         UnitPrice
    FROM
    R00_2.QVD
    (qvd);
    
    concatenate(R000_Aux)          // Not Optimized. 0 records are added
    LOAD null() as ShipperID,
         null() as OrderDate,
         null() as CustomerID,
         null() as UnitPrice,
         null() as sales,
         null() as COS
    autogenerate(0);
    
    store R000_Aux into R000_Aux.QVD;
    drop table R000_Aux;
    
    concatenate(R00)                //This load will now be optimized!
    LOAD ShipperID,
         OrderDate,
         CustomerID,
         UnitPrice,
         sales,
         Discount,
         ProductID,
         Quantity,
         COS
    FROM
    R000_Aux.QVD
    (qvd);
    

     

    This document was generated from the following discussion: Turning Unoptimized Loads into Optimized Loads