3 Replies Latest reply: Feb 19, 2016 7:14 PM by Stefan Wühl RSS

    Which is faster to drop records, inner join or non-concatenate load and drop?

    Andy Dietler

      I have a script that creates record for each date between the start and end of a contract and links back to the contract fact table with %ContractKey. Which of these ways is going to load faster:

       

      Use an Inner Join to drop the invalid rows:

       

      //Get Min/Max Dates

      MinMaxDate:

      LOAD

        Min([Contract Date Start]) AS MinDate,

          Max([Contract Date End]) AS MaxDate

      Resident Contracts;

       

       

      Let vMinDate = Peek('MinDate');

      Let vMaxDate = Peek('MaxDate');

       

       

      Drop Table MinMaxDate;

       

       

      //Create Table of All Dates in Range

      ContractDates:

      LOAD 

        Date($(vMinDate) + IterNo() - 1) as [Active Date] 

      AutoGenerate 1 While $(vMinDate) + IterNo() -1 <= $(vMaxDate); 

       

       

      //Cartesian All Contracts to Dates

      Join (ContractDates)

      LOAD Distinct

        %ContractID,

          [Contract Date Start] AS Start_temp,

          [Contract Date End] AS End_temp

      Resident Contracts;

         

      //Remove Dates Outside of Contract Start and End

      Inner Join (ContractDates)

      LOAD

        %ContractID,

          [Active Date]

      Resident ContractDates

      WHERE Start_temp <= [Active Date]

      AND End_temp >= [Active Date]

      ;

       

      Drop Fields Start_temp, End_temp;

       

      Or a similar method but with a temp table and then load a 2nd table and drop the temp one:

       

      //Get Min/Max Dates

      MinMaxDate:

      LOAD

        Min([Contract Date Start]) AS MinDate,

          Max([Contract Date End]) AS MaxDate

      Resident Contracts;

       

       

      Let vMinDate = Peek('MinDate');

      Let vMaxDate = Peek('MaxDate');

       

       

      Drop Table MinMaxDate;

       

       

      //Create Table of All Dates in Range

      ContractDates:

      LOAD 

        Date($(vMinDate) + IterNo() - 1) as [Active Date] 

      AutoGenerate 1 While $(vMinDate) + IterNo() -1 <= $(vMaxDate); 

       

       

      //Cartesian All Contracts to Dates

      Join (ContractDates)

      LOAD Distinct

        %ContractID,

          [Contract Date Start] AS Start_temp,

          [Contract Date End] AS End_temp

      Resident Contracts;

         

      //Remove Dates Outside of Contract Start and End

      Rename Table ContractDates to ContractDates_temp;

       

      NoConcatenate

      ContactDates:

      LOAD

        %ContractID,

          [Active Date]

      Resident ContractDates_temp

      WHERE Start_temp <= [Active Date]

      AND End_temp >= [Active Date]

      ;

       

      Drop Table ContractDates_temp;

       

      Drop Fields Start_temp, End_temp;

       

      Or is there a better way to accomplish this?