6 Replies Latest reply: Jan 6, 2014 3:31 AM by Mike Swinn RSS

    Add Calculated Column to Existing Table During Load

    Mike Swinn

      I am wondering if there's a quick way to add a column to a table which is based on two (or more) tables, after the two tables are loaded. I have tried various methods but they involve creating temporary tables, renaming, dropping etc.

       

      For example if I have loaded two tables -

       

      T1:

       

      Name  |    Value_New

      A               5

      B               6

      C               7

       

       

      T2:

       

      Name  |    Value_Old

      A               8

      B               2

      C               1

       

      I then want to add a column to T1 defined as Difference = T1.Value_New - T2.Value_Old,

       

      Simple in theory but I can't find a neat way to do it. A join statement doesn't seem to work since if I load from T2 as resident, it doesn't recognise the T1 field.

       

      Nor do I want to simply load T2 first and then do it as part of the T1 load statement, since I need a general solution for more complex cases.

        • Re: Add Calculated Column to Existing Table During Load
          Bill Markham

          Mike

           

          How about doing a Mapping Load from T2 [make sure it is optimised] and then using ApplyMap within calculating your new derived filed when loading from T1.

           

          You would still have to a load from T2, but I cannot see any of doing it without some kind of load from T2.

           

           

          Best Regards,     Bill

          • Re: Add Calculated Column to Existing Table During Load
            Michael Solomovich

            Mike,

             

            It would be natural to join T1 and T2 into one table.  I don't see why it is a problem:


            T1:
            ...;

             

            JOIN (T1) LOAD
            Name,                            // assuming there is field Name in T1
            Value_Old
            RESIDENT T2;
            DROP TABLE T2;

             

            Regards,
            Michael

            • Re: Add Calculated Column to Existing Table During Load
              Mike Swinn

              Thanks for the replies. Poor example on my part I think as both could be suitable for the case I gave, but are not most suitable for some other cases.

               

              To be more specific (loosely based on a recent example):

               

              I have a fact table for sales, and two dimension tables for customer and merchant respectively. A merchant can transition from state A to state B (only once), and I want to run some analyses based on what state the merchant was in when the customer registered.

               

              So although the sales date doesn't come into this at all, the resulting calculation is best placed in the fact table as I'd otherwise have to have one very long dimension table for every combination of customer and merchant, which could end up being larger than even the fact table. I also don't want to use a calculated dimension either as it limits what I can do.

               

              fact_sales:

              Customer_ID     |     Merchant_ID     |     Sales_Value

              001                    |       01               |     $123

              001                    |       02                 |     $456

              002                    |       02               |     $789

              ...             

               

              dim_customer:

              Customer_ID       |     Registration_Date

              001                     |     01/04/2013

              002                    |      01/08/2013

              ...

               

              dim_merchant:

              Merchant_ID     |      Transition_Date

              01                    |     01/02/2013

              02                    |    01/05/2013

              ...

               

              I then want to use the logic below to produce a field in the fact_sales table called Merchant_Status_At_Customer_Reg

               

                   if(dim_customer.Registration_Date < dim_merchant.Transition_Date,

                        'State A',

                        'State B'

                   )

               

              So the fact_sales table would become:

               

              fact_sales:

              Customer_ID     |     Merchant_ID     |     Sales_Value     |    Merchant_Status_At_Customer_Reg

              001                    |       01               |     $123                |      State B (had transitioned)

              001                    |       02                 |     $456               |     State A (had not transitioned)

              002                    |       02               |     $789               |       State B (had transitioned)

              ...

                • Re: Re: Add Calculated Column to Existing Table During Load
                  Nicole Smith

                  Load script like the following should do the trick (see the attached for it in action):

                   

                  fact_sales:

                  LOAD * INLINE [

                  Customer_ID,Merchant_ID,Sales_Value

                  001,01,$123

                  001,02,$456

                  002,02,$789

                  ];

                   

                  dim_customer:

                  LOAD * INLINE [

                  Customer_ID,Registration_Date

                  001,01/04/2013

                  002,01/08/2013

                  ];

                   

                  dim_merchant:

                  LOAD * INLINE [

                  Merchant_ID,Transition_Date

                  01,01/02/2013

                  02,01/05/2013

                  ];

                   

                  LEFT JOIN (fact_sales)

                  LOAD Merchant_ID,

                      Customer_ID,

                      if(Lookup('Registration_Date','Customer_ID',Customer_ID,'dim_customer') <

                      Lookup('Transition_Date','Merchant_ID',Merchant_ID,'dim_merchant'), 'State A', 'State B') as Merchant_Status_At_Customer_Reg

                  RESIDENT fact_sales;

                  • Re: Add Calculated Column to Existing Table During Load
                    Michael Solomovich

                    So, you want to bring both dates into the fact table, calculate the new field, and drop the dates from the fact?  Looks rather reasonable.  You can mark your own answer as "Correct" 

                    If you need a little help with syntax:

                     

                    LEFT JOIN (fact_sales) LOAD DISTINCT

                    Customer_ID,

                    Registration_Date

                    RESIDENT dim_customer;

                     

                    LEFT JOIN (fact_sales) LOAD DISTINCT

                    Mercant_ID,

                    Transition_Date

                    RESIDENT dim_merchant;

                     

                    LEFT JOIN (fact_sales) LOAD DISTINCT

                    Customer_ID,

                    Mercant_ID,

                    if(Registration_Date < Transition_Date, 'State A', 'State B') as State

                    RESIDENT fact_sales;

                    DROP FIELDS Registration_Date, Transition_Date FROM fact_sales;

                     

                    Regards,

                    Michael

                  • Re: Add Calculated Column to Existing Table During Load
                    Mike Swinn

                    Yes was after the best syntax, those should both do it thanks.

                     

                    It's a bit more cumbersome than SQL which would basically add one column, then update the new column with a join - without the need to bring in multiple columns and then drop. It's not much more though.