Skip to main content
Announcements
Global Transformation Awards! Applications are now open. Submit Entry
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

Add Calculated Column to Existing Table During Load

I am wondering if there's a quick way to add a column to a table which is based on two (or more) tables, after the two tables are loaded. I have tried various methods but they involve creating temporary tables, renaming, dropping etc.

For example if I have loaded two tables -

T1:

Name  |    Value_New

A               5

B               6

C               7

T2:

Name  |    Value_Old

A               8

B               2

C               1

I then want to add a column to T1 defined as Difference = T1.Value_New - T2.Value_Old,

Simple in theory but I can't find a neat way to do it. A join statement doesn't seem to work since if I load from T2 as resident, it doesn't recognise the T1 field.

Nor do I want to simply load T2 first and then do it as part of the T1 load statement, since I need a general solution for more complex cases.

6 Replies
Anonymous
Not applicable
Author

Mike

How about doing a Mapping Load from T2 [make sure it is optimised] and then using ApplyMap within calculating your new derived filed when loading from T1.

You would still have to a load from T2, but I cannot see any of doing it without some kind of load from T2.

Best Regards,     Bill

Anonymous
Not applicable
Author

Mike,

It would be natural to join T1 and T2 into one table.  I don't see why it is a problem:


T1:
...;

JOIN (T1) LOAD
Name,                            // assuming there is field Name in T1
Value_Old
RESIDENT T2;
DROP TABLE T2;

Regards,
Michael

Not applicable
Author

Thanks for the replies. Poor example on my part I think as both could be suitable for the case I gave, but are not most suitable for some other cases.

To be more specific (loosely based on a recent example):

I have a fact table for sales, and two dimension tables for customer and merchant respectively. A merchant can transition from state A to state B (only once), and I want to run some analyses based on what state the merchant was in when the customer registered.

So although the sales date doesn't come into this at all, the resulting calculation is best placed in the fact table as I'd otherwise have to have one very long dimension table for every combination of customer and merchant, which could end up being larger than even the fact table. I also don't want to use a calculated dimension either as it limits what I can do.

fact_sales:

Customer_ID     |     Merchant_ID     |     Sales_Value

001                    |       01               |     $123

001                    |       02                 |     $456

002                    |       02               |     $789

...             

dim_customer:

Customer_ID       |     Registration_Date

001                     |     01/04/2013

002                    |      01/08/2013

...

dim_merchant:

Merchant_ID     |      Transition_Date

01                    |     01/02/2013

02                    |    01/05/2013

...

I then want to use the logic below to produce a field in the fact_sales table called Merchant_Status_At_Customer_Reg

     if(dim_customer.Registration_Date < dim_merchant.Transition_Date,

          'State A',

          'State B'

     )

So the fact_sales table would become:

fact_sales:

Customer_ID     |     Merchant_ID     |     Sales_Value     |    Merchant_Status_At_Customer_Reg

001                    |       01               |     $123                |      State B (had transitioned)

001                    |       02                 |     $456               |     State A (had not transitioned)

002                    |       02               |     $789               |       State B (had transitioned)

...

Nicole-Smith

Load script like the following should do the trick (see the attached for it in action):

fact_sales:

LOAD * INLINE [

Customer_ID,Merchant_ID,Sales_Value

001,01,$123

001,02,$456

002,02,$789

];

dim_customer:

LOAD * INLINE [

Customer_ID,Registration_Date

001,01/04/2013

002,01/08/2013

];

dim_merchant:

LOAD * INLINE [

Merchant_ID,Transition_Date

01,01/02/2013

02,01/05/2013

];

LEFT JOIN (fact_sales)

LOAD Merchant_ID,

    Customer_ID,

    if(Lookup('Registration_Date','Customer_ID',Customer_ID,'dim_customer') <

    Lookup('Transition_Date','Merchant_ID',Merchant_ID,'dim_merchant'), 'State A', 'State B') as Merchant_Status_At_Customer_Reg

RESIDENT fact_sales;

Anonymous
Not applicable
Author

So, you want to bring both dates into the fact table, calculate the new field, and drop the dates from the fact?  Looks rather reasonable.  You can mark your own answer as "Correct" 

If you need a little help with syntax:

LEFT JOIN (fact_sales) LOAD DISTINCT

Customer_ID,

Registration_Date

RESIDENT dim_customer;

LEFT JOIN (fact_sales) LOAD DISTINCT

Mercant_ID,

Transition_Date

RESIDENT dim_merchant;

LEFT JOIN (fact_sales) LOAD DISTINCT

Customer_ID,

Mercant_ID,

if(Registration_Date < Transition_Date, 'State A', 'State B') as State

RESIDENT fact_sales;

DROP FIELDS Registration_Date, Transition_Date FROM fact_sales;

Regards,

Michael

Not applicable
Author

Yes was after the best syntax, those should both do it thanks.

It's a bit more cumbersome than SQL which would basically add one column, then update the new column with a join - without the need to bring in multiple columns and then drop. It's not much more though.