Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
I am wondering if there's a quick way to add a column to a table which is based on two (or more) tables, after the two tables are loaded. I have tried various methods but they involve creating temporary tables, renaming, dropping etc.
For example if I have loaded two tables -
T1:
Name | Value_New
A 5
B 6
C 7
T2:
Name | Value_Old
A 8
B 2
C 1
I then want to add a column to T1 defined as Difference = T1.Value_New - T2.Value_Old,
Simple in theory but I can't find a neat way to do it. A join statement doesn't seem to work since if I load from T2 as resident, it doesn't recognise the T1 field.
Nor do I want to simply load T2 first and then do it as part of the T1 load statement, since I need a general solution for more complex cases.
Mike
How about doing a Mapping Load from T2 [make sure it is optimised] and then using ApplyMap within calculating your new derived filed when loading from T1.
You would still have to a load from T2, but I cannot see any of doing it without some kind of load from T2.
Best Regards, Bill
Mike,
It would be natural to join T1 and T2 into one table. I don't see why it is a problem:
T1:
...;
JOIN (T1) LOAD
Name, // assuming there is field Name in T1
Value_Old
RESIDENT T2;
DROP TABLE T2;
Regards,
Michael
Thanks for the replies. Poor example on my part I think as both could be suitable for the case I gave, but are not most suitable for some other cases.
To be more specific (loosely based on a recent example):
I have a fact table for sales, and two dimension tables for customer and merchant respectively. A merchant can transition from state A to state B (only once), and I want to run some analyses based on what state the merchant was in when the customer registered.
So although the sales date doesn't come into this at all, the resulting calculation is best placed in the fact table as I'd otherwise have to have one very long dimension table for every combination of customer and merchant, which could end up being larger than even the fact table. I also don't want to use a calculated dimension either as it limits what I can do.
fact_sales:
Customer_ID | Merchant_ID | Sales_Value
001 | 01 | $123
001 | 02 | $456
002 | 02 | $789
...
dim_customer:
Customer_ID | Registration_Date
001 | 01/04/2013
002 | 01/08/2013
...
dim_merchant:
Merchant_ID | Transition_Date
01 | 01/02/2013
02 | 01/05/2013
...
I then want to use the logic below to produce a field in the fact_sales table called Merchant_Status_At_Customer_Reg
if(dim_customer.Registration_Date < dim_merchant.Transition_Date,
'State A',
'State B'
)
So the fact_sales table would become:
fact_sales:
Customer_ID | Merchant_ID | Sales_Value | Merchant_Status_At_Customer_Reg
001 | 01 | $123 | State B (had transitioned)
001 | 02 | $456 | State A (had not transitioned)
002 | 02 | $789 | State B (had transitioned)
...
Load script like the following should do the trick (see the attached for it in action):
fact_sales:
LOAD * INLINE [
Customer_ID,Merchant_ID,Sales_Value
001,01,$123
001,02,$456
002,02,$789
];
dim_customer:
LOAD * INLINE [
Customer_ID,Registration_Date
001,01/04/2013
002,01/08/2013
];
dim_merchant:
LOAD * INLINE [
Merchant_ID,Transition_Date
01,01/02/2013
02,01/05/2013
];
LEFT JOIN (fact_sales)
LOAD Merchant_ID,
Customer_ID,
if(Lookup('Registration_Date','Customer_ID',Customer_ID,'dim_customer') <
Lookup('Transition_Date','Merchant_ID',Merchant_ID,'dim_merchant'), 'State A', 'State B') as Merchant_Status_At_Customer_Reg
RESIDENT fact_sales;
So, you want to bring both dates into the fact table, calculate the new field, and drop the dates from the fact? Looks rather reasonable. You can mark your own answer as "Correct"
If you need a little help with syntax:
LEFT JOIN (fact_sales) LOAD DISTINCT
Customer_ID,
Registration_Date
RESIDENT dim_customer;
LEFT JOIN (fact_sales) LOAD DISTINCT
Mercant_ID,
Transition_Date
RESIDENT dim_merchant;
LEFT JOIN (fact_sales) LOAD DISTINCT
Customer_ID,
Mercant_ID,
if(Registration_Date < Transition_Date, 'State A', 'State B') as State
RESIDENT fact_sales;
DROP FIELDS Registration_Date, Transition_Date FROM fact_sales;
Regards,
Michael
Yes was after the best syntax, those should both do it thanks.
It's a bit more cumbersome than SQL which would basically add one column, then update the new column with a join - without the need to bring in multiple columns and then drop. It's not much more though.