Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
I'm trying to figure out if this is even possible. I haven't had any luck on my own using the following routine and any similar. I'm trying to apply logic on a resident table and make field values based on the initial logic. I realize the following is probably not syntactically correct. Assume the variables are valued values.
Anything like this possible without loading the resident table 3x to get the desired result?
Summary_Data:
NoConcatenate
LOAD
If
Country = 'US'
and Date >= '$(vStartOneDate)'
and Date <= '$(vTableOneDate)'
Then
Product,
App,
Country,
'One' as Source_Table
Else If
Country = 'US'
and Date >= '$(vStartTwoDate)'
and Date <= '$(vTableTwoDate)'
Then
Product,
App,
Country,
'Two' as Source_Table
Else If
Country = 'US'
and Date >= '$(vStartThreeDate)'
and Date <= '$(vTableThreeDate)'
Then
Product,
App,
Country,
'Three' as Source_Table
End If
Resident Detailed_Data;
I believe no, You have to load 3 times for that
Hi,
Try below.
If
Country = 'US'
and Date >= '$(vStartOneDate)'
and Date <= '$(vTableOneDate)'
Then
LOAD Product,
App,
Country,
'One' as Source_Table
Resident Detailed_Data;
Else If
Country = 'US'
and Date >= '$(vStartTwoDate)'
and Date <= '$(vTableTwoDate)'
Then Load
Product,
App,
Country,
'Two' as Source_Table
Resident Detailed_Data;
Else If
Country = 'US'
and Date >= '$(vStartThreeDate)'
and Date <= '$(vTableThreeDate)'
Then Load
Product,
App,
Country,
'Three' as Source_Table
Resident Detailed_Data;
End If
Thanks!
Hi John,
Since you are updating the Source_Table field with three different values, you have to load the resident table thrice while concatenating all three.
Thanks!
May be this
Summary_Data:
NoConcatenate
LOAD
Product,
App,
Country,
If(Country = 'US' and Date >= '$(vStartOneDate)' and Date <= '$(vTableOneDate)', 'One',
If(Country = 'US' and Date >= '$(vStartTwoDate)' and Date <= '$(vTableTwoDate)', 'Two',
If(Country = 'US' and Date >= '$(vStartThreeDate)' and Date <= '$(vTableThreeDate)', 'Three'))) as Source_Table
Resident Detailed_Data;
I think I mis-wrote my sample code. I also needed to add counts of things based on the date ranges, so would need to add a group by. The entire effort was to try and speed up the load by reading the resident table once. Based on my initial sample code I believe Sunny had the right idea, but with the embedded if statements on several fields probably slows it down. Will try any way. The actual resident table has about 30 days of ~500,000 rows, or 15M rows. It now takes 18 minutes per load for each (3) type so i all about 55 minutes.
Can you show the script you are running right now?
Hi Sunny, I think I can show it in it's raw form. Nothing confidential. Each residential load (3x) take 15-18 minutes) and the second 2 concatenate to the first. THANKS for taking a look...John
//US NMC data
TRACE Generating US NMC summary data...;
Summary_Data:
NoConcatenate
LOAD
'$(vUS_NMC_TableDate)' as MS_Date,
'$(vUS_NMC_SummaryType)' as MS_Summary_Type,
UA as MS_UA,
App as MS_App,
Country as MS_Country,
Product as MS_Product,
NMC_Acct_Num as MS_NMC_Acct_Num,
Msg_Type as MS_Msg_Type,
Transport as MS_Transport,
Count(distinct UA) as MS_Units,
Sum(Transactions) as MS_Transactions,
Sum(Bytes) as MS_Bytes,
Count(distinct if(SAT_Trans > 0, UA)) as MS_SAT_Units,
Sum(SAT_Trans) as MS_SAT_Trans,
Sum(SAT_Bytes) as MS_SAT_Bytes,
Count(distinct if(OVT_Trans > 0, UA)) as MS_OVT_Units,
Sum(OVT_Trans) as MS_OVT_Trans,
Sum(OVT_Bytes) as MS_OVT_Bytes,
Count(distinct if(WIFI_Trans > 0, UA)) as MS_WIFI_Units,
Sum(WIFI_Trans) as MS_WIFI_Trans,
Sum(WIFI_Bytes) as MS_WIFI_Bytes
Resident Detailed_Data
Where Country = 'US'
and Date >= '$(vUS_NMC_SummaryStartDate)'
and Date <= '$(vUS_NMC_TableDate)'
//NMC App values are numeric
and IsNum(App)
Group by UA, App, Country, Product, NMC_Acct_Num, Msg_Type, Transport;
//US NGB data
TRACE Generating US NGB summary data...;
NGB_Summary_Data:
Concatenate (Summary_Data)
LOAD
'$(vUS_NGB_TableDate)' as MS_Date,
'$(vUS_NGB_SummaryType)' as MS_Summary_Type,
UA as MS_UA,
App as MS_App,
Country as MS_Country,
Product as MS_Product,
NMC_Acct_Num as MS_NMC_Acct_Num,
Msg_Type as MS_Msg_Type,
Transport as MS_Transport,
Count(distinct UA) as MS_Units,
Sum(Transactions) as MS_Transactions,
Sum(Bytes) as MS_Bytes,
Count(distinct if(SAT_Trans > 0, UA)) as MS_SAT_Units,
Sum(SAT_Trans) as MS_SAT_Trans,
Sum(SAT_Bytes) as MS_SAT_Bytes,
Count(distinct if(OVT_Trans > 0, UA)) as MS_OVT_Units,
Sum(OVT_Trans) as MS_OVT_Trans,
Sum(OVT_Bytes) as MS_OVT_Bytes,
Count(distinct if(WIFI_Trans > 0, UA)) as MS_WIFI_Units,
Sum(WIFI_Trans) as MS_WIFI_Trans,
Sum(WIFI_Bytes) as MS_WIFI_Bytes
Resident Detailed_Data
Where Country = 'US'
and Date >= '$(vUS_NGB_SummaryStartDate)'
and Date <= '$(vUS_NGB_TableDate)'
//NMC App values are text
and IsText(App)
Group by UA, App, Country, Product, NMC_Acct_Num, Msg_Type, Transport;
//CA data
TRACE Generating CA summary data...;
CA_Summary_Data:
Concatenate (Summary_Data)
LOAD
'$(vCATableDate)' as MS_Date,
'$(vCA_SummaryType)' as MS_Summary_Type,
UA as MS_UA,
App as MS_App,
Country as MS_Country,
Product as MS_Product,
NMC_Acct_Num as MS_NMC_Acct_Num,
Msg_Type as MS_Msg_Type,
Transport as MS_Transport,
Count(distinct UA) as MS_Units,
Sum(Transactions) as MS_Transactions,
Sum(Bytes) as MS_Bytes,
Count(distinct if(SAT_Trans > 0, UA)) as MS_SAT_Units,
Sum(SAT_Trans) as MS_SAT_Trans,
Sum(SAT_Bytes) as MS_SAT_Bytes,
Count(distinct if(OVT_Trans > 0, UA)) as MS_OVT_Units,
Sum(OVT_Trans) as MS_OVT_Trans,
Sum(OVT_Bytes) as MS_OVT_Bytes,
Count(distinct if(WIFI_Trans > 0, UA)) as MS_WIFI_Units,
Sum(WIFI_Trans) as MS_WIFI_Trans,
Sum(WIFI_Bytes) as MS_WIFI_Bytes
Resident Detailed_Data
Where Country = 'CA'
and Date >= '$(vCA_SummaryStartDate)'
and Date <= '$(vCATableDate)'
Group by UA, App, Country, Product, NMC_Acct_Num, Msg_Type, Transport;
Looking at this I have two things to say
1) I am not surprised that the load times are high. Aggregations in the script can take a lot of toll and depending on the number or rows you are aggregating on, it can last hours.
2) So, are you looking to load this just one time instead of loading the same table thrice? It is def. possible, but you will have to test if you see any performance gains or not because earlier you were breaking down the aggregations into three parts (lets say 100k, 100k, 100k) whereas the combined will do the same aggregation on 300k rows. Where are you creating your Source_Table in the above script? I did not see that?
This IF structure will not work, you must use the alternative that is used in graphics, IF (<Condition>, True, false)