Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hello: I have a data set that shows up like this:
teamNAME | gameDATE | timeOUT | tixSOLDPERDAYX |
vs. Team L | 8/15/2014 0:00 | 66% | 4 |
vs. Team L | 8/15/2014 0:00 | 67% | 12 |
vs. Team L | 8/15/2014 0:00 | 68% | 2 |
vs. Team L | 8/15/2014 0:00 | 69% | 0 |
vs. Team L | 8/15/2014 0:00 | 69% | 2 |
vs. Team L | 8/15/2014 0:00 | 70% | 0 |
vs. Team L | 8/15/2014 0:00 | 71% | 2 |
vs. Team L | 8/15/2014 0:00 | 72% | 15 |
vs. Team L | 8/15/2014 0:00 | 73% | 0 |
i think use this transformation:
HISTORICALTICKETING:
LOAD
timeOUT,
teamNAME,
gameDATE,
sum(tixSOLDPERDAYX) as perDayCOUNT,
IF(timeOUT=0, SUM(tixSOLDPERDAYX), RangeSum(SUM(tixSOLDPERDAYX), Peek('RunningTotal'))) as RunningTotal
RESIDENT RECONFIGURED_TICKETING4
Group by timeOUT, teamNAME, gameDATE;
DROP TABLE RECONFIGURED_TICKETING4;
I was trying to utilize the group by function to eliminate the duplicate values (like 69% above). however the output still shows two 69% values. This is problematic because i am showing an average of Team L RunningTotal, which is double counting these values.
Is there something wrong with my group By? Or is there a way to write something to eliminate the duplicate values?
Edited to conver text format back to %
RECONFIGURED_TICKETING4:
LOAD * INLINE [
teamNAME, gameDATE, timeOUT, tixSOLDPERDAYX
vs. Team L, 8/15/2014 0:00, 66%, 4
vs. Team L, 8/15/2014 0:00, 67%, 12
vs. Team L, 8/15/2014 0:00, 68%, 2
vs. Team L, 8/15/2014 0:00, 69%, 0
vs. Team L, 8/15/2014 0:00, 69%, 2
vs. Team L, 8/15/2014 0:00, 70%, 0
vs. Team L, 8/15/2014 0:00, 71%, 2
vs. Team L, 8/15/2014 0:00, 72%, 15
vs. Team L, 8/15/2014 0:00, 73%, 0
];
HISTORICALTICKETING_TEMP:
LOAD
text(timeOUT) as timeOUT,
teamNAME,
gameDATE,
sum(tixSOLDPERDAYX) as perDayCOUNT
//IF(timeOUT=0, SUM(tixSOLDPERDAYX), RangeSum(SUM(tixSOLDPERDAYX), Peek('RunningTotal'))) as RunningTotal
RESIDENT RECONFIGURED_TICKETING4
Group by text(timeOUT), teamNAME, gameDATE;
DROP TABLE RECONFIGURED_TICKETING4;
HISTORICALTICKETING:
NoConcatenate
LOAD
num#(timeOUT,'#,##0%') as timeOUT,
teamNAME,
gameDATE,
perDayCOUNT
//IF(timeOUT=0, SUM(tixSOLDPERDAYX), RangeSum(SUM(tixSOLDPERDAYX), Peek('RunningTotal'))) as RunningTotal
RESIDENT HISTORICALTICKETING_TEMP;
drop Table HISTORICALTICKETING_TEMP;
Here your timeOUT is never 0 as they all have %ges.
HISTORICALTICKETING:
LOAD
timeOUT,
teamNAME,
gameDATE,
sum(tixSOLDPERDAYX) as perDayCOUNT,
IF(timeOUT=0, SUM(tixSOLDPERDAYX), RangeSum(SUM(tixSOLDPERDAYX), Peek('RunningTotal'))) as RunningTotal
RESIDENT RECONFIGURED_TICKETING4
Group by timeOUT, teamNAME, gameDATE;
DROP TABLE RECONFIGURED_TICKETING4;
May be this you are looking for?
LOAD
timeOUT,
teamNAME,
gameDATE,
sum(tixSOLDPERDAYX) as perDayCOUNT,
IF(tixSOLDPERDAYX =0, SUM(tixSOLDPERDAYX), RangeSum(SUM(tixSOLDPERDAYX), Peek('RunningTotal'))) as RunningTotal
that throws me an error when trying to reload
Try this?
RECONFIGURED_TICKETING4:
LOAD * INLINE [
teamNAME, gameDATE, timeOUT, tixSOLDPERDAYX
vs. Team L, 8/15/2014 0:00, 66%, 4
vs. Team L, 8/15/2014 0:00, 67%, 12
vs. Team L, 8/15/2014 0:00, 68%, 2
vs. Team L, 8/15/2014 0:00, 69%, 0
vs. Team L, 8/15/2014 0:00, 69%, 2
vs. Team L, 8/15/2014 0:00, 70%, 0
vs. Team L, 8/15/2014 0:00, 71%, 2
vs. Team L, 8/15/2014 0:00, 72%, 15
vs. Team L, 8/15/2014 0:00, 73%, 0
];
NoConcatenate
History:
LOAD
timeOUT,
teamNAME,
gameDATE,
sum(tixSOLDPERDAYX) as perDayCOUNT,
IF(tixSOLDPERDAYX = 0, RangeSum(SUM(tixSOLDPERDAYX), Peek('RunningTotal')), SUM(tixSOLDPERDAYX)) as RunningTotal
RESIDENT RECONFIGURED_TICKETING4
Group by timeOUT, teamNAME, gameDATE, tixSOLDPERDAYX;
DROP TABLE RECONFIGURED_TICKETING4;
What is the output you are expecting?
I want the same output that i have, just only one value for each timeOUT- like this:
teamNAME | gameDATE | timeOUT | tixSOLDPERDAYX |
vs. Team L | 8/15/2014 0:00 | 66% | 4 |
vs. Team L | 8/15/2014 0:00 | 67% | 12 |
vs. Team L | 8/15/2014 0:00 | 68% | 2 |
vs. Team L | 8/15/2014 0:00 | 69% | 2 |
vs. Team L | 8/15/2014 0:00 | 70% | 0 |
vs. Team L | 8/15/2014 0:00 | 71% | 2 |
vs. Team L | 8/15/2014 0:00 | 72% | 15 |
vs. Team L | 8/15/2014 0:00 | 73% | 0 |
vs Team L | 8/15/2014 0:00 | 74% | 5 |
Try below
HISTORICALTICKETING:
LOAD
timeOUT,
teamNAME,
gameDATE,
sum(tixSOLDPERDAYX) as perDayCOUNT
RESIDENT RECONFIGURED_TICKETING4
Group by timeOUT, teamNAME, gameDATE;
i still get the same result
Did you try Vineeth's script? You should get only one timeOUT value 69%.
It still loads the exact same. I also am not sure how that script is any different from what i had below, with the exception of the running total