Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hello-
I have a data set with multiple event dates (DATE) and the percent of time to the event (timeOUT). I am looking to ensure that there is always a 0% value for this, however within the data, this is not always the case (see below). I have truncated the data below, however you will see that 4/14 data has a 0% to 100%, however 4/19 only has a 1%.
DATE | timeOUT | SALES |
4/14/2013 0:00 | 0% | 15964 |
4/14/2013 0:00 | 1% | 2 |
4/14/2013 0:00 | 95% | 6 |
4/14/2013 0:00 | 96% | 8 |
4/14/2013 0:00 | 97% | 14 |
4/14/2013 0:00 | 98% | 16 |
4/14/2013 0:00 | 99% | 26 |
4/14/2013 0:00 | 100% | 90 |
4/19/2013 0:00 | 1% | 2 |
4/19/2013 0:00 | 2% | 2 |
4/19/2013 0:00 | 3% | 0 |
4/19/2013 0:00 | 5% | 4 |
Please help!
The magic is in the usage of the DISTICT keyword. In Gysbert's example, the DISTINCT keyword forces the LOAD to only read distinct DATE values. In your new code with multiple different fields, the DISTINCT keyword will select all unique combinations of DATE, NAME, totalPRESOLD and DAYTOTAL, which may lead to multiple zero percentage lines for individual DATE values.
You can avoid this happening by setting NAME, totalPRESOLD and DAYTOTAL also to fixed values, like in:
CONCATENATE (RECONFIGURED_TICKETING2)
LOAD DISTINCT
'' AS NAME,
DATE,
0 as SALES,
'0%' as timeOUT,
0 AS totalPRESOLD,
0 AS DAYTOTAL
RESIDENT RECONFIGURED_TICKETING1
WHERE NOT Exists(LookUpField, gameDATE & '|' & '0%');
Remember that these new zero rows contain imaginary data, and therefor don't need the original values.
Best,
Peter
What is your requirement here? What is that you want to see your expected output? Can you explain a little more please?
Sure: My expected output would be to have the same data set as above, but have a value for 4/19/2013 where timeOUT is 0% (even though this doesn't currently exist in the data set as it was loaded in)
Perhaps like this:
MyTable:
LOAD
DATE,
timeOUT,
SALES,
DATE & '|' & timeOUT as LookUpField
FROM
...sourcetable...
;
CONCATENATE (MyTable)
LOAD DISTINCT
DATE,
0 as timeOUT,
0 as SALES
FROM
...sourcetable...
WHERE
Not Exists(LookUpField, DATE & '|' & 0)
;
DROP FIELD LookUpField;
the duplicate 0% continue to show up in the new data set. It creates 0 values, but for all the data and then duplicates the sales when they are grouped. I thought the NOT EXISTS would make it so only values without a 0 would show up with these new values.
By applying Gysbert's example code, you now get multiple zero rows for individual dates? We alrady have some example data. Can you post the script you're currently using?
there are some additional fields that i had not previously mentioned, however this is the full script i'm using for this part:
RECONFIGURED_TICKETING2:
LOAD
NAME,
DATE,
transDATE,
NUM(dayAsPercent,'#,###%') AS timeOUT,
DAYTOTAL,
totalPRESOLD,
SALES,
DATE & '|' & dayAsPercent as LookUpField
RESIDENT RECONFIGURED_TICKETING1;
Concatenate (RECONFIGURED_TICKETING2)
LOAD Distinct
NAME,
DATE,
'0' as SALES,
'0%' as timeOUT,
totalPRESOLD,
DAYTOTAL
Resident
RECONFIGURED_TICKETING1
WHERE NOT Exists(LookUpField, gameDATE & '|' & '0%');
DROP Field LookUpField;
DROP TABLE RECONFIGURED_TICKETING1;
Or even better: this code
MyTable:
LOAD *, DATE & '|' & timeOUT AS LookUpField INLINE [
DATE, timeOUT, SALES
4/14/2013 0:00, 0, 15964
4/14/2013 0:00, 1, 2
4/14/2013 0:00, 95, 6
4/14/2013 0:00, 96, 8
4/14/2013 0:00, 97, 14
4/14/2013 0:00, 98, 16
4/14/2013 0:00, 99, 26
4/14/2013 0:00, 100, 90
4/19/2013 0:00, 1, 2
4/19/2013 0:00, 2, 2
4/19/2013 0:00, 3, 0
4/19/2013 0:00, 5, 4
];
CONCATENATE (MyTable)
LOAD DISTINCT
DATE,
0 AS timeOUT,
0 AS SALES
RESIDENT MyTable
WHERE Not Exists(LookUpField, DATE & '|0');
DROP Field LookUpField;
produces a table with exactly 13 rows (as expected) that looks like this:
What is it that doesn't match your requirements?
The magic is in the usage of the DISTICT keyword. In Gysbert's example, the DISTINCT keyword forces the LOAD to only read distinct DATE values. In your new code with multiple different fields, the DISTINCT keyword will select all unique combinations of DATE, NAME, totalPRESOLD and DAYTOTAL, which may lead to multiple zero percentage lines for individual DATE values.
You can avoid this happening by setting NAME, totalPRESOLD and DAYTOTAL also to fixed values, like in:
CONCATENATE (RECONFIGURED_TICKETING2)
LOAD DISTINCT
'' AS NAME,
DATE,
0 as SALES,
'0%' as timeOUT,
0 AS totalPRESOLD,
0 AS DAYTOTAL
RESIDENT RECONFIGURED_TICKETING1
WHERE NOT Exists(LookUpField, gameDATE & '|' & '0%');
Remember that these new zero rows contain imaginary data, and therefor don't need the original values.
Best,
Peter