Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi all!
I have a table like:
Load * inline
[
RiskID, RiskSeverity, RiskProbability, otherStuff
1, high, low, abc
2, low, average, abcddd
3, average, high, adesds
4, low, high, adefsd
3, high, low, abcewew
2, high, high, aderwc
];
etc
bad quality input data
i.d. not distinct RiskID, RiskID has several Severity and Probability values
I need to create a new table and load into this table:
1. distinct RiskID
2. the highest Probability (among all probabilities available for this RiskID) there are only 3 types (low. average, high)
3. the Highest Severity - same as p. 2
4. otherStuff (only 1 version exists for each RiskID, no duplicates here) - that are all other fields in the table
So the idea is to clean up table from bad data in columns RiskSeverity and RiskProbability - need to take the highest only and make RiskID distinct.
Thanks!
Hi eeacrw,
I have a method that is a little similar with Marcus'.
First you need to left join twice.
Data:
Load * inline
[
RiskID, RiskSeverity, RiskProbability, otherStuff
1, high, low, abc
2, low, average, abcddd
3, average, high, adesds
4, low, high, adefsd
3, high, low, adesds
2, high, high, abcddd
];
left join
Load * inline
[
RiskSeverity, SeverityNum
high, 3
average, 2
low, 1
];
left join
Load * inline
[
RiskProbability, ProbabilityNum
high, 3
average, 2
low, 1
];
Final:
noconcatenate
Load distinct RiskID,
otherStuff,
if(max(SeverityNum) = 3,'high', if(max(SeverityNum) = 2,'average','low')) as RiskSeverity,
if(max(ProbabilityNum) = 3,'high', if(max(ProbabilityNum) = 2,'average','low')) as RiskProbability
resident Data
group by RiskID,otherStuff;
drop table Data;
Thanks.
Aiolos Zhao
What is your expected output based on your Inline Load should be ?
One way might be to add an additional numeric field for your risks or probably better to use dual-values for it like:
dual('low', 1)
dual('average', 2)
an then you could just make an aggregation-query like:
// optional: inner join (YourLoadedTable)
load RiskID, max(RiskProbability) as MaxRiskProbability resident YourLoadedTable;
- Marcus
The result should look like:
Load * inline
[
RiskID, RiskSeverity, RiskProbability, otherStuff
1, high, low, abc
2, high, high, abcddd
3, high, high, adesds
4, low, high, adefsd
];
Sorry, I've shown not unique data in otherStuff table... actually it is unique for each RiskID
the correct source data is:
Load * inline
[
RiskID, RiskSeverity, RiskProbability, otherStuff
1, high, low, abc
2, low, average, abcddd
3, average, high, adesds
4, low, high, adefsd
3, high, low, adesds
2, high, high, abcddd
];
etc
Hi eeacrw,
I have a method that is a little similar with Marcus'.
First you need to left join twice.
Data:
Load * inline
[
RiskID, RiskSeverity, RiskProbability, otherStuff
1, high, low, abc
2, low, average, abcddd
3, average, high, adesds
4, low, high, adefsd
3, high, low, adesds
2, high, high, abcddd
];
left join
Load * inline
[
RiskSeverity, SeverityNum
high, 3
average, 2
low, 1
];
left join
Load * inline
[
RiskProbability, ProbabilityNum
high, 3
average, 2
low, 1
];
Final:
noconcatenate
Load distinct RiskID,
otherStuff,
if(max(SeverityNum) = 3,'high', if(max(SeverityNum) = 2,'average','low')) as RiskSeverity,
if(max(ProbabilityNum) = 3,'high', if(max(ProbabilityNum) = 2,'average','low')) as RiskProbability
resident Data
group by RiskID,otherStuff;
drop table Data;
Thanks.
Aiolos Zhao