Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
qluser01
Creator
Creator

Remove duplicates from table

Hi all!

I have a table like:

Load * inline

[

RiskID, RiskSeverity, RiskProbability, otherStuff

1, high, low, abc

2, low, average, abcddd

3, average, high, adesds

4, low, high, adefsd

3, high, low, abcewew

2, high, high, aderwc

];

etc

bad quality input data

i.d. not distinct RiskID,  RiskID has several Severity and Probability values

I need to create a new table and load into this table:

1. distinct RiskID

2. the highest Probability (among all probabilities available for this RiskID) there are only 3 types (low. average, high)

3. the Highest Severity - same as p. 2

4. otherStuff (only 1 version exists for each RiskID, no duplicates here) - that are all other fields in the table

So the idea is to clean up table from bad data in columns RiskSeverity and RiskProbability - need to take the highest only and make RiskID distinct.

Thanks!

1 Solution

Accepted Solutions
uacg0009
Partner - Specialist
Partner - Specialist

Hi eeacrw,

I have a method that is a little similar with Marcus'.

First you need to left join twice.

Data:

Load * inline

[

RiskID, RiskSeverity, RiskProbability, otherStuff

1, high, low, abc

2, low, average, abcddd

3, average, high, adesds

4, low, high, adefsd

3, high, low, adesds

2, high, high, abcddd

];

left join

Load * inline

[

RiskSeverity, SeverityNum

high, 3

average, 2

low, 1

];

left join

Load * inline

[

RiskProbability, ProbabilityNum

high, 3

average, 2

low, 1

];

Final:

noconcatenate

Load distinct RiskID,

                    otherStuff,

                    if(max(SeverityNum) = 3,'high', if(max(SeverityNum) = 2,'average','low')) as RiskSeverity,

                    if(max(ProbabilityNum) = 3,'high', if(max(ProbabilityNum) = 2,'average','low')) as RiskProbability

resident Data

group by RiskID,otherStuff;


drop table Data;


Thanks.


Aiolos Zhao

View solution in original post

4 Replies
vishsaggi
Champion III
Champion III

What is your expected output based on your Inline Load should be ?

marcus_sommer

One way might be to add an additional numeric field for your risks or probably better to use dual-values for it like:

dual('low', 1)

dual('average', 2)

an then you could just make an aggregation-query like:

// optional: inner join (YourLoadedTable)

load RiskID, max(RiskProbability) as MaxRiskProbability resident YourLoadedTable;

- Marcus

qluser01
Creator
Creator
Author

The result should look like:

Load * inline

[

RiskID, RiskSeverity, RiskProbability, otherStuff

1, high, low, abc

2, high, high, abcddd

3, high, high, adesds

4, low, high, adefsd

];

Sorry, I've shown not unique data in otherStuff table... actually it is unique for each RiskID

the correct source data is:

Load * inline

[

RiskID, RiskSeverity, RiskProbability, otherStuff

1, high, low, abc

2, low, average, abcddd

3, average, high, adesds

4, low, high, adefsd

3, high, low, adesds

2, high, high, abcddd

];

etc

uacg0009
Partner - Specialist
Partner - Specialist

Hi eeacrw,

I have a method that is a little similar with Marcus'.

First you need to left join twice.

Data:

Load * inline

[

RiskID, RiskSeverity, RiskProbability, otherStuff

1, high, low, abc

2, low, average, abcddd

3, average, high, adesds

4, low, high, adefsd

3, high, low, adesds

2, high, high, abcddd

];

left join

Load * inline

[

RiskSeverity, SeverityNum

high, 3

average, 2

low, 1

];

left join

Load * inline

[

RiskProbability, ProbabilityNum

high, 3

average, 2

low, 1

];

Final:

noconcatenate

Load distinct RiskID,

                    otherStuff,

                    if(max(SeverityNum) = 3,'high', if(max(SeverityNum) = 2,'average','low')) as RiskSeverity,

                    if(max(ProbabilityNum) = 3,'high', if(max(ProbabilityNum) = 2,'average','low')) as RiskProbability

resident Data

group by RiskID,otherStuff;


drop table Data;


Thanks.


Aiolos Zhao