Solved: Remove duplicates from table - Qlik Community

qluser01 · ‎2017-01-25

Hi all!

I have a table like:

Load * inline

[

RiskID, RiskSeverity, RiskProbability, otherStuff

1, high, low, abc

2, low, average, abcddd

3, average, high, adesds

4, low, high, adefsd

3, high, low, abcewew

2, high, high, aderwc

];

etc

bad quality input data

i.d. not distinct RiskID, RiskID has several Severity and Probability values

I need to create a new table and load into this table:

1. distinct RiskID

2. the highest Probability (among all probabilities available for this RiskID) there are only 3 types (low. average, high)

3. the Highest Severity - same as p. 2

4. otherStuff (only 1 version exists for each RiskID, no duplicates here) - that are all other fields in the table

So the idea is to clean up table from bad data in columns RiskSeverity and RiskProbability - need to take the highest only and make RiskID distinct.

Thanks!

uacg0009 · ‎2017-01-26

Hi eeacrw,

I have a method that is a little similar with Marcus'.

First you need to left join twice.

Data:

Load * inline

[

RiskID, RiskSeverity, RiskProbability, otherStuff

1, high, low, abc

2, low, average, abcddd

3, average, high, adesds

4, low, high, adefsd

3, high, low, adesds

2, high, high, abcddd

];

left join

Load * inline

[

RiskSeverity, SeverityNum

high, 3

average, 2

low, 1

];

left join

Load * inline

[

RiskProbability, ProbabilityNum

high, 3

average, 2

low, 1

];

Final:

noconcatenate

Load distinct RiskID,

otherStuff,

if(max(SeverityNum) = 3,'high', if(max(SeverityNum) = 2,'average','low')) as RiskSeverity,

if(max(ProbabilityNum) = 3,'high', if(max(ProbabilityNum) = 2,'average','low')) as RiskProbability

resident Data

group by RiskID,otherStuff;

drop table Data;

Thanks.

Aiolos Zhao

View solution in original post

vishsaggi · ‎2017-01-25

What is your expected output based on your Inline Load should be ?

marcus_sommer · ‎2017-01-25

One way might be to add an additional numeric field for your risks or probably better to use dual-values for it like:

dual('low', 1)

dual('average', 2)

an then you could just make an aggregation-query like:

// optional: inner join (YourLoadedTable)

load RiskID, max(RiskProbability) as MaxRiskProbability resident YourLoadedTable;

- Marcus

qluser01 · ‎2017-01-26

The result should look like:

Load * inline

[

RiskID, RiskSeverity, RiskProbability, otherStuff

1, high, low, abc

2, high, high, abcddd

3, high, high, adesds

4, low, high, adefsd

];

Sorry, I've shown not unique data in otherStuff table... actually it is unique for each RiskID

the correct source data is:

Load * inline

[

RiskID, RiskSeverity, RiskProbability, otherStuff

1, high, low, abc

2, low, average, abcddd

3, average, high, adesds

4, low, high, adefsd

3, high, low, adesds

2, high, high, abcddd

];

etc

uacg0009 · ‎2017-01-26

Hi eeacrw,

I have a method that is a little similar with Marcus'.

First you need to left join twice.

Data:

Load * inline

[

RiskID, RiskSeverity, RiskProbability, otherStuff

1, high, low, abc

2, low, average, abcddd

3, average, high, adesds

4, low, high, adefsd

3, high, low, adesds

2, high, high, abcddd

];

left join

Load * inline

[

RiskSeverity, SeverityNum

high, 3

average, 2

low, 1

];

left join

Load * inline

[

RiskProbability, ProbabilityNum

high, 3

average, 2

low, 1

];

Final:

noconcatenate

Load distinct RiskID,

otherStuff,

if(max(SeverityNum) = 3,'high', if(max(SeverityNum) = 2,'average','low')) as RiskSeverity,

if(max(ProbabilityNum) = 3,'high', if(max(ProbabilityNum) = 2,'average','low')) as RiskProbability

resident Data

group by RiskID,otherStuff;

drop table Data;

Thanks.

Aiolos Zhao

Remove duplicates from table

Related Topics