Solved: Re: Counting distinct unique values based on value... - Qlik Community

Intector · ‎2020-08-11

Hello,

I'm new to Qlik sense and trying to figure out something quite simple. My real database has many thousand rows with data which I need to analyze. My problem is giving me some sleepless nights and now I'm trying to find an answer here.

I have some data and would like to get some statistical numbers from them, nothing fancy just stuff like this:

How many DISTINCT values - without duplicates - are in a particular column depending on the value of a different column.
How many DISTINCT duplicates are in a particular column depending on the value of a different column.
How many duplicates does every DISTINCT duplicate have depending on the value of a different column.

I tried already different approaches with different loading scripts, but unfortunately nothing works the way I like to have it.

Here is an example to explain it a little better:

The test data:

[Test_Data]

LOAD * Inline [
ID, TXT, NUM
0, TXT1, 100001
1, TXT1, 100000
2, TXT1, 100000
3, TXT1, 100000
4, TXT1, 100000
5, TXT1, 100002
6, TXT2, 100001
7, TXT2, 100002
8, TXT2, 100002
9, TXT2, 100000
10, TXT3, 100001
11, TXT3, 100002
12, TXT3, 100005
13, TXT3, 100013
14, TXT3, 100004
15, TXT3, 100004
16, TXT3, 100010
16, TXT3, 100010]

The answers I’m looking for are:

values appearing only once:

TXT1 - count: 2

100001
100002

TXT2 – count: 2

100001
100000

TXT3 – count: 4

100001
100002
100005
100013

values appearing more than once:

TXT1 count: 1

100000

TXT2 count: 1

100002

TXT3 count: 2

100004
100010

duplicate appearing:

TXT1

100000 – count: 4

TXT2

100002 – count: 2

TXT3

100004 – count: 2
100010 – count: 2

I tried this:

For values appearing only once:

Count(DISTINCT{<NUM = {"=Count(NUM) = 1"}>}NUM)

Does not work because it gets the values independent from TXT

Sum(Aggr(Count(DISTINCT{<NUM = {"=Count(NUM) = 1"}>}NUM), TXT)

Doesn’t work, gives me strange results

For values appearing more than once:

Count(DISTINCT{<NUM = {"=Count(NUM) > 1"}>}NUM)

Does not work because it gets the values independent from TXT

Sum(Aggr(Count(DISTINCT{<NUM = {"=Count(NUM) > 1"}>}NUM), TXT)

Doesn’t work, gives me strange results

For duplicate:

Count({<NUM = {"=Count(NUM) > 1"}>}NUM)

Does not work because it gets the values independent from TXT

Sum(Aggr(Count({<NUM = {"=Count(NUM) > 1"}>}NUM), TXT)

Doesn’t work, gives me strange results

tm_burgers · ‎2020-08-11

I would recommend adding the count of values to the data table itself, In this case since you are looking for duplicates across two dimensions; I would concat the two fields into a keyfield.

For example:

[Test_Data]:

LOAD * Inline [
ID, TXT, NUM
0, TXT1, 100001
1, TXT1, 100000
2, TXT1, 100000
3, TXT1, 100000
4, TXT1, 100000
5, TXT1, 100002
6, TXT2, 100001
7, TXT2, 100002
8, TXT2, 100002
9, TXT2, 100000
10, TXT3, 100001
11, TXT3, 100002
12, TXT3, 100005
13, TXT3, 100013
14, TXT3, 100004
15, TXT3, 100004
16, TXT3, 100010
16, TXT3, 100010]

LEFT Join (TEST_DATA)

LOAD *,

TXT&'-'&NUM as DuplicateIdentifier

Resident TEST_DATA;

LEFT Join (TEST_DATA)
LOAD

DuplicateIdentifier,

count(DuplicateIdentifier) as CountOccurrence

Resident TEST_DATA

Group by DuplicateIdentifier;

Then you can use SetAnalysis to search for CountOccurrence = {">1"} etc.

View solution in original post

Kushal_Chawda · ‎2020-08-11

try below expressions

//Distinct Values
=count(aggr(if(Count(NUM)=1,NUM),TXT,NUM))

// Duplicate Values
=count(aggr(if(Count(NUM)>1,NUM),TXT,NUM))

//Duplicate Value Counts
=Concat(aggr(if(Count(NUM)>1,NUM&' : '&count(NUM)),TXT,NUM),',')

View solution in original post

tm_burgers · ‎2020-08-11

I would recommend adding the count of values to the data table itself, In this case since you are looking for duplicates across two dimensions; I would concat the two fields into a keyfield.

For example:

[Test_Data]:

LOAD * Inline [
ID, TXT, NUM
0, TXT1, 100001
1, TXT1, 100000
2, TXT1, 100000
3, TXT1, 100000
4, TXT1, 100000
5, TXT1, 100002
6, TXT2, 100001
7, TXT2, 100002
8, TXT2, 100002
9, TXT2, 100000
10, TXT3, 100001
11, TXT3, 100002
12, TXT3, 100005
13, TXT3, 100013
14, TXT3, 100004
15, TXT3, 100004
16, TXT3, 100010
16, TXT3, 100010]

LEFT Join (TEST_DATA)

LOAD *,

TXT&'-'&NUM as DuplicateIdentifier

Resident TEST_DATA;

LEFT Join (TEST_DATA)
LOAD

DuplicateIdentifier,

count(DuplicateIdentifier) as CountOccurrence

Resident TEST_DATA

Group by DuplicateIdentifier;

Then you can use SetAnalysis to search for CountOccurrence = {">1"} etc.

Kushal_Chawda · ‎2020-08-11

try below expressions

//Distinct Values
=count(aggr(if(Count(NUM)=1,NUM),TXT,NUM))

// Duplicate Values
=count(aggr(if(Count(NUM)>1,NUM),TXT,NUM))

//Duplicate Value Counts
=Concat(aggr(if(Count(NUM)>1,NUM&' : '&count(NUM)),TXT,NUM),',')

Intector · ‎2020-08-11

TM_burgers, tank you for the fast answer.

Your solution is what I was looking for and also very useful if one need the count values in the table for later usage.

Intector · ‎2020-08-11

Kush, thank you for the fast answer,

Your answer is exactly what I was looking for. I didn't quite understood the "AGGR" function until now. Well, I guess there's a lot left for me to learn with this Qlik Sense, but it's very exiting stuff.

Counting distinct unique values based on value in other fields

counting

Distinct

unique