Skip to main content
Announcements
Qlik Connect 2024! Seize endless possibilities! LEARN MORE
cancel
Showing results for 
Search instead for 
Did you mean: 
Intector
Contributor
Contributor

Counting distinct unique values based on value in other fields

Hello,

I'm new to Qlik sense and trying to figure out something quite simple. My real database has many thousand rows with data which I need to analyze. My problem is giving me some sleepless nights and now I'm trying to find an answer here.

I have some data and would like to get some statistical numbers from them, nothing fancy just stuff like this:

 

  • How many DISTINCT values - without duplicates - are in a particular column depending on the value of a different column.
  • How many DISTINCT duplicates are in a particular column depending on the value of a different column.
  • How many duplicates does every DISTINCT duplicate have depending on the value of a different column.

 

I tried already different approaches with different loading scripts, but unfortunately nothing works the way I like to have it.  

Here is an example to explain it a little better:

 

The test data:

 

[Test_Data]

LOAD * Inline [
ID, TXT, NUM
0, TXT1, 100001
1, TXT1, 100000
2, TXT1, 100000
3, TXT1, 100000
4, TXT1, 100000
5, TXT1, 100002
6, TXT2, 100001
7, TXT2, 100002
8, TXT2, 100002
9, TXT2, 100000
10, TXT3, 100001
11, TXT3, 100002
12, TXT3, 100005
13, TXT3, 100013
14, TXT3, 100004
15, TXT3, 100004
16, TXT3, 100010
16, TXT3, 100010]

 

The answers I’m looking for are:

 

values appearing only once:

  • TXT1 - count: 2
    • 100001
    • 100002

 

  • TXT2 – count: 2
    • 100001
    • 100000

 

  • TXT3 – count: 4
    • 100001
    • 100002
    • 100005
    • 100013

 

values appearing more than once:

  • TXT1 count: 1
    • 100000

 

  • TXT2 count: 1
    • 100002

 

  • TXT3 count: 2
    • 100004
    • 100010

 

duplicate appearing:

  • TXT1
    • 100000 – count: 4

 

  • TXT2
    • 100002 – count: 2

 

  • TXT3
    • 100004 – count: 2
    • 100010 – count: 2

 

I tried this:

 

For values appearing only once:

  • Count(DISTINCT{<NUM = {"=Count(NUM) = 1"}>}NUM)
    • Does not work because it gets the values independent from TXT
  • Sum(Aggr(Count(DISTINCT{<NUM = {"=Count(NUM) = 1"}>}NUM), TXT)
    • Doesn’t work, gives me strange results

 

For values appearing more than once:

  • Count(DISTINCT{<NUM = {"=Count(NUM) > 1"}>}NUM)
    • Does not work because it gets the values independent from TXT
  • Sum(Aggr(Count(DISTINCT{<NUM = {"=Count(NUM) > 1"}>}NUM), TXT)
    • Doesn’t work, gives me strange results

 

For duplicate:

  • Count({<NUM = {"=Count(NUM) > 1"}>}NUM)
    • Does not work because it gets the values independent from TXT
  • Sum(Aggr(Count({<NUM = {"=Count(NUM) > 1"}>}NUM), TXT)
    • Doesn’t work, gives me strange results

 

Labels (3)
2 Solutions

Accepted Solutions
tm_burgers
Creator III
Creator III

I would recommend adding the count of values to the data table itself, In this case since you are looking for duplicates across two dimensions; I would concat the two fields into a keyfield.

 

For example:

[Test_Data]:

LOAD * Inline [
ID, TXT, NUM
0, TXT1, 100001
1, TXT1, 100000
2, TXT1, 100000
3, TXT1, 100000
4, TXT1, 100000
5, TXT1, 100002
6, TXT2, 100001
7, TXT2, 100002
8, TXT2, 100002
9, TXT2, 100000
10, TXT3, 100001
11, TXT3, 100002
12, TXT3, 100005
13, TXT3, 100013
14, TXT3, 100004
15, TXT3, 100004
16, TXT3, 100010
16, TXT3, 100010]

 

LEFT Join (TEST_DATA)

LOAD *,

TXT&'-'&NUM as DuplicateIdentifier

Resident TEST_DATA;

 

LEFT Join (TEST_DATA)
LOAD

  DuplicateIdentifier,

  count(DuplicateIdentifier) as CountOccurrence

Resident TEST_DATA

Group by DuplicateIdentifier;

 

 

Then you can use SetAnalysis to search for CountOccurrence = {">1"} etc. 

View solution in original post

Kushal_Chawda

try below expressions

 

//Distinct Values
=count(aggr(if(Count(NUM)=1,NUM),TXT,NUM))

// Duplicate Values
=count(aggr(if(Count(NUM)>1,NUM),TXT,NUM))

//Duplicate Value Counts
=Concat(aggr(if(Count(NUM)>1,NUM&' : '&count(NUM)),TXT,NUM),',')

 

Annotation 2020-08-11 153349.png

View solution in original post

4 Replies
tm_burgers
Creator III
Creator III

I would recommend adding the count of values to the data table itself, In this case since you are looking for duplicates across two dimensions; I would concat the two fields into a keyfield.

 

For example:

[Test_Data]:

LOAD * Inline [
ID, TXT, NUM
0, TXT1, 100001
1, TXT1, 100000
2, TXT1, 100000
3, TXT1, 100000
4, TXT1, 100000
5, TXT1, 100002
6, TXT2, 100001
7, TXT2, 100002
8, TXT2, 100002
9, TXT2, 100000
10, TXT3, 100001
11, TXT3, 100002
12, TXT3, 100005
13, TXT3, 100013
14, TXT3, 100004
15, TXT3, 100004
16, TXT3, 100010
16, TXT3, 100010]

 

LEFT Join (TEST_DATA)

LOAD *,

TXT&'-'&NUM as DuplicateIdentifier

Resident TEST_DATA;

 

LEFT Join (TEST_DATA)
LOAD

  DuplicateIdentifier,

  count(DuplicateIdentifier) as CountOccurrence

Resident TEST_DATA

Group by DuplicateIdentifier;

 

 

Then you can use SetAnalysis to search for CountOccurrence = {">1"} etc. 

Kushal_Chawda

try below expressions

 

//Distinct Values
=count(aggr(if(Count(NUM)=1,NUM),TXT,NUM))

// Duplicate Values
=count(aggr(if(Count(NUM)>1,NUM),TXT,NUM))

//Duplicate Value Counts
=Concat(aggr(if(Count(NUM)>1,NUM&' : '&count(NUM)),TXT,NUM),',')

 

Annotation 2020-08-11 153349.png

Intector
Contributor
Contributor
Author

TM_burgers, tank you for the fast answer.

Your solution is what I was looking for and also very useful if one need the count values in the table for later usage.

 

 

Intector
Contributor
Contributor
Author

Kush, thank you for the fast answer,

Your answer is exactly what I was looking for. I didn't quite understood the "AGGR" function until now. Well, I guess there's a lot left for me to learn with this Qlik Sense, but it's very exiting stuff.