Skip to main content
Announcements
NEW: Seamless Public Data Sharing with Qlik's New Anonymous Access Capability: TELL ME MORE!
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

Need to find misinputted data, aggr(count) function

Hi there guys, I'm new to this discussion, so I'm sorry if I'm doing this wrong. But I'm just so stuck and decided to post this question here.

I have a very simple matter, currently I use excel to analyze this particular data. But now that we're changing to qlikview, I'm trying to use its power to help this analyze set much faster and better.

So lets say I have 3 fields

Contact            Name           Address

0217695         linda               weststreet

0216695         Laily                uppernorth

0217695         Amica              eastnorth

0218695         Kiara                 southkeep

0218695         Jane                 Westwind

0218695        Jane                   Westwind

0218695        Jane                    northbloom

0219995         mike                   eastflower

0219995         mike                   eastflower

The goal is to sort out the same contact number, but has different name attached to it , so the output are :

0217695 linda weststreet

0217695 Amica eastnorth

0218695 Kiara Southkeep

0218695 Jane Westwind

0218695 Jane Northbloom

The data that has only one entry or correct entry (exactly the same data)

0217695 kiara

0219995 mike

Should be sorted out

For my current progress I use an aggregation function of count in a straight table (expression tab) :

if(aggr(distinct count(contact), contact, name) > 1,0,1

But this lets the single data such as kiara 0217685, to enter as well

So I filter in the load script, count(contact) as number

And input the expression on the straight table became like this :

If(number > 1, aggr(distinct count(contact),contact  name) > 1,0,1), 0)

It removes kiara, but the data with 3 set but 1 difference (refering Jane), doesn't show up

Anyone knows how to make to make this easier or how to make this works? Thanks !

1 Solution

Accepted Solutions
swuehl
MVP
MVP

You could do it like this in the front end with your original table (three fields):

Create a straight table chart with three dimensions, no expression.

=Aggr(if(Count(DISTINCT Name&Address)>1, Contact),Contact) Name Address
0217695Amicaeastnorth
0217695lindaweststreet
0218695Janenorthbloom
0218695Kiarasouthkeep
0218695JaneWestwind

Enable 'Suppress When Value Is NULL' on dimension tab for first dimension and disable 'Suppress zero values' on presentation tab.

Create a second chart, only difference is replacing >1 with =1 in first dimension

=Aggr(if(Count(DISTINCT Name&Address)=1, Contact),Contact) Name Address
0216695Lailyuppernorth
0219995mikeeastflower

View solution in original post

8 Replies
swuehl
MVP
MVP

You could do it like this in the front end with your original table (three fields):

Create a straight table chart with three dimensions, no expression.

=Aggr(if(Count(DISTINCT Name&Address)>1, Contact),Contact) Name Address
0217695Amicaeastnorth
0217695lindaweststreet
0218695Janenorthbloom
0218695Kiarasouthkeep
0218695JaneWestwind

Enable 'Suppress When Value Is NULL' on dimension tab for first dimension and disable 'Suppress zero values' on presentation tab.

Create a second chart, only difference is replacing >1 with =1 in first dimension

=Aggr(if(Count(DISTINCT Name&Address)=1, Contact),Contact) Name Address
0216695Lailyuppernorth
0219995mikeeastflower
Not applicable
Author

That works very great ! I didnt even thought to approach from the dimension you're a life saver

Can I ask one more thing in this?

If there is a data of the above

Lets say there's 3 mike data

0219995 mike

0219995 mike

0219995 mike T

If using the function above, it would return with

0219995 mike T

0219995 mike

Is there anyway to make it able to detect if the name is similar up to a certain character?

Using left somehow seems wrong in the dimension...

The cause is becuse there are a lot of names being cut off in my data (ex: hansley hammerworth junior became hansley hammerworth ju)

swuehl
MVP
MVP

This might get a bit more complicates ( and maybe it would be better to start with a script based solution).

If you just want to take the name with the minimum length as pattern to match the alternatives, you could try

=Aggr(if(Count(DISTINCT Aggr(Left(Name,Len(Minstring(TOTAL<Contact> Name)))&Address,Contact,Name,Address))>1, Contact),Contact)

as calculated dimension for Contact in the first table.

Not applicable
Author

Thank you for your reply! Sorry it took this long to gey back.

Yes it is quite complicated and too bad that this file is only an example..

But still...if its no bother, Can you please help me explain the logic of minstring and the left? Also the three fields that gets to mentioned in the back (address, contact, name, address) I can't quite grasp the logic behind this.

Its just that the real data is consisting of more than 20 fields (compare to three) and if all 20 must be listed inside this function...i'll have to find a way to modify this in a script T.T

Thanks before !

swuehl
MVP
MVP

What do you mean with '20 fields (compare to three)'?  Could you post some few data lines that come closer to your real setting and explain how you want to filter these?

Not applicable
Author

What I mean by 20 fields is. The example above just stated 3 of them. There are name contact and address as well, but that's not all. There are other fields I need to link too, such as what products, how much, what promotion they're given, dates and etc.

The reason why Im stating 20 is because I saw the line expression : address, contact, name, address  which I assume that its basically stating the whole field which the data involved, also that is the reason why Im asking the logic behind this. (Since I want to modify what if there are 17 other fields other than name contact and address). Would the expression change if the fields number are different?

Thanks before!

swuehl
MVP
MVP

This part

Aggr(Left(Name,Len(Minstring(TOTAL<Contact> Name)))&Address,Contact,Name,Address)


is basically building a virtual table with dimensions Contact, Name, Address and a single expresssion


Left(Name,Len(Minstring(TOTAL<Contact> Name)))&Address

I've basically used the same three fields as dimensions I've used to build the expression.

So you don't need to add all fields as dimensions, but the relevant fields to build the string you want to count distinct.

It's hard to tell what the relevant fields are without knowing the full details.

Not applicable
Author

Sorry late to reply after new year holiday.

But thank you very much for the explanation. Going to tweak it a bit first with 3 fields. Then with another 20...if it aint working I'm going to post here again ( because after new year there is so many works to do)..

P.s The answer to my first question really helps my day ! Thanks again swuehl