Skip to main content
Announcements
See what Drew Clarke has to say about the Qlik Talend Cloud launch! READ THE BLOG
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

How to remove duplicates

Hi team,

            I hope you are doing great. I’m kind of new working on Qlikview, and I have an issue with one formula. I hope someone can help me out.

What I’m trying to do is an expression that can remove duplicates. I have as the dimension the Supervisor ID, and each Supervisors have their own sellers. Each seller need to have an interaction day before the sale is done. They can just make one sale each day. So on my tables I have something like this, since sometimes may database duplicates the sale days.

SupervisorID

Seller

Interaction day

Sale day

1

Kenia

12/31/2013

01/03/2014

1

Kenia

12/31/2013

01/03/2014

1

Andres

12/31/2013

01/03/2014

1

Andres

12/31/2013

01/03/2014

1

Iñigo

12/2/2013

12/03/2013

1

Iñigo

12/2/2013

12/03/2013

My intention is just to count one Sale day for each interaction day, corresponding to each seller.

Meaning to have something like this:

SupervisorID

Seller

Interaction  day

Sale day

1

Kenia

12/31/2013

01/03/2014

1

Serrano

12/31/2013

01/03/2014

1

Iñigo

12/2/2013

12/03/2013

the total number of Sale day is “3”

I try to use a formula like: =count(DISTINCT if(saleday=interactionday,1,0))

What happned was that using this way DISTINCT it deleated one of the 01/03/2014, so I just stayed with “2”, but Im looking to have a 3 since are diferent sellers.

NOTE: I can’t modified my script nor my database. I have to use an expression.

I hope some one can halp me with this.

20 Replies
Not applicable
Author

Hi Nitha,

     I tried that way and it gave me only the Dinstinct form saleday, meaning that if other Seller sold on the same day the value was eliminated since was repeated even though was form different seller.

Nicole-Smith

Please post an example dashboard so that we can help you.  Right now, we're just going at it blindly, and obviously failing.

Preparing examples for Upload - Reduction and Data Scrambling

Not applicable
Author

True Nicole,

     Please find attached one example. In this case I have the same sellers and the same duplicates dates that I posted at the beginning.

I should have 103 salesday instead of 106 , since 3 of them are repeated. The same for interaction day, its repeated by 3 since the database repeated the saleday and the corresponding data for each Seller.

Not applicable
Author

"If you need to count only one Sales date per Interaction Date"

This is exactly what I need. I tried to use your formula but It didn't work. I just attached an example on my post. can you help me out with this?

Thanks.

Nicole-Smith

Oleg Troyansky's formula works for me.  I did make one minor change to it though.  Since you're displaying it on a chart where the Dimension is month, I also added Month to the aggr() function:

=count( AGGR( min(SaleDay), SupervisorID, Seller, InteractionDay, Month))

I have also attached the working .qvw file.

Not applicable
Author

Dear Oleg Troyansky,

I came across your answer to a similar problem I am facing and in need of your advice/assistance to resolve this.

I need to discount the duplicate vendors (highlighted in yellow) where the DocUploadDate field is blank. I only want to display the unique vendor count using the AGGR function that you taught Braulio Nunez.

The system should count the total number of unique vendors (InviteesStatName) that have a DocUploadDate with Invitees_Status other than "Invited" AND those single record vendors without a DocUploadDate where the Invitees_Status is "Invited" (please see the second screenshot). These vendors have not had their "Invited" status updated to another status and still have a [Blank] DocUploadDate.

I used Count(AGGR(min([DocUploadDate]), InviteesStatName, [Invitees_Status])) but it only displays vendors with a DocUploadDate (in this case is minimum). It does not display vendors without a DocUploadDate that are non-duplicates with status "Invited".

Appreciate your prompt assistance on this.

PLT - Vendors.jpg

PLT - Invited Vendors.png

Oleg_Troyansky
Partner Ambassador/MVP
Partner Ambassador/MVP

Hi Samuel,

your problem may sound similar, but is totally different from the other one that I responded to... In your example, I doubt if you need to use AGGR. Instead, you need to use a simple distinct count of InviteesStatName, with a Set Analysis condition that reflects the logic that you have described - (Date = Blank and Status = Invited) or (Date <> Blank and Status <> Invited). Something along these lines:

count (

    {< InviteesStatName=P({<[DocUploadDate]={'[BLANK]'}, [Invitees_Status]={'Invited'}>} InviteesStatName)>} + P({<[DocUploadDate]-={'[BLANK]'}, [Invitees_Status]-={'Invited'}>} InviteesStatName)

distinct InviteesStatName)

cheers,

Oleg Troyansky

QlikView Your Business: An expert guide to Business Discovery with QlikView and Qlik Sense

Not applicable
Author

Hi Oleg,

Thanks for your kind help. However, I input the script as you recommended but it is filled with errors (see below). Can you please advise what has gone wrong and what needs to be done to correct it? What exactly is the P function for?

Script Error.png

Oleg_Troyansky
Partner Ambassador/MVP
Partner Ambassador/MVP

Hi Samuel,

Sorry for the sloppy syntax. When I type suggested expressions here, I get no syntax checking ... The following might work better.

count (

    {< InviteesStatName=P({<[DocUploadDate]={'[BLANK]'}, [Invitees_Status]={'Invited'}>} InviteesStatName) + P({<[DocUploadDate]-={'[BLANK]'}, [Invitees_Status]-={'Invited'}>} InviteesStatName) >}

distinct InviteesStatName)


You still might have to tweak the syntax a bit...


Function P() returns possible values. I recommend reading the Help article Set Analysis, it describes the basics, or learn Set Analysis from my book QlikView Your Business.


cheers,

Oleg Troyansky

Not applicable
Author

Dear Oleg,

Thanks again for trying to assist me with the syntax. Although the script editor says the Expression is OK, however, I still see red lines being drawn below at the 2nd half of the syntax. Any idea to fix that?

Script Error.png

When I applied the change, the Bar Chart figure (circled in red) shows differently from those in the pivot table. In fact, it is going farther away from the original difference where the syntax is Count(DISTINCT {<Division=>} InviteesStatName). This is where the duplicate vendors (with a blank date and status invited but it has already changed to another status) are included in the calculation.

Invited Vendors Figure Mismatch.png

Look forward to hearing from you soon.

Regards,

Samuel