Choose which duplicate to remove.

acbishop2 · ‎2018-08-09

Hello All,

I'm connecting two tables: Leads and Apps. When a lead becomes an app, we record the AppId on the Leads table to connect it with the AppId on the App table and do so with a simple JOIN statement. Pretty simple so far, right? My issue stems from the fact that we decided to allow multiple leads to be connected to the same app. This is necessary for our business model, but has caused a problem for me when reporting. Because there may be two or more LeadIds for each AppId, some data is getting duplicated, as in this dummy data.

LeadId	CampaignId	AppId	AppName	AppDate
11	123123	100	Donald Duck	1/1/2018
22		100	Donald Duck	1/1/2018
33	123123	101	Mickey Mouse	1/2/2018
44		102	Minnie Mouse	1/3/2018
55	555555	102	Minnie Mouse	1/3/2018

The problem with the data is that if I were to count(AppId), I would get 5 instead of the desired 3 (yes, I know I could do count(distinct(AppId)), but that wouldn't solve the actual problem). I was able to remove duplicates, but my script simply gets rid of the second one (or any after the first). Here's how it looks currently:

T1:

Load

LeadId,

CampaignId,

If(Previous(AppId)=AppId,0,AppId) as AppId,

AppName,

AppDate

From ...

Order by AppId;

Final:

Load

LeadId,

CampaignId,

AppId,

AppName,

AppDate

Resident T1;

Drop Table T1;

Instead of getting rid of all duplicates after the first, I need to be able to keep the one with a value in the CampaignId column. In the table above, I would keep the rows with LeadIds 11, 33, and 55.

I hope I articulated my need well enough. If anything didn't make sense, let me know. And as always, thanks in advance for the help!

framacdev · ‎2018-08-10

Hi Aaron - thanks for sharing your business problem.

The way to resolve your problem is to use the Exists function, very helpful when you want to filter field values that have already been loaded. Here is the link reference: https://help.qlik.com/en-US/qlikview/November2017/Subsystems/Client/Content/Scripting/InterRecordFun...

What you need to to is to load after the "Drop Table T1" statement the following script content:

OnlyCampaignId:

LOAD CampaignId

Resident Final

;

NoConcatenate

LOAD *

Resident Final

Where Exists(CampaignId)

;

DROP TABLES Final, OnlyCampaignId;

I hope the mentioned script resolves your business problem.

Best regards,

Francesco

jonathandienst · ‎2018-08-10

When data has differing granularity, it is better to let Qlik handle the association, rather than joining the tables.The tables will associate on AppID.

Logic will get you from a to b. Imagination will take you everywhere. - A Einstein

Script and Expressions