Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hello All,
I'm connecting two databases: Leads and Apps. When a lead becomes an app, we record the AppId on the Leads table to connect it with the AppId on the App table and do so with a simple JOIN statement. Pretty simple so far, right? My issue stems from the fact that we decided to allow multiple leads to be connected to the same app. This is necessary for our business model, but has caused a problem for me when reporting. Because there may be two or more LeadIds for each AppId, some data is getting duplicated, as in this dummy data.
LeadId | AppId | AppName | AppDate |
---|---|---|---|
11 | 100 | Donald Duck | 1/1/2018 |
22 | 100 | Donald Duck | 1/1/2018 |
33 | 101 | Mickey Mouse | 1/2/2018 |
44 | 102 | Minnie Mouse | 1/3/2018 |
55 | 102 | Minnie Mouse | 1/3/2018 |
I don't want to see the same app more than once. What approach do I need to take?
I was able to do this using script similar to the following:
T1:
Load
LeadId,
If(Previous(AppId)=AppId,0,AppId) AS AppId,
AppName,
AppDate
From ...
Order by AppId;
Final:
Load
LeadId,
AppId,
AppName,
AppDate
Resident T1;
Drop Table T1;
This gets rid of the duplicates, but I now need a way to be able to choose which duplicate to remove and which to keep. I'll close this thread and open a new one here: Choose which duplicate to remove.
I was able to do this using script similar to the following:
T1:
Load
LeadId,
If(Previous(AppId)=AppId,0,AppId) AS AppId,
AppName,
AppDate
From ...
Order by AppId;
Final:
Load
LeadId,
AppId,
AppName,
AppDate
Resident T1;
Drop Table T1;
This gets rid of the duplicates, but I now need a way to be able to choose which duplicate to remove and which to keep. I'll close this thread and open a new one here: Choose which duplicate to remove.