Hi,
I am working with a dataset which has a bug and unfortunately I need to carry on with it.
The first table has AppName and AppID. However, because of a bug in the source system, there are multiple app names with same AppIDs.
ApplicationID | AppName |
1200 | App01 |
1200 | App02 |
1200 | App03 |
1342 | App04 |
1349 | App05 |
1899 | App06 |
1899 | App99 |
1650 | App07 |
1633 | App08 |
1633 | App65 |
1340 | App09 |
1340 | App10 |
AppIDs 1200 (3 apps), 1899 (2 apps), and 1340 (2 apps) are problematic and only one app name is valid.
The valid app name is the one which is generating traffic (traffic data is another table)
AppName | Traffic |
App02 | 2300 |
App05 | 5000 |
App99 | 2100 |
App09 | 7800 |
App08 | 4300 |
So for AppID 1200, only App02 is generating traffic so this is the valid app while the other two (App01 and App03) are invalid.
I have used this logic to flag the valid App Name when they show up in the 2nd table with some traffic
T1:
LOAD
distinct
ApplicationID,
AppName as AppName2
FROM
[\\xx\Documents\sample.xlsx]
(ooxml, embedded labels, table is Sheet1);
Left join (T1)
LOAD distinct AppName,1 as flag,
AppName as AppName2
FROM
[\\xx\Documents\sample.xlsx]
(ooxml, embedded labels, table is Sheet2);
However, there also some valid apps which are not generating any traffic as of now and hence don't appear in traafic data table. So they are not getting flagged as being a valid app.
Is there any way to resolve this? When there are multiple AppNames for an AppID, I want to just take the AppName which is generating traffic. When there is single AppName and AppID, then I want to take that AppName too (irrespective of whether it appears in traffic data or not)
Any help would be appreciated!
@yashcena see the attached
hi,
you can use below script, it will give your result,
hi01:
load
ApplicationID,
AppName
FROM [lib:///New Microsoft Excel Worksheet.xlsx]
(ooxml, embedded labels, table is Sheet1);
left Join
LOAD
AppName,
'1' as Recount
FROM/New Microsoft Excel Worksheet.xlsx]
(ooxml, embedded labels, table is Sheet2);
Join(hi01)
LOAD
ApplicationID,
Count(ApplicationID)as tt
Resident hi01
Group By ApplicationID
;
store hi01 into /New Microsoft Excel Worksheet01.csv](txt);
drop table hi01;
KKK:
LOAD
ApplicationID,
AppName
// Recount,
// tt,
FROM [/New Microsoft Excel Worksheet01.csv]
(txt, utf8, embedded labels, delimiter is ',', msq)
where if(tt=1,tt,Recount)=1;
RESULT: