Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi to everyone.
I need ad advise.
For a project I'm making I need to compare two fields in a strange manner.
I make an example to let you understand what my problem is:
ORIGINAL TABLE:
ID Supplier
1 A
1 D
1 B
1 A
2 B
2 C
2 A
2 D
3 B
3 E
TABLE THAT I CREATE:
ID List of Suppliers
1 A, B, D
2 A, B, C, D
3 B, E
Now I need to select the "List of Suppliers" that reapeat more than once. But not only if they appear identical, all the list of suppliers that appear equal for the 70%.
So in this case the flag will be:
ID List of Suppliers FLAG
1 A, B, D X because "A, B, D" is fully repeated in "A, B, C, D"
2 A, B, C, D X because the 75% of "A, B, C, D" is repeated in "A, B, D"
3 B, E - because only the 50% of "B, E" is repeated
Is it possible?
Many thanks to who can help!
Filippo
I think you will need several steps for this. There are to define which supplier-entry will be the master to which will all other values compared - I assume those ID with the most suppliers and this string-list-value and the number of suppliers with them needs you as variable-values against each of the other string-list-values will be in loop compared. Maybe something like this:
t1:
Load ID, count(ID) as MaxCountID, concat(Supplier, '|') as SupplierList From xyz Group By ID;
t2:
Noconcatenate Load * Resident t1 order by MaxCountID;
let vMaxCountID = peek('MaxCountID', -1, 't2');
let vSupplierList = peek('SupplierList', -1, 't2');
t3:
Load
ID, $(vMaxCountID) as MaxCountID, SupplierList,
if(index('$(vSupplierList)', subfield(SupplierList, '|', iterno())>=1, 1, 0) as FlagTemp
Resident t1 while iterno() <= substringcount(SupplierList, '|');
t4:
Load
ID, MaxCountID, SupplierList,
if(MaxCountID * 0.7 < sum(FlagTemp), 0, 1) as Flag
Resident t3 Group Bx ID, MaxCountID, SupplierList;
drop tables t1, t2, t3;
let vMaxCountID = null(); let vSupplierList = null();
It's just written down without any testing and there might be one or another syntax- or logical issue but I think it give you some ideas how it could be solved.
- Marcus
Try this:
SampleData:
LOAD * Inline
[
ID,Supplier
1,A
1,D
1,B
1,A
2,B
2,C
2,A
2,D
3,B
3,E
]
;
test1:
LOAD ID, Count(DISTINCT Supplier) as NoSupplier
Resident SampleData
Group By ID;
join
LOAD Count( DISTINCT Supplier) as TotalSupplier
Resident SampleData
;
DROP Table SampleData;
test2:
LOAD ID,NoSupplier,TotalSupplier,if(NoSupplier/TotalSupplier>=0.5,'Yes','No') as Flag
Resident test1
;
DROP Table test1;