Percentage of similarity of two fields - Qlik Community

Report Inappropriate Content · ‎2015-10-05

Hi to everyone.

I need ad advise.

For a project I'm making I need to compare two fields in a strange manner.

I make an example to let you understand what my problem is:

ORIGINAL TABLE:

ID Supplier

1 A

1 D

1 B

1 A

2 B

2 C

2 A

2 D

3 B

3 E

TABLE THAT I CREATE:

ID List of Suppliers

1 A, B, D

2 A, B, C, D

3 B, E

Now I need to select the "List of Suppliers" that reapeat more than once. But not only if they appear identical, all the list of suppliers that appear equal for the 70%.

So in this case the flag will be:

ID List of Suppliers FLAG

1 A, B, D X because "A, B, D" is fully repeated in "A, B, C, D"

2 A, B, C, D X because the 75% of "A, B, C, D" is repeated in "A, B, D"

3 B, E - because only the 50% of "B, E" is repeated

Is it possible?

Many thanks to who can help!

Filippo

marcus_sommer · ‎2015-10-05

I think you will need several steps for this. There are to define which supplier-entry will be the master to which will all other values compared - I assume those ID with the most suppliers and this string-list-value and the number of suppliers with them needs you as variable-values against each of the other string-list-values will be in loop compared. Maybe something like this:

t1:

Load ID, count(ID) as MaxCountID, concat(Supplier, '|') as SupplierList From xyz Group By ID;

t2:

Noconcatenate Load * Resident t1 order by MaxCountID;

let vMaxCountID = peek('MaxCountID', -1, 't2');

let vSupplierList = peek('SupplierList', -1, 't2');

t3:

Load

ID, $(vMaxCountID) as MaxCountID, SupplierList,

if(index('$(vSupplierList)', subfield(SupplierList, '|', iterno())>=1, 1, 0) as FlagTemp

Resident t1 while iterno() <= substringcount(SupplierList, '|');

t4:

Load

ID, MaxCountID, SupplierList,

if(MaxCountID * 0.7 < sum(FlagTemp), 0, 1) as Flag

Resident t3 Group Bx ID, MaxCountID, SupplierList;

drop tables t1, t2, t3;

let vMaxCountID = null(); let vSupplierList = null();

It's just written down without any testing and there might be one or another syntax- or logical issue but I think it give you some ideas how it could be solved.

- Marcus

Report Inappropriate Content · ‎2015-10-05

Try this:

SampleData:

LOAD * Inline

[

ID,Supplier

1,A

1,D

1,B

1,A

2,B

2,C

2,A

2,D

3,B

3,E

]

;

test1:

LOAD ID, Count(DISTINCT Supplier) as NoSupplier

Resident SampleData

Group By ID;

join

LOAD Count( DISTINCT Supplier) as TotalSupplier

Resident SampleData

;

DROP Table SampleData;

test2:

LOAD ID,NoSupplier,TotalSupplier,if(NoSupplier/TotalSupplier>=0.5,'Yes','No') as Flag

Resident test1

;

DROP Table test1;