Do not input private or sensitive data. View Qlik Privacy & Cookie Policy.
Skip to main content

Announcements
Qlik Open Lakehouse is Now Generally Available! Discover the key highlights and partner resources here.
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

Percentage of similarity of two fields

Hi to everyone.

I need ad advise.

For a project I'm making I need to compare two fields in a strange manner.

I make an example to let you understand what my problem is:

ORIGINAL TABLE:

ID     Supplier

1          A

1          D

1          B

1          A

2          B

2          C

2          A

2          D

3          B

3          E

TABLE THAT I CREATE:

ID          List of Suppliers

1                   A, B, D

2                   A, B, C, D

3                   B, E

Now I need to select the "List of Suppliers" that reapeat more than once. But not only if they appear identical, all the list of suppliers that appear equal for the 70%.

So in this case the flag will be:

ID          List of Suppliers          FLAG

1                   A, B, D                   X               because "A, B, D" is fully repeated in "A, B, C, D"

2                   A, B, C, D              X               because the 75% of "A, B, C, D" is repeated in "A, B, D"

3                   B, E                        -                because only the 50% of "B, E" is repeated

Is it possible?

Many thanks to who can help!

Filippo

2 Replies
marcus_sommer

I think you will need several steps for this. There are to define which supplier-entry will be the master to which will all other values compared - I assume those ID with the most suppliers and this string-list-value and the number of suppliers with them needs you as variable-values against each of the other string-list-values will be in loop compared. Maybe something like this:

t1:

Load ID, count(ID) as MaxCountID, concat(Supplier, '|') as SupplierList From xyz Group By ID;

t2:

Noconcatenate Load * Resident t1 order by MaxCountID;

let vMaxCountID = peek('MaxCountID', -1, 't2');

let vSupplierList = peek('SupplierList', -1, 't2');

t3:

Load

     ID, $(vMaxCountID) as MaxCountID, SupplierList,

     if(index('$(vSupplierList)', subfield(SupplierList, '|', iterno())>=1, 1, 0) as FlagTemp

Resident t1 while iterno() <= substringcount(SupplierList, '|');

t4:

Load

     ID, MaxCountID, SupplierList,

     if(MaxCountID * 0.7 < sum(FlagTemp), 0, 1) as Flag

Resident t3 Group Bx ID, MaxCountID, SupplierList;

drop tables t1, t2, t3;

let vMaxCountID = null(); let vSupplierList = null();

It's just written down without any testing and there might be one or another syntax- or logical issue but I think it give you some ideas how it could be solved.

- Marcus

Not applicable
Author

Try this:

SampleData:

LOAD * Inline

[

ID,Supplier

1,A

1,D

1,B

1,A

2,B

2,C

2,A

2,D

3,B

3,E

]

;

test1:

LOAD ID, Count(DISTINCT Supplier) as NoSupplier

Resident SampleData

Group By ID;

join

LOAD Count( DISTINCT Supplier) as TotalSupplier

Resident SampleData

;

DROP Table SampleData;

test2:

  LOAD ID,NoSupplier,TotalSupplier,if(NoSupplier/TotalSupplier>=0.5,'Yes','No') as Flag

Resident test1

;

DROP Table test1;