Qlik Community

QlikView Scripting

Discussion Board for collaboration on QlikView Scripting.

Announcements
QlikView Fans! We’d love to hear from you.
Share your QlikView feedback with the product team… Click here to participate in our 5-minute survey.
Rules, plus terms and conditions, can be found here.
Not applicable

Percentage of similarity of two fields

Hi to everyone.

I need ad advise.

For a project I'm making I need to compare two fields in a strange manner.

I make an example to let you understand what my problem is:

ORIGINAL TABLE:

ID     Supplier

1          A

1          D

1          B

1          A

2          B

2          C

2          A

2          D

3          B

3          E

TABLE THAT I CREATE:

ID          List of Suppliers

1                   A, B, D

2                   A, B, C, D

3                   B, E

Now I need to select the "List of Suppliers" that reapeat more than once. But not only if they appear identical, all the list of suppliers that appear equal for the 70%.

So in this case the flag will be:

ID          List of Suppliers          FLAG

1                   A, B, D                   X               because "A, B, D" is fully repeated in "A, B, C, D"

2                   A, B, C, D              X               because the 75% of "A, B, C, D" is repeated in "A, B, D"

3                   B, E                        -                because only the 50% of "B, E" is repeated

Is it possible?

Many thanks to who can help!

Filippo

Tags (1)
2 Replies

Re: Percentage of similarity of two fields

I think you will need several steps for this. There are to define which supplier-entry will be the master to which will all other values compared - I assume those ID with the most suppliers and this string-list-value and the number of suppliers with them needs you as variable-values against each of the other string-list-values will be in loop compared. Maybe something like this:

t1:

Load ID, count(ID) as MaxCountID, concat(Supplier, '|') as SupplierList From xyz Group By ID;

t2:

Noconcatenate Load * Resident t1 order by MaxCountID;

let vMaxCountID = peek('MaxCountID', -1, 't2');

let vSupplierList = peek('SupplierList', -1, 't2');

t3:

Load

     ID, $(vMaxCountID) as MaxCountID, SupplierList,

     if(index('$(vSupplierList)', subfield(SupplierList, '|', iterno())>=1, 1, 0) as FlagTemp

Resident t1 while iterno() <= substringcount(SupplierList, '|');

t4:

Load

     ID, MaxCountID, SupplierList,

     if(MaxCountID * 0.7 < sum(FlagTemp), 0, 1) as Flag

Resident t3 Group Bx ID, MaxCountID, SupplierList;

drop tables t1, t2, t3;

let vMaxCountID = null(); let vSupplierList = null();

It's just written down without any testing and there might be one or another syntax- or logical issue but I think it give you some ideas how it could be solved.

- Marcus

Not applicable

Re: Percentage of similarity of two fields

Try this:

SampleData:

LOAD * Inline

[

ID,Supplier

1,A

1,D

1,B

1,A

2,B

2,C

2,A

2,D

3,B

3,E

]

;

test1:

LOAD ID, Count(DISTINCT Supplier) as NoSupplier

Resident SampleData

Group By ID;

join

LOAD Count( DISTINCT Supplier) as TotalSupplier

Resident SampleData

;

DROP Table SampleData;

test2:

  LOAD ID,NoSupplier,TotalSupplier,if(NoSupplier/TotalSupplier>=0.5,'Yes','No') as Flag

Resident test1

;

DROP Table test1;

Community Browser