10 Replies Latest reply: Jul 31, 2017 11:33 AM by Cassandra Baqir

# String Percentage Comparison?

Is there a way to get QV to compare how similar two values are?

In other words, match on 75% of the string to return possible duplicates?

John Doe - ABCD

John Doe - EFGH

• ###### Re: String Percentage Comparison?

As a starter to detect string similarity, see the implementation of the Levenshtein Distance algorithm here: string matching with fuzzy, trigram (n-gram), levenshtein, etc.

Background info can be found here: Levenshtein distance - Wikipedia

• ###### Re: String Percentage Comparison?

can you post some more examples?

Is it always the first part of the string you're interested in?

regards

Marco

• ###### Re: String Percentage Comparison?

It's not always the first part of the string. Here's an example where there is an extra space in the middle:

 MBS -  Modified Barium Swallow Study MBS - Modified Barium Swallow Study

Here's another example not at the end:

 Transphenoidal (Pituitary) Surgery Postoperative Transsphenoidal (Pituitary) Surgery Postoperative

Example < vs less than:

 POC Risk for Nutritional Imbalance:  < Requirements (Newborn) POC Risk for Nutritional Imbalance:  < Requirements (NICU) POC Risk for Nutritional Imbalance:  less than Requirements (Newborn) POC Risk for Nutritional Imbalance:  less than Requirements (NICU)
• ###### Re: String Percentage Comparison?

Hi,

maybe one solution might be:

```tabPlanTemp:
PLAN_DESC
Where Len(Trim(PLAN_DESC));

Join
Resident tabPlanTemp;

tabPlan:
PLAN_DESC2,
Num(1-Levenshtein(PLAN_DESC,PLAN_DESC2)/RangeMax(Len(PLAN_DESC),Len(PLAN_DESC2)),'0.0%') as Similarity
Resident tabPlanTemp
Where not PLAN_DESC follows PLAN_DESC2;

DROP Table tabPlanTemp;
```

Another example:

Fuzzy Matching/Joining

hope this helps

regards

Marco

• ###### Re: String Percentage Comparison?

Super cool. Thanks!

• ###### Re: String Percentage Comparison?

you're welcome

regards

Marco

• ###### Re: String Percentage Comparison?

The load and drop of the temp table takes a very long time. Is there a way I can use a QVD to speed up the process? I've never tried a QVD on top of a temp table.

• ###### Re: String Percentage Comparison?

The load and drop of the temp table takes a very long time. Is there a way I can use a QVD to speed up the process? I've never tried a QVD on top of a temp table.

• ###### Re: String Percentage Comparison?

marcowedel

I tried loading my temp table from a QVD but it still takes more than an hour to refresh. Any suggestions on how to speed it up?

tabPlanTemp:
DESCRIPTION
FROM [..\External_Data\PP_USAGE_ECISA.xlsx] (ooxml, embedded labels, table is [CCG_PLANS])
Where Len(Trim(DESCRIPTION));

Join
Resident tabPlanTemp;

tabPlan:
PLAN_DESC2,
Num(1-Levenshtein(DESCRIPTION,PLAN_DESC2)/RangeMax(Len(DESCRIPTION),Len(PLAN_DESC2)),'0.0%') as Similarity
Resident tabPlanTemp
Where not DESCRIPTION follows PLAN_DESC2;

//tabPlanTemp: