    count "almost" distinct values

    Evgeny Stuchalkin

      Hello! I have a client base with phone numbers and names. I want to filter out bots that sometimes place orders. One of the main signs of a bot: one phone number used with different names.


      So I can count the distinct names for every phone number.
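
      To make the idea concrete, here is a minimal sketch in Python of that per-phone count. The sample rows, the name `orders`, and the helper `distinct_names_per_phone` are made up for illustration; they are not taken from my real data or from any particular tool.

```python
from collections import defaultdict

# Hypothetical sample rows: (phone, name). Not real customer data.
orders = [
    ("555-0101", "Alex"),
    ("555-0101", "Alex"),
    ("555-0101", "Maria"),
    ("555-0101", "John"),
    ("555-0202", "Kate"),
]

def distinct_names_per_phone(rows):
    """Return {phone: number of distinct names seen with that phone}."""
    names_by_phone = defaultdict(set)
    for phone, name in rows:
        # A set keeps each exact spelling only once.
        names_by_phone[phone].add(name)
    return {phone: len(names) for phone, names in names_by_phone.items()}

counts = distinct_names_per_phone(orders)
```

      A phone number with a high count would then be a bot candidate. The comparison is exact, so any difference in spelling counts as a new name.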



      But sometimes real customers enter their names differently, like this:


      A perfect example. These are three versions of one name: a short name, an extended name with a typo, and the correct extended name.

      Also, some people put several spaces between words, or even use strange symbols like "+" instead of a space.
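
      For the spacing and "+" problem only, a rough sketch in Python of the kind of cleanup I have in mind is below. The function name `normalize_name` and the sample strings are hypothetical, and ignoring letter case is my own assumption. It does not handle the short-versus-extended or misspelled variants.

```python
import re

def normalize_name(raw):
    """Collapse separators so that spacing variants compare as equal."""
    # Replace each run of non-alphanumeric characters ("+", spaces, "_", ...)
    # with a single space.
    cleaned = re.sub(r"[\W_]+", " ", raw)
    # Assumption: letter case should not matter when comparing names.
    return cleaned.strip().casefold()

# Hypothetical variants of one name, differing only in separators.
variants = ["Anna Maria", "Anna   Maria", "Anna+Maria"]
unique = {normalize_name(v) for v in variants}
```

      Counting distinct values of the normalized name instead of the raw one would make these three variants count as one.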


      So my question is: how can I count these names as one?