2 Replies Latest reply: Dec 2, 2013 6:59 AM by Felipe Carrera

    How to clean a list of words

    Felipe Carrera

      Hi

      I want to clean a list of words that has over 1 million entries.

      The base data was ratings given in the form of sentences. I broke these down into words with subfield(), but I get words with commas, question marks or other signs attached. As the separator I used ' ' (a single space). I need just the words, because I need to count their frequency.

      Example

      Word

      bad

      bad,

      bad//

      -bad

      ....bad

       

      These are counted as five different words, but they are all the same word. How can I remove all these signs around the words?

      thanks

      felipe
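
The cleanup asked about above can be sketched outside the original tool as follows. This is only a minimal illustration in Python, not the poster's environment: the helper names `clean_word` and `word_frequencies` are hypothetical, "signs" is assumed to mean ASCII punctuation, and lowercasing is an added assumption so that "Bad" and "bad" count together. Punctuation is stripped only from the ends of each token, so an inner apostrophe such as the one in "don't" is kept.

```python
import string
from collections import Counter


def clean_word(token: str) -> str:
    """Strip ASCII punctuation from both ends of a token and lowercase it.

    Assumption: "signs" means the characters in string.punctuation.
    """
    return token.strip(string.punctuation).lower()


def word_frequencies(sentences):
    """Split each sentence on whitespace, clean each token, count the words."""
    counts = Counter()
    for sentence in sentences:
        for token in sentence.split():
            word = clean_word(token)
            if word:  # skip tokens that were punctuation only
                counts[word] += 1
    return counts


# The five variants from the example collapse into a single word.
samples = ["bad", "bad,", "bad//", "-bad", "....bad"]
print(word_frequencies(samples))  # Counter({'bad': 5})
```

The same idea should carry over to any tool that can strip a fixed set of characters: clean each token first, then aggregate, so the frequency count sees one spelling per word.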