felcar2013
Partner - Creator III

How to clean a list of words

Hi

I want to clean a list of more than 1 million words.

The base data is ratings given as sentences. I broke these down into words with SubField(), using ' ' (a single space) as the separator, but many words still carry commas, question marks, or other characters. I need just the words, because I need to count their frequency.

Example:

Word
bad
bad,
bad//
-bad
....bad

These are counted as five different words, but they are really one. How can I strip these characters from around the words?
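A minimal sketch of the splitting and counting described above, assuming the ratings are already loaded into a table called Ratings with a text field Rating (both names are hypothetical). Because nothing is cleaned yet, "bad", "bad," and "bad//" each become a separate Word value with its own count.

```qlik
// Assumes a previously loaded table Ratings with a text field Rating (hypothetical names)
Words:
LOAD
    SubField(Rating, ' ') AS Word   // SubField with two arguments returns one row per space-separated token
RESIDENT Ratings;

// Frequency per raw token: punctuation is still attached, so "bad" and "bad," are counted separately
WordCounts:
LOAD
    Word,
    Count(Word) AS Frequency
RESIDENT Words
GROUP BY Word;
```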

Thanks

felipe


Accepted Solutions
tresesco
MVP

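Use KeepChar() or PurgeChar(), like:

Load
          KeepChar(Lower(Word), 'abcdefghijklmnopqrstuvwxyz') as FreshWord   // Lower() first, so capital letters are not stripped
Resident Words;   // Words = the table holding the split words (name assumed)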
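A fuller sketch of this approach, assuming the same kind of source: a table called Ratings with a text field Rating (hypothetical names). It lower-cases each token before KeepChar(), because KeepChar() keeps only the characters listed and would otherwise drop capital letters. It then discards tokens that were pure punctuation and counts the frequency of each cleaned word. The character list is an assumption; if the ratings contain accented letters such as á or ñ, add them to the list or they will be removed.

```qlik
// Assumes a previously loaded table Ratings with a text field Rating (hypothetical names)
CleanWords:
LOAD
    FreshWord
WHERE Len(FreshWord) > 0;          // preceding load: skip tokens that contained no letters at all
LOAD
    KeepChar(Lower(SubField(Rating, ' ')), 'abcdefghijklmnopqrstuvwxyz') AS FreshWord
RESIDENT Ratings;

// One row per cleaned word with its number of occurrences
WordFrequency:
LOAD
    FreshWord,
    Count(FreshWord) AS Frequency
RESIDENT CleanWords
GROUP BY FreshWord;
```

With PurgeChar() you list the characters to remove instead, e.g. PurgeChar(Word, ',.?!-/'). That keeps everything not listed, including accented letters, but you have to anticipate every punctuation mark that appears in the data.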


felcar2013
Partner - Creator III
Author

Thanks!