felcar2013
Partner - Creator III

How to clean a list of words

Hi

I want to clean a list of words that is over 1 million entries long.

The base data consists of ratings given as sentences. I split them into words with SubField(), but some words still carry commas, question marks or other punctuation. As the separator I used ' ' (a single space). I need only the bare words, because I want to count how often each one occurs.

Example

Word

bad

bad,

bad//

-bad

....bad

These are counted as five different words, but they are all the same word. How can I strip all these characters from around the words?
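To make the problem concrete, here is a minimal Python sketch of the split-and-count step outside Qlik. It assumes that splitting on a single space with `str.split(" ")` behaves like `SubField(Sentence, ' ')` for these inputs; the function name `raw_word_counts` and the sample ratings are hypothetical.

```python
from collections import Counter

def raw_word_counts(sentences):
    """Split each sentence on a single space and count the raw tokens."""
    counts = Counter()
    for sentence in sentences:
        # Assumed equivalent of SubField(Sentence, ' '): one token per space-separated piece.
        for token in sentence.split(" "):
            counts[token] += 1
    return counts

# Hypothetical sample ratings modelled on the example above.
ratings = ["bad", "very bad,", "bad// service", "-bad", "....bad"]
counts = raw_word_counts(ratings)

# Each punctuation variant of "bad" ends up as its own key.
print(sorted(t for t in counts if "bad" in t))
```

Because the punctuation stays attached to each token, "bad", "bad,", "bad//", "-bad" and "....bad" are counted as separate words.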

thanks

felipe

1 Solution

Accepted Solutions
tresesco
MVP

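Use KeepChar() or PurgeChar(). KeepChar() keeps only the characters you list, while PurgeChar() removes the characters you list. For example: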

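Load
          // Lower() first, otherwise uppercase letters are stripped ("Bad" would become "ad")
          KeepChar(Lower(Word), 'abcdefghijklmnopqrstuvwxyz') as FreshWord
Resident Words;   // "Words" is a placeholder for the table that holds the split words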
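The cleaning logic can be checked outside Qlik with a small Python sketch. `keep_char` and `purge_char` are hypothetical helpers that only emulate what KeepChar() and PurgeChar() do; they are not Qlik's implementation. Lowercasing before filtering is an assumption added here so that "Bad" and "bad" are counted together, and the punctuation list passed to `purge_char` is illustrative.

```python
from collections import Counter

LETTERS = "abcdefghijklmnopqrstuvwxyz"

def keep_char(text: str, keep: str) -> str:
    """Emulates KeepChar(): keep only the characters that appear in `keep`."""
    return "".join(ch for ch in text if ch in keep)

def purge_char(text: str, purge: str) -> str:
    """Emulates PurgeChar(): drop every character that appears in `purge`."""
    return "".join(ch for ch in text if ch not in purge)

def clean_word_counts(words):
    """Lowercase, keep only a-z, and count the cleaned words."""
    counts = Counter()
    for word in words:
        fresh = keep_char(word.lower(), LETTERS)
        if fresh:  # skip tokens that were pure punctuation, e.g. "..."
            counts[fresh] += 1
    return counts

raw = ["bad", "bad,", "bad//", "-bad", "....bad", "Bad", "..."]
print(clean_word_counts(raw))  # Counter({'bad': 6})
```

The trade-off between the two functions: a KeepChar() list of a-z also drops accented letters (`keep_char("café", LETTERS)` gives "caf") and apostrophes, whereas PurgeChar() with an explicit punctuation list leaves them intact (`purge_char("café,", ",.")` gives "café") but only removes the characters you thought to list.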


felcar2013
Partner - Creator III
Author

thanks!