repetitive word from a string - Qlik Community

Anonymous · 2017-09-28

Hi All,

Could you please guide me on how to find repeated words in a string? I also need to count how many times each word is repeated.

Example:

String
Manager.
Managers
MANAGER
APPRECIATE
APPRECIATED
APPRECIATES
APPRECIATION

I need to group similar words together and count how many times each group occurs.

My output should be:

String, COUNT
MANAGER, 3
APPRECIATE, 4

Thanks in advance

swuehl · 2017-09-28

Why is APPRECIATION a repetition of APPRECIATE?

I can see what you are looking for, but what determines the base words, and which other words need to be taken into account?

jonvitale · 2017-09-28

Shan,

Are you trying to do this in the load script or within a visualization? (It's probably easier in the load script.)

Jonathan

Anonymous · 2017-09-28

Hi,

I am trying it in the load script.

Anonymous · 2017-09-28

Hi Stefan,

Thanks for the reply. I am building a word cloud and want to highlight the most frequently repeated words in a string.

The base word can be the lowest-level (root) word of each group.

Example:

String
Manager.
Managers
MANAGER
APPRECIATE
APPRECIATED
APPRECIATES
APPRECIATION

For the Manager group, the base word can be Manager:

Manager.
Managers
MANAGER

For the APPRECIATE group, the base word can be APPRECIATE:

APPRECIATE
APPRECIATED
APPRECIATES
APPRECIATION

This should be dynamic, because the real string contains many more words.

swuehl · 2017-09-28

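You can do it like this:

// Mapping table: each (uppercase) stem maps to its base word wrapped in @1@ ... @2@ markers
MAP:
MAPPING LOAD Stem, '@1@' & Word & '@2@' INLINE [
Stem, Word
MANAGE, MANAGER
APPRECIAT, APPRECIATE
];

// MapSubString replaces the stem found in the uppercased string with the marked base word;
// TextBetween then returns the base word between the markers
INPUT:
LOAD *, TextBetween(MapSubString('MAP', Upper(String)), '@1@', '@2@') as Word INLINE [
String
Manager.
Managers
MANAGER
APPRECIATE
APPRECIATED
APPRECIATES
APPRECIATION
];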

But still, you need to define the Stem / Word mapping.
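To make the mapping logic easier to follow outside Qlik, here is a rough Python sketch of the same idea (not Qlik script, and not swuehl's code): uppercase each string, look for a known stem, and count the base word. The STEM_TO_WORD table and the base_word/count_words names are hypothetical. One difference: when several stems could match one string, this sketch takes the first one in dictionary order, which may not match how MapSubString resolves overlaps. Inside Qlik itself, the count would usually come from a chart with Word as the dimension and Count(String) as the measure, or from a Resident load with GROUP BY Word.

```python
from collections import Counter

# Hypothetical stem -> base word table, mirroring the MAP inline table above.
STEM_TO_WORD = {"MANAGE": "MANAGER", "APPRECIAT": "APPRECIATE"}

def base_word(s):
    """Return the base word for s, or None if no stem occurs in it."""
    upper = s.upper()  # plays the role of Upper(String)
    for stem, word in STEM_TO_WORD.items():
        if stem in upper:  # plays the role of MapSubString finding the stem
            return word
    return None

def count_words(strings):
    """Count strings per base word, skipping strings with no matching stem."""
    return Counter(w for w in map(base_word, strings) if w is not None)

strings = ["Manager.", "Managers", "MANAGER", "APPRECIATE",
           "APPRECIATED", "APPRECIATES", "APPRECIATION"]
print(count_words(strings))
```

On the strings from the question this gives MANAGER 3 and APPRECIATE 4, matching the expected output, but only because both stems were typed in by hand.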

jonvitale · 2017-09-28

Shan,

If you want to make the stems dynamic (i.e., you don't want to define them beforehand), you could loop over the words and check, for each word, whether it is a substring of other words (use the Index function) while no other word is a substring of it. Apply the Lower function first so you don't have to worry about case. For example, "manage" would be a substring of "manager" and "managed", but never a 'superstring' of any other word, so you'd classify "manage" as one of your stems. Repeat for each word. This is a two-level (quadratic) loop, so it might be slow. Sorry, I don't have time to work out the syntax right now.
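A rough Python sketch of the quadratic loop described above (an illustration, not Qlik script: `in` stands in for Qlik's Index, and `.lower()` for Lower). Two assumptions go beyond the post: a word that contains no other word is treated as a stem even if it is isolated, and a string that contains several stems is assigned to the shortest one. The function names are hypothetical.

```python
from collections import Counter

def find_stems(strings):
    """Return the words that contain no other (distinct) word."""
    words = sorted({s.lower() for s in strings})  # Lower() first, then de-duplicate
    stems = []
    for w in words:  # outer loop: candidate stem
        # inner loop: is any other word a substring of w?
        if not any(other != w and other in w for other in words):
            stems.append(w)
    return stems

def count_by_stem(strings):
    """Assign each string to a stem it contains and count per stem."""
    stems = find_stems(strings)
    counts = Counter()
    for s in strings:
        low = s.lower()
        # Never empty: every word contains itself or some shorter stem.
        matches = [st for st in stems if st in low]
        # Assumption: if several stems match, take the shortest (then alphabetical).
        counts[min(matches, key=lambda st: (len(st), st))] += 1
    return counts

strings = ["Manager.", "Managers", "MANAGER", "APPRECIATE",
           "APPRECIATED", "APPRECIATES", "APPRECIATION"]
print(find_stems(strings))
print(count_by_stem(strings))
```

On the question's example this yields the stems appreciate, appreciation and manager, with counts manager 3, appreciate 3 and appreciation 1. APPRECIATION ends up on its own, which is exactly the "appreciat" limitation Jonathan raises.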

Now, if you wanted "appreciat" as the stem for both "appreciated" and "appreciating", this would be even more complex, because "appreciat" is not a word in the corpus. You could add another loop that looks at all possible substrings of each word to see whether they occur in other words, but this would be very cumbersome and inefficient. Still, since it's in the load script, you could give it a try if you don't mind a long run time (when there are a lot of words).
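One way to sketch the "all possible substrings" loop in Python (again an illustration, not Qlik script). For each word, it tries every substring of at least min_len characters and keeps those that also occur in another word. It then chooses the one shared by the most words, preferring longer substrings on ties. The scoring rule and the min_len threshold are assumptions, not something from the thread, and the names are hypothetical.

```python
from collections import Counter

def shared_stem(word, words, min_len=4):
    """Pick a stem for word from all its substrings of length >= min_len.

    A candidate must also occur in at least one other word. Prefer the
    candidate found in the most words, then the longest (earliest on a
    full tie); if nothing qualifies, the word is its own stem.
    """
    best, best_key = word, (1, len(word))
    for i in range(len(word)):
        for j in range(i + min_len, len(word) + 1):
            sub = word[i:j]
            hits = sum(1 for w in words if sub in w)  # counts word itself too
            if hits >= 2 and (hits, len(sub)) > best_key:
                best, best_key = sub, (hits, len(sub))
    return best

def count_by_shared_stem(strings, min_len=4):
    """Group strings by shared_stem (case-insensitive) and count each group."""
    words = sorted({s.lower() for s in strings})
    stem_of = {w: shared_stem(w, words, min_len) for w in words}
    return Counter(stem_of[s.lower()] for s in strings)

strings = ["Manager.", "Managers", "MANAGER", "APPRECIATE",
           "APPRECIATED", "APPRECIATES", "APPRECIATION"]
print(count_by_shared_stem(strings))
```

On the question's example this groups everything into appreciat (4) and manager (3). The cost grows roughly with words² × length³, as Jonathan warns. On a real corpus, a short, common fragment (for example a suffix such as "tion") could also win the "most words" test, so min_len and the scoring rule would need tuning.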

Jonathan