Hi All,
Could you please guide me on how to find repeated words in a string? I also need to find how many times each word is repeated.
Example
String
Manager.
Managers
MANAGER
APPRECIATE
APPRECIATED
APPRECIATES
APPRECIATION
I need to group similar words together and find how many times each group occurs.
My output should be:
String, COUNT
MANAGER, 3
APPRECIATE, 4
Thanks in advance
Why is APPRECIATION a repetition of APPRECIATE?
I can see what you are looking for, but what determines the base words, and which other words need to be taken into account?
Shan,
Are you trying to do this in the load script or within a visualization? (probably easier in the load)
Jonathan
Hi,
I am trying to do this in the load script.
Hi Stefan,
Thanks for the reply. I am trying this for word cloud functionality, to highlight the most frequently repeated words in a string.
The base word can be the simplest form of the word in each group.
Example
String
Manager.
Managers
MANAGER
APPRECIATE
APPRECIATED
APPRECIATES
APPRECIATION
For the Manager group it can be Manager:
Manager.
Managers
MANAGER
For the APPRECIATE group it can be APPRECIATE:
APPRECIATE
APPRECIATED
APPRECIATES
APPRECIATION
This should be dynamic, since the real string contains many more words.
You can do it like this:
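// Mapping table: each stem is replaced by its base word, wrapped in @1@ ... @2@ markers.
MAP:
MAPPING LOAD Stem, '@1@'&Word&'@2@' INLINE [
Stem, Word
MANAGE, MANAGER
APPRECIAT, APPRECIATE
];
// MapSubString swaps the stem inside each upper-cased string for the marked base word;
// TextBetween then extracts the base word from between the markers.
INPUT:
LOAD *, Textbetween(MapSubString('MAP',Upper(String)),'@1@','@2@') as Word INLINE [
String
Manager.
Managers
MANAGER
APPRECIATE
APPRECIATED
APPRECIATES
APPRECIATION
];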
But still, you need to define the Stem / Word mapping.
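To get the String, COUNT output asked for in the original post, the mapped Word field can then be aggregated. This is a minimal sketch that assumes the MAP and INPUT tables from the script above are already loaded; the table name RESULT is only illustrative.

```qlik
// Assumes INPUT (with the fields String and Word) was loaded by the script above.
// RESULT is a hypothetical table name chosen for this sketch.
RESULT:
LOAD
    Word,
    Count(String) AS [COUNT]   // number of input strings mapped to each base word
RESIDENT INPUT
GROUP BY Word;
```

With the sample data this should give MANAGER with 3 and APPRECIATE with 4. Strings that contain none of the defined stems are not assigned to any base word, so the mapping table has to cover every stem you care about.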
Shan,
If you want to make the stems dynamic (i.e., you don't want to define them beforehand), you could do some looping to determine, for each word, whether it is a substring of another word (use the Index function) and whether there are no cases where another word is a substring of it. Also, apply the Lower function first so you don't have to worry about case. So, "manage" would be a substring of "manager" and "managed", but never a 'superstring' of any other word. You'd therefore classify "manage" as one of your stems. Repeat for each word. This would be a two-level loop (quadratic in the number of words), so it might be slow. Sorry, I don't have time to work out the syntax right now.
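One possible way to write this in the load script is sketched below. It is a rough, untested sketch rather than a finished solution, and it makes several assumptions that are not from the thread: the two-level loop is replaced by a JOIN without common fields (which produces every pair of words), trailing punctuation is removed with PurgeChar, each word is assigned to the shortest other word it contains, and a word that contains no other word counts as its own stem. The names CleanWord, PAIRS, STEMMAP, WORDS and RESULT are illustrative. It is meant to run as a standalone script, not together with the mapping script above.

```qlik
// Sample strings from the thread. Case is folded and punctuation removed
// so that "Manager." and "MANAGER" compare as equal.
INPUT:
LOAD
    String,
    Lower(PurgeChar(String, '.,;:!?')) AS CleanWord
INLINE [
String
Manager.
Managers
MANAGER
APPRECIATE
APPRECIATED
APPRECIATES
APPRECIATION
];

// All (Candidate, Other) pairs of distinct words. A JOIN between tables
// with no common field names yields the Cartesian product, i.e. the
// quadratic comparison described above.
PAIRS:
LOAD DISTINCT
    CleanWord      AS Candidate,
    Len(CleanWord) AS CandidateLen
RESIDENT INPUT;

JOIN (PAIRS)
LOAD DISTINCT
    CleanWord AS Other
RESIDENT INPUT;

// Map each word to a shorter word that it contains. Rows are sorted by
// candidate length, and a mapping table uses the first row found for a key,
// so every word ends up mapped to the shortest word it contains.
STEMMAP:
MAPPING LOAD
    Other,
    Candidate
RESIDENT PAIRS
WHERE Candidate <> Other AND Index(Other, Candidate) > 0
ORDER BY CandidateLen, Candidate;

DROP TABLE PAIRS;

// Words that contain no other word fall through to the ApplyMap default
// and therefore become their own stem.
WORDS:
LOAD
    String,
    Upper(ApplyMap('STEMMAP', CleanWord, CleanWord)) AS Stem
RESIDENT INPUT;

DROP TABLE INPUT;

RESULT:
LOAD
    Stem,
    Count(String) AS [COUNT]
RESIDENT WORDS
GROUP BY Stem;
```

On the sample data this groups Manager., Managers and MANAGER under MANAGER, and APPRECIATE, APPRECIATED and APPRECIATES under APPRECIATE. APPRECIATION stays in a group of its own, because "appreciate" is not a substring of "appreciation".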
Now, if you wanted to have "appreciat" as the stem for both "appreciated" and "appreciating", this would be even more complex, because "appreciat" is not a word in the corpus. You could do another loop in which you look at all possible substrings of each given word to see if they occur in other words, but this would be very cumbersome and inefficient. But since it's in the load script, if you don't mind a long run time (when there are a lot of words), you could give it a try.
Jonathan