Hello,
Is there a way to identify recurring terms (or frequent word sequences) in text fields?
For example, I have 3 phrases, each one in distinct fields:
1 - Qlikview is the best BI Tool in the market
2 - A new BI Tool is about to be launched next year.
3 - Buying one BI Tool is the best solution for your work problems. Maybe next year, ok?
From the above example, it is clear that "BI Tool" appears in all three text fields, and "next year" in two of them. Given this, how can I create a new field containing these specific word sequences, with a maximum of 3 words or prepositions each? Here we would have two values in the new field: "BI Tool", with 3 records, and "next year", with 2 records.
The idea is to create a "term cloud", which can show better results in context than a simple word cloud.
Tks!
Hi,
maybe one solution could be something like:
// Sample data: one phrase per record, ID is the record number
tabPhrases:
LOAD RecNo() as ID, *
INLINE [
Phrase
Qlikview is the best BI Tool in the market
A new BI Tool is about to be launched next year.
"Buying one BI Tool is the best solution for your work problems. Maybe next year, ok?"
];

// Preceding loads run bottom-up: the last LOAD below is evaluated first
tabWordTuples:
LOAD Distinct
    *,
    SubStringCount(WordTuple,' ')+1 as WordCount;   // number of words in the tuple
// Strip punctuation and drop empty tuples
LOAD ID,
    WordStart,
    Trim(PurgeChar(WordTuple,'.,?')) as WordTuple
Where Len(Trim(WordTuple));
// For every start word, emit the sequences of 1, 2 and 3 words beginning there
LOAD ID,
    Div(IterNo()-1,3)+1 as WordStart,
    Mid(Phrase,Index(' '&Phrase,' ',Div(IterNo()-1,3)+1),Index(' '&Phrase&' ',' ',Div(IterNo()-1,3)+Mod(IterNo()-1,3)+2)-Index(' '&Phrase,' ',Div(IterNo()-1,3)+1)-1) as WordTuple
Resident tabPhrases
While IterNo()<=(SubStringCount(Phrase,' ')+1)*3;
hope this helps
regards
Marco
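To get from the word tuples to the counts asked for in the question ("BI Tool" in 3 records, "next year" in 2), one option might be to aggregate the tuples in the script. This is only a sketch building on the reply above, not part of it: the table name tabTermFreq and the fields Term and PhraseCount are made up for illustration, and it assumes the tabWordTuples table has already been loaded.

```qlik
// Sketch: count in how many phrases each multi-word tuple occurs.
// tabTermFreq, Term and PhraseCount are illustrative names (assumptions).
tabTermFreq:
LOAD WordTuple as Term,
     Count(DISTINCT ID) as PhraseCount
Resident tabWordTuples
Where WordCount >= 2          // ignore single words
Group By WordTuple;
```

The same count could probably be done in the front end instead, with WordTuple as the dimension and Count(DISTINCT ID) as the expression. Doing it in the script just makes the frequency available as a field.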
Maybe a word cloud (for Qlik Sense)?
No, I've already created a word cloud here.
As I said, I'm looking for a "TermCloud" instead of a WordCloud (which is based only on single words).
Tks
Are you looking to find a set of predefined terms or auto discovery of the terms? If you auto-discover, you're also going to get hits in your example above like "is the", "Tool is", "the Best". You can deal with some of that by purging the common words.
-Rob
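Purging the common words could look roughly like the following, assuming the tabWordTuples table from Marco's script. The stop-word list, the StopWord field and the tabTerms table are placeholders chosen for this sketch, and a real list would be much longer. The idea is to drop any tuple that starts or ends with a stop word, so "is the" and "Tool is" disappear while "BI Tool" and "next year" stay.

```qlik
// Illustrative stop-word list (assumption: extend it to suit your data)
StopWords:
LOAD * INLINE [
StopWord
a
is
the
in
to
be
for
of
one
your
];

// Keep multi-word tuples whose first and last word are not stop words.
// tabTerms and Term are illustrative names (assumptions).
tabTerms:
LOAD ID,
     WordTuple as Term
Resident tabWordTuples
Where WordCount >= 2
  and not Exists(StopWord, Lower(SubField(WordTuple, ' ', 1)))
  and not Exists(StopWord, Lower(SubField(WordTuple, ' ', -1)));

// The list is only needed during the load
DROP Table StopWords;
```

Checking only the first and last word is a design choice: it keeps terms with a stop word in the middle of a three-word tuple while still removing fragments that merely begin or end with one.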
please close your thread if your question is answered:
Qlik Community Tip: Marking Replies as Correct or Helpful
thanks
regards
Marco
Hi Marco,
Are you able to help me implement the same method in my QlikView?
Thanks!
I tried to.
See there.
regards
Marco
Hi MarcoWedel,
I've applied your idea to what I'm doing in Qlik Sense Enterprise and it worked. However, my bag of words / word tuples also captured date/time stamps. How can I make the script exclude date, timestamp and month information, as well as stop words?
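One way this could be approached, sketched under assumptions because the actual field contents are not shown: treat month names like stop words, and discard any tuple that still contains a digit, which should remove most date and timestamp fragments such as 2019-03-12 or 10:45. The names MonthWords, MonthWord, tabCleanTuples and CleanTuple are invented for this sketch, and tabWordTuples is the table from Marco's script.

```qlik
// Illustrative month list (assumption: add abbreviations or other languages as needed)
MonthWords:
LOAD * INLINE [
MonthWord
january
february
march
april
may
june
july
august
september
october
november
december
];

// Keep tuples without digits whose first and last word are not month names.
// tabCleanTuples and CleanTuple are illustrative names (assumptions).
tabCleanTuples:
LOAD ID,
     WordTuple as CleanTuple
Resident tabWordTuples
Where Len(KeepChar(WordTuple, '0123456789')) = 0
  and not Exists(MonthWord, Lower(SubField(WordTuple, ' ', 1)))
  and not Exists(MonthWord, Lower(SubField(WordTuple, ' ', -1)));

// The list is only needed during the load
DROP Table MonthWords;
```

There are two caveats. A month name in the middle of a three-word tuple is not caught, and "may" is also an ordinary word, so it may need to come off the list. Stop words can be handled the same way with a list of their own.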