Not applicable

How to identify recurrent terms in text fields?

Hello,

Is there a way to identify recurrent terms (or frequent word sequences) in text fields?

For example, I have 3 phrases, each in a distinct field:

1 - Qlikview is the best BI Tool in the market

2 - A new BI Tool is about to be launched next year.

3 - Buying one BI Tool is the best solution for your work problems. Maybe next year, ok?

From the above example, it is clear that "BI Tool" appears in all three text fields, and "next year" in two of them. Given this, how can I create a new field containing these specific word sequences, with a maximum of 3 words or prepositions? Here, the new field would have two values: "BI Tool", with 3 records, and "next year", with 2 records.

The idea is to create a "term cloud", which can show better results in context than a simple word cloud.

Thanks!

1 Solution

Accepted Solutions
MarcoWedel

Hi,

maybe one solution could be something like:

QlikCommunity_Thread_253797_Pic1.JPG

QlikCommunity_Thread_253797_Pic2.JPG

QlikCommunity_Thread_253797_Pic3.JPG

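// Sample phrases; RecNo() gives each phrase a numeric ID.
tabPhrases:
LOAD RecNo() as ID, *
INLINE [
    Phrase
    Qlikview is the best BI Tool in the market
    A new BI Tool is about to be launched next year.
    "Buying one BI Tool is the best solution for your work problems. Maybe next year, ok?"
];

// The three stacked LOADs below are preceding loads: they run from the bottom up.
// Step 3: drop duplicate rows and count the words in each tuple.
tabWordTuples:
LOAD Distinct
     *,
     SubStringCount(WordTuple,' ')+1 as WordCount;
// Step 2: strip punctuation and discard empty tuples.
LOAD ID,
     WordStart,
     Trim(PurgeChar(WordTuple,'.,?')) as WordTuple
Where Len(Trim(WordTuple));
// Step 1: iterate three times per word of each phrase. WordStart is the position of
// the first word; Mod(IterNo()-1,3)+1 is the tuple length (1 to 3 words). Mid() cuts
// the text between the space before the first word and the space after the last one.
LOAD ID,
     Div(IterNo()-1,3)+1 as WordStart,
     Mid(Phrase,Index(' '&Phrase,' ',Div(IterNo()-1,3)+1),Index(' '&Phrase&'  ',' ',Div(IterNo()-1,3)+Mod(IterNo()-1,3)+2)-Index(' '&Phrase,' ',Div(IterNo()-1,3)+1)-1) as WordTuple
Resident tabPhrases
While IterNo()<=(SubStringCount(Phrase,' ')+1)*3;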

hope this helps

regards

Marco
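
To get from these tuples to the counts asked for in the question ("BI Tool" in 3 phrases, "next year" in 2), one option is to aggregate in the load script. The following is a minimal sketch, assuming the tabWordTuples table from the script above has already been loaded. The table name tabTermFrequency, the field PhraseCount, and the threshold of 2 are illustrative choices, not part of the original solution.

```qlik
// Assumes tabWordTuples (ID, WordStart, WordTuple, WordCount) from the script above.
// tabTermFrequency and PhraseCount are hypothetical names chosen for this sketch.
tabTermFrequency:
LOAD *
Where PhraseCount >= 2;                  // keep only terms found in two or more phrases
LOAD WordTuple,
     Count(DISTINCT ID) as PhraseCount   // number of distinct phrases containing the tuple
Resident tabWordTuples
Where WordCount >= 2                     // skip single words; a plain word cloud covers those
Group By WordTuple;
```

The new table links to tabWordTuples on WordTuple, so PhraseCount can be used to size the terms in a cloud. With the sample phrases it should include "BI Tool" and "next year", but also common pairs such as "is the". If a script-side table is not wanted, a chart with WordTuple as dimension and Count(DISTINCT ID) as expression gives the same numbers.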

9 Replies
sunny_talwar

Maybe a word cloud?

Word Cloud Object Extension

Not applicable
Author

No, I've already created a word cloud here.

As I said, I'm looking for a "TermCloud" instead of a WordCloud (which is based on single words only).

Thanks

rwunderlich
Partner Ambassador/MVP

Are you looking to find a set of predefined terms, or to auto-discover the terms? If you auto-discover, you're also going to get hits in your example above like "is the", "Tool is", "the best". You can deal with some of that by purging the common words.

-Rob
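
The purging of common words mentioned above could be sketched as follows. It assumes the tabWordTuples table from the accepted solution has been loaded. The StopWords list, its contents, and the IsCandidateTerm flag are hypothetical, and the rule used here (drop a tuple if its first or last word is a stop word) is only one of several reasonable choices.

```qlik
// Hypothetical stop-word list (lowercase); extend it to suit your data.
StopWords:
LOAD * INLINE [
    StopWord
    a
    is
    the
    in
    to
    for
    be
    of
];

// Flag multi-word tuples that neither start nor end with a stop word.
// The tuple's words are lowercased so they match the lowercase list.
tabCandidateTerms:
LOAD Distinct
     WordTuple,
     1 as IsCandidateTerm
Resident tabWordTuples
Where WordCount >= 2
  and not Exists(StopWord, Lower(SubField(WordTuple, ' ', 1)))
  and not Exists(StopWord, Lower(SubField(WordTuple, ' ', -1)));

Drop Table StopWords;
```

A chart can then be restricted to the flagged terms with set analysis, for example Count({<IsCandidateTerm={1}>} DISTINCT ID) over the WordTuple dimension. Tuples such as "is the" or "Tool is" are left unflagged, while "BI Tool" and "next year" keep the flag.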

MarcoWedel

please close your thread if your question is answered:

Qlik Community Tip: Marking Replies as Correct or Helpful

thanks

regards

Marco

Not applicable
Author

Hi Marco,

Are you able to help me implement the same method in my QlikView app?

Thanks!

https://community.qlik.com/message/1244265#1244265

MarcoWedel

I tried to.

See there.

regards

Marco

Keitaru
Creator

Hi  ,

I've applied your idea to what I'm doing in Qlik Sense Enterprise and it worked. However, my bag of words / word tuples also captured date and time stamps. How do I make the script exclude date, timestamp, and month information, as well as stop words?
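
One possible way to approach this, sketched under assumptions rather than taken from the thread: flag only those tuples that contain no digits and no word from a noise list. The sketch assumes the tabWordTuples table from the accepted solution. The names mapNoise, NoiseWord, tabCleanTuples, and IsCleanTerm are hypothetical, and the month list is only a starting point to which stop words can be added.

```qlik
// Hypothetical noise list: month names, lowercase. Add stop words as needed.
mapNoise:
Mapping LOAD Lower(NoiseWord) as NoiseWord, 1 as IsNoise
INLINE [
    NoiseWord
    january
    february
    march
    april
    may
    june
    july
    august
    september
    october
    november
    december
];

// Flag tuples with no digits (dates, times, years) and no noise word
// in any of their up to three word positions.
tabCleanTuples:
LOAD Distinct
     WordTuple,
     1 as IsCleanTerm
Resident tabWordTuples
Where Len(KeepChar(WordTuple, '0123456789')) = 0
  and RangeSum(
        ApplyMap('mapNoise', Lower(SubField(WordTuple, ' ', 1)), 0),
        ApplyMap('mapNoise', Lower(SubField(WordTuple, ' ', 2)), 0),
        ApplyMap('mapNoise', Lower(SubField(WordTuple, ' ', 3)), 0)
      ) = 0;
```

The digit test is deliberately blunt: it also drops legitimate terms that contain numbers, and "may" in the list removes the verb as well as the month. Mapping tables are discarded automatically when the script finishes, so mapNoise needs no explicit drop.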