Qlik Community

QlikView Documents

Documents for QlikView related information.

Announcements
QlikWorld, June 24-25, 2020. Free virtual event for DI and DA gurus. Register Today

Find Key Words and Phrases in Free Text Fields

Luminary
Luminary

Find Key Words and Phrases in Free Text Fields

Sometimes data comes in a free text format, customer comments and feedback are a great example. Here we have a wealth of information a company could harvest and dashboard in QlikView although because it’s free text is becomes difficult to summarise.

This code reads each comment field and searches it for a key word or phrase and counts the occurrences.


Key Words / Phrases

Key Words.png


Free Text Comment Field

Free Text Comments.png


Output Table

Output Table.png

Due to the fact its reading each possible word or phrase and reloading the comments each time this process could take a while the more data you have although typically comment type fields are small volumes. I would also use an incremental QVD builder to ensure I’m only transforming new data once.


Happy Qliking
Richard

Find more blog posts at QlikCentral.com

Attachments
Comments
Specialist
Specialist

Richard,

Thanks the post. Great as usual.

Luminary
Luminary

Thankyou for your kind words Márcio

0 Likes
Specialist
Specialist

Thanks for sharing, this will prove very usefull.

Specialist
Specialist

Hello Richard,  got interested in the topic of your post and wanted to try an alternate version of this, that perhaps could expand to any set of fed data without requiring programmatic changes..  hope you don't mind my attempt, and apologies for jumping in, but I liked what you had done.

http://community.qlik.com/docs/DOC-7081

Luminary
Luminary

Morning Evan,

I'm all for making the code more automated and like what you have done with the idea!

Richard

0 Likes
Not applicable

Hi Richard,

I do something similar using wildmatch, which means I only need to read the data on the first pass

I use it to pattern match text in incident tickets raised to a support desk to identify common issues in various guises!

in you example I could do the following.

 

load ID,Comments

, if(wildmatch(Comments,'*frustrating*', '*frustrated*', '*frustration*' )>0,1,0) as CE_Negative,
if(wildmatch(Comments,'*efficient*', '*prompt*', '*satisfied*' )>0,1,0) as CE_Positive,
if(wildmatch(Comments,'*friendlier*', '*rude*', '*not pleasant*' )>0,1,0) as Staff_Poor

;

Load * Inline [
ID ,Comments
1 ,I felt really frustrated about my service
2 ,My experience was not pleasant. I going to switch
3 ,The install was efficient however the staff were rude and not pleasant.
]

;

Just a different perspective

Richard

Luminary
Luminary

Thanks Richard (hope you're well).

I'll confess I haven't used this yet, although I'm looking forward to an opportunityas I can really see these types of techniques adding value quickly (as other options which don't include QlikView are available). This design was developed to answer a community question and how there key words were stored drove the design.

It's good to see these alternative ways, looking at the code I see its ticking a box rather than performing a count - not that that's a bad thing! 🙂

Thanks for sharing

Richard

0 Likes
Not applicable

Hi Richard,

You are right in that is sets a flag (or ticks a box)  I should have explained that.  In our application we then use this to select all the "incident tickets" that matched the expression.  We can also produce charts etc. , based on counts.

Our need is slightly different as a match condition is binary, it either matches or it does not, in your example your comment could have both positive and negative conotations.

In one of my applications, I load incident descriptions and timestamp data into table "data" (Actually its a binary load of a larger app)  then I post process that data as.

join (data) load [Incident Number],wildmatch(Description,
'*oradata*filesystem is now*',
'Error(s) written to /var/adm/messages * please contact UNIX team',
'The Qlikview Server service terminated unexpectedly.*',
'Possible performance problem which may impact service please contact Support*',
'monitor Alarm for *',
'*Backup failed with exit code*',
'bad app DB2 deadlocks*',
'Pmon is unable to connect*',
'*filesystem is now*percent full*',
'*background dump is currently in use on Oracle database*',
'*M1*terminated unexpectedly*',
'*java.lang.OutOfMemoryError. Restart*',
'Problem with process *(Process may be down*',
'* has hung threads. Contact *' ,
'Event Description: *Report server could not write to socket*' )
as Pattern_match resident data;


join LOAD * INLINE [
Pattern_match, Pattern Description
0, N/A
1, Oracle filesystem full
2, Var/ADM messages
3, Qlikview service Terminated
4, PSD Performance Alert
5, moniotor Alerts
6, Backup Failures
7, Bad app Deadlocks
8, Patrol Unable to connect
9, Oracle filesystem full
10, Background Dump Issue
11, M1 failure
12, Java OOM
13, Websphere server down or hung
14, Websphere server down or hung
15, Bad app report Server
]
;

You can see that there are 2 pattern descriptions for "Oracle Filesystem full" as there are 2 types of alert generated.

I agree with you its a very interesting side to analysis, I would be interested in anyones thoughts on other ways to do this type of analysis.

Thanks for posting an interesting article.  It does prompt a lot  of thought.

Richard


Contributor III
Contributor III

Hi I was looking if there was something similar like this for QlikSense.

As I'm looking at how with the use of a few free text field able to detect or find most frequently used Keywords/Phrase (or at least using TF-IDF model) 

0 Likes
Version history
Revision #:
1 of 1
Last update:
‎2014-08-26 04:38 AM
Updated by: