Qlik Community

QlikView Documents

Documents for QlikView related information.

Find Key Words and Phrases in Free Text Fields

richard_pearce6
Valued Contributor

Find Key Words and Phrases in Free Text Fields

Sometimes data comes in a free text format, customer comments and feedback are a great example. Here we have a wealth of information a company could harvest and dashboard in QlikView although because it’s free text is becomes difficult to summarise.

This code reads each comment field and searches it for a key word or phrase and counts the occurrences.


Key Words / Phrases

Key Words.png


Free Text Comment Field

Free Text Comments.png


Output Table

Output Table.png

Due to the fact its reading each possible word or phrase and reloading the comments each time this process could take a while the more data you have although typically comment type fields are small volumes. I would also use an incremental QVD builder to ensure I’m only transforming new data once.


Happy Qliking
Richard

Find more blog posts at QlikCentral.com

Attachments
Comments
MCampestrini
Valued Contributor

Richard,

Thanks the post. Great as usual.

richard_pearce6
Valued Contributor

Thankyou for your kind words Márcio

shane_spencer
Valued Contributor

Thanks for sharing, this will prove very usefull.

evan_kurowski
Valued Contributor

Hello Richard,  got interested in the topic of your post and wanted to try an alternate version of this, that perhaps could expand to any set of fed data without requiring programmatic changes..  hope you don't mind my attempt, and apologies for jumping in, but I liked what you had done.

http://community.qlik.com/docs/DOC-7081

richard_pearce6
Valued Contributor

Morning Evan,

I'm all for making the code more automated and like what you have done with the idea!

Richard

Not applicable

Hi Richard,

I do something similar using wildmatch, which means I only need to read the data on the first pass

I use it to pattern match text in incident tickets raised to a support desk to identify common issues in various guises!

in you example I could do the following.

 

load ID,Comments

, if(wildmatch(Comments,'*frustrating*', '*frustrated*', '*frustration*' )>0,1,0) as CE_Negative,
if(wildmatch(Comments,'*efficient*', '*prompt*', '*satisfied*' )>0,1,0) as CE_Positive,
if(wildmatch(Comments,'*friendlier*', '*rude*', '*not pleasant*' )>0,1,0) as Staff_Poor

;

Load * Inline [
ID ,Comments
1 ,I felt really frustrated about my service
2 ,My experience was not pleasant. I going to switch
3 ,The install was efficient however the staff were rude and not pleasant.
]

;

Just a different perspective

Richard

richard_pearce6
Valued Contributor

Thanks Richard (hope you're well).

I'll confess I haven't used this yet, although I'm looking forward to an opportunityas I can really see these types of techniques adding value quickly (as other options which don't include QlikView are available). This design was developed to answer a community question and how there key words were stored drove the design.

It's good to see these alternative ways, looking at the code I see its ticking a box rather than performing a count - not that that's a bad thing! :-)

Thanks for sharing

Richard

Not applicable

Hi Richard,

You are right in that is sets a flag (or ticks a box)  I should have explained that.  In our application we then use this to select all the "incident tickets" that matched the expression.  We can also produce charts etc. , based on counts.

Our need is slightly different as a match condition is binary, it either matches or it does not, in your example your comment could have both positive and negative conotations.

In one of my applications, I load incident descriptions and timestamp data into table "data" (Actually its a binary load of a larger app)  then I post process that data as.

join (data) load [Incident Number],wildmatch(Description,
'*oradata*filesystem is now*',
'Error(s) written to /var/adm/messages * please contact UNIX team',
'The Qlikview Server service terminated unexpectedly.*',
'Possible performance problem which may impact service please contact Support*',
'monitor Alarm for *',
'*Backup failed with exit code*',
'bad app DB2 deadlocks*',
'Pmon is unable to connect*',
'*filesystem is now*percent full*',
'*background dump is currently in use on Oracle database*',
'*M1*terminated unexpectedly*',
'*java.lang.OutOfMemoryError. Restart*',
'Problem with process *(Process may be down*',
'* has hung threads. Contact *' ,
'Event Description: *Report server could not write to socket*' )
as Pattern_match resident data;


join LOAD * INLINE [
Pattern_match, Pattern Description
0, N/A
1, Oracle filesystem full
2, Var/ADM messages
3, Qlikview service Terminated
4, PSD Performance Alert
5, moniotor Alerts
6, Backup Failures
7, Bad app Deadlocks
8, Patrol Unable to connect
9, Oracle filesystem full
10, Background Dump Issue
11, M1 failure
12, Java OOM
13, Websphere server down or hung
14, Websphere server down or hung
15, Bad app report Server
]
;

You can see that there are 2 pattern descriptions for "Oracle Filesystem full" as there are 2 types of alert generated.

I agree with you its a very interesting side to analysis, I would be interested in anyones thoughts on other ways to do this type of analysis.

Thanks for posting an interesting article.  It does prompt a lot  of thought.

Richard


Version history
Revision #:
1 of 1
Last update:
‎08-26-2014 04:38 AM
Updated by: