Skip to main content
Woohoo! Qlik Community has won “Best in Class Community” in the 2024 Khoros Kudos awards!
Announcements
Nov. 20th, Qlik Insider - Lakehouses: Driving the Future of Data & AI - PICK A SESSION
cancel
Showing results for 
Search instead for 
Did you mean: 
Venkatesh_5990
Contributor III
Contributor III

How to implement Qliksense - Text Analysis

Hi Friends,

I got a requirement that is Categories the types of Incidents from ServiceNow data.

Our source data from ServiceNow, every day we will load data from ServiceNow to Qlik sense. Actually the data flow is ServiceNow >> Gsheet >> Qliksense.

Actually the requirement is in the below:

Please find the below sample incident data which was raised by users with different short descriptions.

Incident_number Short Description Created By Open Date Due Date State
INC001 Unable to access XYZ stream, please provide access ABC 3/27/2024 4/1/2024 Open
INC002 Got error message like 'Access Denied' while accessing the ABC dashboard URL link CDE 3/26/2024 4/1/2024 Open
INC003 I am unable to access the folder FGH 3/25/2024 4/1/2024 Open
INC004 I am unable to sign in Workday portal IJK 3/24/2024 4/1/2024 Open
INC005 Can you grand me access to the database LMN 3/23/2024 4/1/2024 Open
INC006 Batch job failed to run OPQ 3/22/2024 4/1/2024 Open
INC007 XYZ Dashboard qmc task aborted RST 3/21/2024 4/1/2024 Open
INC008 Failed - Informatica xyz_inflow task incomplete UVW 3/20/2024 4/1/2024 Open

 

Here row number from 1 to 5 should fall under "Access Related Issue" category and row number from 6 to 8 should fall under "Batch Job Related Issue".

Category Count of Incidents
Access Related Issue 5
Batch Job Related Issue 3

 

How can we achieve this. Can you please help me this.

Appreciate your help and thanks in advance.

 

Labels (3)
7 Replies
F_B
Specialist
Specialist

This is a very hard task without using a flag or a default list of descriptions.

Venkatesh_5990
Contributor III
Contributor III
Author

Thanks @F_B for your comment.

Yes it is hard task. I tried with wildmatch function with hardcode a specific words but clients are not accepting. They are approaching ML and Python way to implement this task.

 

I am not an expert in ML and even Qlik also so want your all help for this post.

 

Thank you.

BrunPierre
Partner - Master
Partner - Master

Indeed. Perhaps you can use SubStringCount() or WildMatch() to identify specifics in the text.

Venkatesh_5990
Contributor III
Contributor III
Author

Actually clients requested to don't use hardcode anything in the dashboard. Why because please see the below example you can understand the reason of it 

Ex. If the user creating an incident regarding about server issues. Accidentally user mentioned the short description like "serrver" instead of "server". 

But in our wildmatch function and Subfield function we wrote like "server" only then this incident won't be fall under "Server Related Issue" Category right.

I hope you can understand.

marcus_sommer

The difficulties are mainly not related to the used tool else to define the appropriate rules and logic which are suitable to the scenario. Implementing it in any tool will have their challenges and efforts.

If I had this task I would probably do it in Qlik with the following measurements:

  • splitting the text in single words with something like: subfield(Text, ' ', iterno()) within a while-loop
  • classifying the content into verbs, adjectives, expletive and so on
  • filtering against the word-types
  • filtering against excluding-list
  • cleaning spaces/quotes/commas and so on
  • correcting typos per prepared mapping-lists
  • setting everything to upper/lower
  • mapping against target-words (detecting them as well as returning/linking the cluster-values and also prioritizing them)
  • creating a scoring-value against the matches
  • concat() of all matches

whereby the real order may differ to the provided listing and also be combined in some way and not always applied sequentially.

Like hinted above the main-work is not to apply the various measurement in one or two dozens statements else defining all the stuff. Getting a proper categorizing for 30% to 50% will be surely quite easy, the following 10% - to 20% will need more efforts and the costs to get 90% to 98% will definitely increase exponentially and you won't never reach 100%.

Much simpler as the above would be to include some mandatory fields within the incident data with pre-defined values in a dropdown. 

Venkatesh_5990
Contributor III
Contributor III
Author

Hi @marcus_sommer ,

Thanks for your update. But here you are using hardcode the text right.

Can we approach different way without hardcoding the text. Is it possible ?

marcus_sommer

Of course, it was meant without hard-coding the most things else using external maintained lists and rules - maybe within Excel - and loading them as mappings against the text/words.

The company-definitions are specific and would need to be done from your business-departments but the other things like the mentioned classifying/cleaning/preparing could happens against (commercial) dictionaries and rules respectively derived programming logic.

Just a further hint - instead of diving very deep in the cleaning-measurements by looking and fixing double-chars within the words or correcting very common mistakes by chars like p and b / t and d and so on you may also implement the LevenshteinDist - script and chart function | Qlik Cloud Help to check the similarity between the words.