In recent years, the volume of textual data generated online has grown tremendously. Analyzing this text can be extremely valuable for businesses in today’s digital world, because it helps them make critical decisions. For instance, a company selling a product needs to understand the consumer’s viewpoint, and analyzing the ‘reviews’ is a great starting point. However, this raises an intrinsic problem: how do we analyze such a huge, constantly growing volume of text and interpret its meaning?
Ideally, the solution is to break sentences and phrases down into specific components and then analyze those components using Natural Language Processing (NLP). NLP covers tasks such as sentiment analysis, entity recognition, syntax parsing, and part-of-speech tagging, which help break unstructured data down to a granular level and allow for deeper analysis. A handful of toolkits and services, such as NLTK, spaCy, Amazon Comprehend, and the Google Cloud NL API, let you process text and derive insights from it.
Scenario: Imagine your organization leverages Qlik Sense’s data analytics platform for visually understanding data & key metrics and Amazon Comprehend for text analysis. Your requirement is to be able to analyze some textual data, understand sentiments & most importantly be able to visualize the processed texts and extract any hidden insights. How would you achieve this using the two?
This is where Qlik Sense’s augmented analytics capabilities come into play. As a Qlik Sense application developer, you can integrate the Qlik data model with Amazon Comprehend using the native connector for Qlik Sense. This allows you to send a dataset from a Qlik reload script or a chart expression to Comprehend and get the inferences back in Qlik.
Before we focus on ‘sentiment analysis’ specifically, let us get some background on using any 3rd-party ML connector in Qlik Sense (including Amazon Comprehend for this specific use case). This will help you understand what runs under the hood and set the base for using any analytic connector.
1. Analytic connections:
The first step to start communicating with any 3rd-party Machine Learning endpoint from Qlik Sense is to establish an analytic connection. This can be created in the Data load editor and is native to the Qlik Sense client.
2. Server Side Extension(SSE) functions:
After a connection has been created, the next step is to integrate Qlik’s data model with the 3rd-party ML platform using the connection. This would allow for:
sending data from Qlik’s data model & getting the inferences back using the load script
performing real-time calculations in chart expressions
To achieve the above two use cases, we rely on the SSE functions. SSE is used to extend the built-in Qlik expression library with functionality from external calculation engines. If you are not familiar with the SSE syntax, you can read more about it here. This blog focuses primarily on two functions:
ScriptEval: used in the data load script. Here, you send a single table to the ML model, and a single data table is returned. You can then use the LOAD … EXTENSION statement to load the data back into Qlik.
ScriptAggrStr: used in chart expressions. Here, a table returned by the SSE function call cannot be consumed; Qlik Sense uses only the first column returned.
The entire workflow for any Machine Learning connection integration within Qlik Sense comprises the steps below:
the SSE functions and parameters are first processed within Qlik Sense.
they are then converted into REST requests.
the REST requests are sent to 3rd-party ML models for processing.
ML models process the request and send inference back to Qlik.
The image below depicts the entire flow.
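The same round trip can be sketched in Python. This is only an illustration of the request/response flow, not the connector's actual implementation: `build_request`, `call_endpoint`, and `FakeComprehendClient` are hypothetical names, the JSON shape is an assumption, and the fake client applies a toy keyword rule so the sketch runs offline. It mimics the `detect_sentiment(Text=..., LanguageCode=...)` call of the boto3 Comprehend client, which returns a `Sentiment` label (the real response also carries a `SentimentScore` map, omitted here).

```python
import json


def build_request(rows, text_field):
    """Serialize table rows into a JSON request body (assumed shape)."""
    payload = [{"RowID": row["RowID"], "Text": row[text_field]} for row in rows]
    return json.dumps(payload)


def call_endpoint(client, request_body, language_code="en"):
    """Send each text to the sentiment service and collect the inferences."""
    inferences = []
    for item in json.loads(request_body):
        response = client.detect_sentiment(
            Text=item["Text"], LanguageCode=language_code
        )
        inferences.append(
            {"RowID": item["RowID"], "Sentiment": response["Sentiment"]}
        )
    return inferences


class FakeComprehendClient:
    """Offline stand-in for a Comprehend client, using a toy keyword rule."""

    def detect_sentiment(self, Text, LanguageCode):
        lowered = Text.lower()
        has_good = "great" in lowered
        has_bad = "dirty" in lowered
        if has_good and has_bad:
            label = "MIXED"
        elif has_good:
            label = "POSITIVE"
        elif has_bad:
            label = "NEGATIVE"
        else:
            label = "NEUTRAL"
        return {"Sentiment": label}


# Hypothetical sample rows; only the field name 'Reviews' comes from this post.
rows = [
    {"RowID": 1, "Reviews": "Great location"},
    {"RowID": 2, "Reviews": "Great pool but dirty room"},
    {"RowID": 3, "Reviews": "Dirty bathroom"},
    {"RowID": 4, "Reviews": "We stayed two nights"},
]

inferences = call_endpoint(FakeComprehendClient(), build_request(rows, "Reviews"))
for row in inferences:
    print(row["RowID"], row["Sentiment"])
```

With boto3 installed and AWS credentials configured, a real client created with `boto3.client("comprehend")` could be passed in place of the fake one.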
Prerequisites:
Analytic connections for machine learning endpoints must be enabled by a tenant administrator in the Management Console.
You need access to a 3rd-party ML model that exposes REST-based API endpoints.
Alright, now that we understand the background and the prerequisites for getting started with text analysis, let us dive into the steps.
1. Load Data: First, we will load our dataset, which is stored in a CSV file, using a folder connection. The script in the Data load editor is shown below. Please note that we also add a new field, ‘RowNo( ) as RowID’, to our table [Hotel_review]. Its purpose is to create associations between this table and the tables returned by the 3rd-party ML system (Comprehend in this case), so we can take advantage of Qlik’s unique associative engine during analysis.
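As a rough Python analogy of this load step (not the Qlik script itself), the sketch below reads a CSV and attaches a 1-based row number to every record, which is what `RowNo( ) as RowID` does in the load script. The sample data and the helper name `load_with_row_id` are hypothetical; the column names `Title`, `Reviews`, and `Ratings` are assumed from the fields mentioned in this post.

```python
import csv
import io

# Hypothetical sample standing in for the hotel review CSV file.
SAMPLE_CSV = """Title,Reviews,Ratings
Nice stay,Great location and friendly staff,5
Mixed feelings,Great pool but dirty room,3
Never again,Dirty bathroom and rude service,1
"""


def load_with_row_id(csv_text):
    """Parse CSV text and add a 1-based RowID to each record."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{"RowID": index, **row} for index, row in enumerate(reader, start=1)]


hotel_review = load_with_row_id(SAMPLE_CSV)
for record in hotel_review:
    print(record["RowID"], record["Title"])
```

Because the row number is stable for a given load order, it can later serve as the key that ties each returned inference back to its source review.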
2. Create connection: Since we plan to use Amazon Comprehend for text analysis, we will create a new connection in the Data Load editor. Amazon Comprehend provides the following 5 services for text analysis -
Since our goal in this case is sentiment analysis, we will select the highlighted service from the dropdown. You will also need to provide Amazon-specific details for the connection (for details read here) and finally a name (‘Amazon_Comprehend_demo’).
3. Send data to Comprehend: Next, we will use the ‘Select data’ button from our connection to send a table and a field from Qlik Sense to the Amazon Comprehend system for sentiment analysis. The table should be the name of the table with source data that you have loaded into your app. In our case, the table is ‘Hotel_Review’ and the field is ‘Reviews’.
4. Load returned table: After the table and field names are sent from Qlik Sense to Comprehend, the available return table automatically appears under the ‘Tables’ section (as seen below). When selecting the table, you can select or deselect the columns to load. In our case, we will select all 5 fields under the ‘Sentiments’ table returned by Comprehend and click ‘Insert script’.
The resulting script is shown below. Similar to Step 1, we also add a ‘RowNo( ) as RowID’ field to maintain the associations and then reload our app. Note that when the app reloads, the source data must first be loaded as a resident table, which is then used as input to the request made to the Amazon Comprehend endpoints.
For simplicity, let’s break down the SSE function visually.
Now quickly check the Data model viewer. We can see that the associations have been made correctly and everything is as expected.
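To make the role of the shared `RowID` key concrete, here is a minimal Python sketch that links source rows to returned sentiment rows by that key. It is only an analogy: Qlik's associative engine links the tables on the common field rather than materializing a join like this. The helper name `associate` and the sample rows are hypothetical.

```python
def associate(source_rows, sentiment_rows, key="RowID"):
    """Attach each returned sentiment row to the source row with the same key."""
    by_key = {row[key]: row for row in sentiment_rows}
    # Source rows without a matching inference are kept unchanged.
    return [{**source, **by_key.get(source[key], {})} for source in source_rows]


# Hypothetical sample data; labels are illustrative, not real Comprehend output.
hotel_review = [
    {"RowID": 1, "Reviews": "Great location"},
    {"RowID": 2, "Reviews": "Great pool but dirty room"},
    {"RowID": 3, "Reviews": "Dirty bathroom"},
]
sentiments = [
    {"RowID": 1, "Sentiment": "POSITIVE"},
    {"RowID": 2, "Sentiment": "MIXED"},
]

linked = associate(hotel_review, sentiments)
for row in linked:
    print(row["RowID"], row.get("Sentiment"))
```

If both tables did not carry the same key field, there would be nothing to link a review to its predicted sentiment, which is why the field is added in both load statements.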
5. Analysis: Our final step is to build a dashboard and do some analysis so we can understand the sentiments of the hotel reviews processed by Comprehend.
First, I want to know the count of each predicted sentiment category. So, I create a bar chart using ‘Sentiment’ as a dimension and ‘Count(RowID)’ as a measure. Notice how we can use a field returned by the Comprehend system along with the available Qlik Sense data to derive insights.
It looks like there are a lot of ‘mixed’ reviews for the hotels. Out of curiosity, I wanted to know why there were so many mixed reviews about these hotels. Therefore, I created a table object with the detailed reviews and titles, and selected only ‘Mixed’ from my bar chart as a filter. The result is below:
The reason is that most of these reviews contain both ‘bad’ and ‘good’ statements in the text, so the sentiments are mixed. The ability to trace a predicted label back to the underlying text in Qlik Sense is crucial for interpreting the results correctly.
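The aggregation behind this bar chart can be sketched in Python as a group-and-count. The helper name `count_by_sentiment` and the sample rows are hypothetical; like `Count(RowID)`, the sketch skips rows whose `RowID` is missing.

```python
from collections import Counter


def count_by_sentiment(rows):
    """Count rows per sentiment label, ignoring rows without a RowID."""
    return Counter(
        row["Sentiment"] for row in rows if row.get("RowID") is not None
    )


# Hypothetical sample data; labels are illustrative, not real Comprehend output.
rows = [
    {"RowID": 1, "Sentiment": "POSITIVE"},
    {"RowID": 2, "Sentiment": "MIXED"},
    {"RowID": 3, "Sentiment": "MIXED"},
    {"RowID": 4, "Sentiment": "NEGATIVE"},
    {"RowID": None, "Sentiment": "MIXED"},
]

counts = count_by_sentiment(rows)
for label, count in counts.most_common():
    print(label, count)
```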
I also wanted to visualize the predicted sentiments by the original ratings of the hotels. So, I decided to create a Mekko chart that would allow me to visualize ratings for each segment of sentiment. To do so, I use ‘Sentiment’ as a dimension, ‘Ratings’ as cells, and ‘Count(RowID)’ as a measure.
We can infer a few things from this chart. For example, out of all the Positive sentiments, 71.3% were 5-star ratings, which aligns with our general understanding. The Negative sentiments consist of ratings in the 2.5–2.9 range, which is consistent with their negative classification.
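The percentages in such a chart are the share of each rating within one sentiment segment. A minimal Python sketch of that calculation follows; the helper name `rating_share_by_sentiment` and the sample rows are hypothetical and do not reproduce the figures quoted above.

```python
from collections import Counter, defaultdict


def rating_share_by_sentiment(rows):
    """Return, per sentiment, the percentage of rows that fall in each rating."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row["Sentiment"]][row["Ratings"]] += 1

    shares = {}
    for sentiment, rating_counts in counts.items():
        total = sum(rating_counts.values())
        shares[sentiment] = {
            rating: round(100 * count / total, 1)
            for rating, count in rating_counts.items()
        }
    return shares


# Hypothetical sample data, chosen only to make the arithmetic easy to follow.
rows = [
    {"Sentiment": "POSITIVE", "Ratings": 5},
    {"Sentiment": "POSITIVE", "Ratings": 5},
    {"Sentiment": "POSITIVE", "Ratings": 5},
    {"Sentiment": "POSITIVE", "Ratings": 4},
    {"Sentiment": "NEGATIVE", "Ratings": 2},
    {"Sentiment": "NEGATIVE", "Ratings": 1},
]

shares = rating_share_by_sentiment(rows)
print(shares["POSITIVE"])
print(shares["NEGATIVE"])
```

Here three of the four positive rows are 5-star, so the 5-star share of the Positive segment is 75%.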
Finally, let’s build a real-time sentiment analysis sheet in our Qlik Sense app. This is very interesting as it facilitates the following:
provides a user interface experience to input any text.
allows inferring sentiments in real-time.
To build the sheet, we will use the ‘Variable input’ object from Qlik’s dashboard bundle that can serve as a text field. We create a new variable called vText to be used with this variable input.
We then drag & drop the object to our sheet and the result is below:
Next, we need to pass the input text to Amazon Comprehend and get the result back in real-time. To do so, we will take advantage of an SSE-based chart expression. Since we need to create an expression, we need an object that can show ‘Measures’. The KPI object is a natural choice for this purpose. So, we drag & drop a KPI object and write our expression.
To enhance the user experience, we will also display the result as an emoji placed inside a KPI object. The expression for passing the input text & deriving the sentiment remains the same. However, we combine the Pick and Match functions to get the right emoji. Below is the expression.
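The pick-match pattern can be illustrated in Python. The two helpers below mimic the semantics of Qlik's `Match` (1-based position of the first exact match, 0 if none) and `Pick` (the n-th option, null if out of range). The wrapper `sentiment_emoji`, the label spellings, and the emoji choices are assumptions for illustration, not the expression used in the app.

```python
def match(value, *candidates):
    """Return the 1-based position of value among candidates, or 0 if absent."""
    for position, candidate in enumerate(candidates, start=1):
        if value == candidate:
            return position
    return 0


def pick(n, *options):
    """Return the n-th option (1-based), or None when n is out of range."""
    if 1 <= n <= len(options):
        return options[n - 1]
    return None


def sentiment_emoji(sentiment):
    """Map a sentiment label to an emoji via the pick-match pattern."""
    # Upper-casing is an assumption so that 'Mixed' and 'MIXED' both match.
    label = (sentiment or "").upper()
    position = match(label, "POSITIVE", "NEGATIVE", "NEUTRAL", "MIXED")
    return pick(position, "😀", "😞", "😐", "🤔")


for label in ["Positive", "NEGATIVE", "Neutral", "Mixed", "unknown"]:
    print(label, sentiment_emoji(label))
```

An unmatched label yields position 0, so `pick` returns nothing rather than a misleading emoji.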
After putting everything together, the final result can be seen below -
Here’s the dashboard in action.
The idea behind this blog was to give a starting point to Qlik Sense users who plan to integrate 3rd-party ML systems and do advanced analytics seamlessly. There are also certain limitations specific to Amazon Comprehend & the connector, which you can read about here. In the next couple of blogs, we will extend this tutorial to some more interesting use cases using the various analytics connectors available in Qlik Sense SaaS.