In any subscription-based business model, one of the primary goals is to retain its customers. Also, with the increase in market competition, it is crucial to identify ‘unhappy’ customers at an early stage so as to provide additional benefits and retain them. Churn prediction refers to determining which consumers are most likely to abandon a service or terminate their membership. To be able to predict churn gives businesses the necessary edge since getting new customers is sometimes more expensive than keeping old ones.
The application of Machine learning techniques to understand & predict customer churn isn’t something new. Several ML algorithms have been used in the field of telecom, banking, insurance, etc. to detect early churn signals. However, just relying on an ML algorithm’s output to understand whether a customer will churn or not isn’t really an optimal approach anymore. To have a thorough understanding of the churn analysis process, the need to amalgamate historical data(what happened in the past?) with the predicted outcome(what will happen in the future?) is paramount.
This is where Qlik Sense’s visual analytics platform serves as an effective solution. Now, using advanced analytics connectors within Qlik Sense SaaS, users can build a Machine Learning model in an AutoML platform, consume the predictions in Qlik Sense and use them along with the Qlik’s data model to take advantage of things such as:
perform what-if analysis
If you want to understand the entire workflow to integrate 3rd-party ML endpoints within Qlik Sense, I highly recommend going through this first part that I wrote a few days back. The generic workflow is depicted below.
In this specific tutorial, we are going to analyze customer churn behavior for a telecom company by building an end-to-end Qlik Sense app and leveraging both historical as well as predicted data. For building the Machine Learning model and hosting the endpoint, we will use the Amazon SageMaker platform. We will keep our focus on building the Qlik Sense app and not on developing the ML model.
Pre-requisite:To be able to use Analytics Endpoints the "Enable machine learning endpoints" parameter should be enabled in the Management Console.
Step 1: Load source data
First, let us load the source data into Qlik Sense using the Data load editor.
We will analyze our dataset in detail when we build our ‘descriptive’ dashboard but for now we know that we have 15 attributes and 3333 records that describe the profile of each customer of the telecom operator.
The last attribute, Churn, is known as the target attribute–the attribute that we want our ML model to predict to know if a customer will churn or not.
Step 2: Train a churn-prediction model & deploy the inference API
Our next step is to build the churn prediction model. The target is to classify each customer into either of the two categories — churn or not churn. Therefore, this is a binary classification problem. We will be leveraging SageMaker Autopilot that allows us to automatically build, train, and tune the best machine learning model based on our data without having to write much code.
Credit: Amazon Web Services YouTube
If you are just getting started with SageMaker Autopilot, here is a great video from AWS to help you understand the basics. I use the describe_auto_ml_job API to look up the best algorithm selected by the SageMaker Autopilot job.
Finally, we will create our model based on the best candidate (automl-churn-28–18–16–29r2UGiyXI-011–5e61e1c5) & deploy it to a hosted endpoint. When the endpoint is ready, the endpoint status will change to ‘InService’ like below.
To make it easier for you to learn about how I trained & deployed my model in SageMaker, I will attach my Python notebook along with this blog.
Step 3: Send data from QS to the ML endpoint for prediction
Now that we have the model endpoint ready for inference, we will send all the fields required by the model from QS to predict if a customer would churn or not. Please note that we will only send 14 attributes and exclude the last one(churn) since we want that prediction to be made by the model.
To do so, we go to the Data load editor and create a new SageMaker connection like below. You can read more about creating a new connection here.
You should now see the SageMaker_Autopilot_churn name in your list of connections. Now, click on ‘select data’ to start sending your data from QS data model to SageMaker.
Click on ‘Insert script’ to get the script in the editor.
Please note how I have changed the raw script that we got from our connection to include all the 14 fields to be sent to our endpoint. Like our previous use case, we use RowNo( ) here as a field to associate the source data & the returned prediction table.
Here’s a peek at our data model after the data is loaded.
Let’s quickly check what is returned by the ML model based on the data we sent from Qlik Sense.
So, for every customer row, we have a predicted_label field that shows whether the customer will stay or leave. We also have the individual class probabilities for deeper analysis.
Step 4: Building the QS analytics app
Our final step is to build a Qlik Sense app so we can perform our analysis and present it to the stakeholders.
We will segregate the app into 3 sheets as shown below each serving its purpose:
Descriptive Analysis sheet:
Goal: This sheet will help us understand the historical source data & allow for detailed analysis.
First, I want to understand the distribution of a couple of features and since we have 15 of them, I won’t visualize all of them but highlight the ones that my stakeholders are interested in. In terms of visualization, I will use a container object and add the distributions as histograms like below.
We can see that most of the fields (Day mins, Eve mins, etc.) have a normal distribution while Cust Serv Calls appear to be positively skewed.
I also wanted to highlight how the target attribute(Churn) was distributed since it is important to know the reality of how many customers can actually churn. Looks like 14.49 % of the customers did churn.
Next, since our data is high-dimensional (10+ features) and I want to enable detailed analysis of individual customers, choosing a visualization that works well with multidimensional data was crucial. I decided to go with a Parallel coordinate plot extension that I built sometime back.
Finally, putting everything together here’s our Descriptive dashboard.
Let’s do a simple analysis. I want to compare a customer who wants to churn with one that does not. So, I randomly select two such rows.
This view allows us to easily compare all the 11 numerical attributes. So, looks like both of these customers are new customers(observe acc_length=1). The orange line represents Churn=‘True’ and the cyan represents Churn=‘False’. For most of the features, we see the lines in a criss-cross form which helps us understand how these 2 customers differ. One thing that stands out is how the customer who churns makes 5 customer service calls in just a day and the other one makes 1. This gives us an indication that the orange customer might have faced some issues with the operator.
Predictive Analysis sheet:
Goal: This sheet will help us understand the churn predictions that we made using SageMaker & our model’s performance.
Let’s see how our overall predictions looks like.
The predictions are almost similar to the ground truth.
Next, I want to visualize the churn predictions by each state so the telecom operator can keep their focus on those ‘risky’ states.
This is a great example of how using Qlik’s associative property, we were able to integrate both historical and predicted data.
It is also important to understand what mistakes the ML model makes. False negatives are the most problematic because the model incorrectly predicts that a churning customer will stay. The best way to evaluate our model would be to draw a confusion matrix like below.
We have 17 such cases. We can select this ‘17 block’ from our matrix and perform detailed analysis of the special cases by analyzing both descriptive & predictive sheets.
What-If Analysis sheet:
Our final piece is the ‘What-if’ scenario builder. Personally, I love this native capability of Qlik Sense as it allows us to look beyond traditional analysis. Also, note how easy & quick it is to build this. I have used a custom object called ‘variable input’ that allows me to include sliders, dropdown & text fields.
The ‘Will the customer churn?’ is a KPI object and uses a Server side extension function ScriptAggrStr()as a chart expression which allows us to get predictions in real-time by passing the values dynamically from the input boxes. Here’s the expression -
Let us do quick & simple what-if analysis. From our Descriptive sheet, we noted that the field Cust Serv calls might be an important one. After all, a happy customer doesn’t need to call customer service. To really prove that correlation, let’s play around.
And looks like the hypothesis makes sense! However, please note that this is just one factor. I tried increasing the Int’l Mins to 15 instead of 12 and even though I had a lot of customer service calls, the prediction was False.
So, maybe providing more Int’l Mins to the customer would be a great idea to retain them. This kind of insight can help businesses dealing with churn to really understand the pitfalls and improvise on them even at a granular level.
That brings us to the end of this exciting blog. The tutorial is a detailed one as the whole idea was to allow Qlik users to quickly adapt to these capabilities and understand the process end-to-end. Let me know what you think!