Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Sep 9, 2021 7:29:45 AM
Sep 9, 2021 7:29:45 AM
This article shows how the Sagemaker connector in Qlik Application Automation can be used to leverage the AutoML features from Sagemaker AutoPilot. In this example we are using data in a CSV file to train a model using AutoML. Next we use the newly created endpoint from Sagemaker to get predictions on data in a Google Sheet. We write the predictions to a new column in the same Google Sheet and we visualize this data in a Qlik Sense app.
Next, we make the automation available using a button on a sheet in the Qlik Sense app and we add an input to the automation so that the user can vary the input data, get new predictions and immediately see the visualization of these new predictions in Qlik Sense.
Prerequisites
Create an AutoML job
We have a CSV file with input training data stored in a bucket on S3.
We are using this dataset for demo purposes: https://datahub.io/machine-learning/bank-marketing
We're using this file to create an AutoML job in an automation using the block "Create AutoML job from S3" in the Sagemaker connector. We're using the same S3 bucket for output:
Getting the status of the AutoML job
Once the AutoML job is created, we'll use the "get AutoML job" block to fetch its status:
It can take minutes or hours for the job to complete, we're running the Automation manually every hour or so until the AutoMLJobStatus in the response is "Completed" (bottom of screenshot):
Getting predictions from the endpoint
We are taking note of the name of the endpoint that was created (from the "Get AutoML job" block output, or we can use the lookup feature), and we use the block "Invoke endpoint" to get prediction on one row of new CSV input data:
Saving the predictions
In the following example we have input data in a Google Sheet, we're looping over the lines in the sheet and invoking the endpoint for each line:
Note that we apply the "csv" formula on the row data because we have set the content-type to text/csv.
We write the prediction and probability back to the Google Sheet in a separate column Q and R:
We use the following json path to calculate the correct line number, needed in the input "Start Cell". We take the index of the loop (which starts with 0) and we add 2 because the first line number in Google Sheet is 1 and this contains a header line so we want to start writing on line 2:
Visualizing the data
Finally we are doing a reload of a Qlik Sense app that visualizes the data from the Google Sheet:
Note that we set up a datasource in Qlik Sense pointing to the same Google sheet file as the Automation is using, in order to visualize its content:
What-if simulations
We can add an input in the Automation, to allow the user to do "what-if" simulations. In this example, we're allowing the user to change one variable, which is the "number of days since last contact":
In the Automation, we replace one column of the data from the Google Sheet with the input that the user provided, before we send it to Sagemaker to get a prediction. We do this using a variable of type "object". First we set the variable equal to the line from the Google Sheet, and next we replace one key with the input from the user:
Same as before, we write the prediction to a new column in the Google Sheet (as well as the user input). This is what our Google Sheet will look like:
And at the end of the automation, again we reload the Qlik Sense app.
Finally, we're adding a button on a Sheet in the Qlik Sense app, that we link to the Automation.
Now we support a full cycle of a What-if simulation:
Download
See attachments to download the automation template (json file) and the Google Sheet file used in this article.
In the automation editor, right mouse click on the canvas (center pane) and select "upload workspace". Next, select the json file to import the template of the automation.