Data Science algorithms implemented as a Python SSE

Nabeel_Asif · Apr 2, 2021 4:25:34 AM

Project page: https://github.com/nabeel-oz/qlik-py-tools

Qlik's advanced analytics integration provides a path to making modern data science algorithms more accessible to the wider business audience. This project is an attempt to show what's possible.

This repository provides a server side extension (SSE) for Qlik Sense built using Python. The intention is to provide a set of functions for data science that can be used as expressions in Qlik.

Sample Qlik Sense apps are included and explained so that the techniques shown here can be easily replicated.

The implementation includes:

Supervised Machine Learning: Implemented using scikit-learn, the go-to machine learning library for Python. This SSE implements the full machine learning flow, from data preparation, model training and evaluation, to making predictions in Qlik. In addition, models can be interpreted using Skater.
Unsupervised Machine Learning: Also implemented using scikit-learn. This provides capabilities for dimensionality reduction and clustering.
Deep Learning: Implemented using Keras and TensorFlow. This SSE implements the full flow of setting up a neural network, training and evaluating it, and using it to make predictions. Deep Learning models can be used for sequence predictions and complex time series forecasting.
Named Entity Recognition: Implemented using spaCy, an excellent Natural Language Processing library that comes with pre-trained neural networks. This SSE allows you to use spaCy's models for Named Entity Recognition or retrain them with your data for even better results.
Association rules: Implemented using Efficient-Apriori. Association Rules Analysis is a data mining technique to uncover how items are associated with each other. This technique is best known for Market Basket Analysis, but can be used more generally for finding interesting associations between sets of items that occur together, for example, in a transaction, a paragraph, or a diagnosis.
Clustering: Implemented using HDBSCAN, a high-performance algorithm that is great for exploratory data analysis.
Time series forecasting: Implemented using Facebook Prophet, a modern library for easily generating good quality forecasts, now with the ability to use multiple regressors as input.
Seasonality and holiday analysis: Also implemented using Facebook Prophet.
Linear correlations: Implemented using Pandas.
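To make the association rules idea concrete, here is a minimal pure-Python sketch of the two measures Apriori-style analysis is built on: support (how often an itemset occurs across transactions) and confidence (how often the right-hand item occurs given the left-hand item). It only scores single-item to single-item rules over a toy basket list. This is not Efficient-Apriori's API, and the baskets and thresholds are invented for illustration.

```python
from itertools import permutations


def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def pair_rules(transactions, min_support=0.3, min_confidence=0.6):
    """Return (lhs, rhs, support, confidence) for single-item rules lhs -> rhs."""
    items = sorted({item for t in transactions for item in t})
    rules = []
    for lhs, rhs in permutations(items, 2):
        both = support(transactions, {lhs, rhs})
        if both < min_support:
            continue  # itemset too rare to be interesting
        # confidence = P(rhs | lhs) = support(lhs and rhs) / support(lhs)
        confidence = both / support(transactions, {lhs})
        if confidence >= min_confidence:
            rules.append((lhs, rhs, both, confidence))
    return rules


# Hypothetical example baskets
baskets = [
    ("bread", "milk"),
    ("bread", "butter", "milk"),
    ("butter", "milk"),
    ("bread", "butter"),
    ("bread", "milk", "eggs"),
]

for lhs, rhs, sup, conf in pair_rules(baskets):
    print(f"{lhs} -> {rhs}: support={sup:.2f}, confidence={conf:.2f}")
```

A real Apriori implementation grows itemsets level by level and prunes candidates whose subsets are already infrequent; this sketch skips that to keep the two formulas visible.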

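For the linear correlations, pandas' `DataFrame.corr()` uses the Pearson coefficient by default. The sketch below computes that coefficient by hand in plain Python so the formula is visible. The sample series are invented, and this is an illustration of the statistic rather than the SSE's own code.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of the same length (at least 2 values)")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # The 1/n factors in covariance and variance cancel, so raw sums are enough
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical example: advertising spend vs. sales
spend = [10, 20, 30, 40, 50]
sales = [12, 24, 33, 41, 55]
print(round(pearson(spend, sales), 3))
```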
For more information refer to the project page on GitHub.

For more information on Qlik Server Side Extensions see qlik-oss.

Disclaimer: I started this project in a personal capacity, and it is not supported by Qlik.

qlikssewarrior · 2020-06-23

@Nabeel_Asif Thank you, Nabeel, for creating such a useful Data Science SSE. I have a software engineering background but am brand new to Qlik, so this project has been a guiding light for me as I create a similar SSE for a project at work. I have a couple of optimization questions related to the Forecasting SSE.

1 - I see that in your sample app, you have separate measures (each with its own Qlik expression calling the backend SSE) for the forecast and the lower and upper limits. Is it possible to combine all three forecast values in a single SSE response and display them as three separate measures on the chart? The Qlik documentation mentions a tensor function type that can return multi-column rows to Qlik, but I am not clear on how to retrieve those rows through a Qlik expression and then parse, access, and display the individual column values as separate measures on a chart. Can you point me to any documentation or an example app or code that explains this?

2 - Is there a graceful way of notifying the Qlik user about a backend SSE failure or exception (for example, due to bad data)? Something like sending a "response status" and "response detail" string back to the Qlik app, so that app developers can monitor the status of the SSE call and react accordingly if there is an exception while processing the request.

Thank you!

Haider

Nabeel_Asif · 2020-06-25

@qlikssewarrior, thanks and glad you've found the project useful.

1. Returning multiple columns is currently only possible through the load script. You can see an example for this here. Chart expressions can only return a single column. Even if you sent back a string concatenating the forecast and the lower and upper limits, you'd need to wrap the SSE expression in a native Qlik function like SubField, which did not work the last time I checked. You could explore caching the results on the SSE side and skipping the computation when the inputs match a previous request.

2. Here again, there is more control when calling the SSE through the load script. If you raise an Exception with a custom message, it will be displayed in the load script log. However, error messages for chart expressions cannot be controlled by the SSE. The approach I take is to set up conditional expressions for charts that use SSE functions, so that they are not triggered by unwanted selections. I then use buttons that let the user control when to run the external calculation.
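The caching idea in point 1 could look something like the sketch below: turn the request's input rows into a hashable key and reuse the stored result when an identical request arrives. `expensive_forecast`, `forecast` and the (date, value) row format are hypothetical stand-ins, not part of qlik-py-tools. `functools.lru_cache` keeps the cache bounded through `maxsize`; a production SSE would still need to think about memory use and concurrent requests.

```python
from functools import lru_cache

CALLS = {"count": 0}  # counts real computations, for demonstration only


def expensive_forecast(rows):
    """Hypothetical stand-in for a costly model call (e.g. fitting a forecast model)."""
    CALLS["count"] += 1
    values = [value for _, value in rows]
    mean = sum(values) / len(values)
    return [round(mean, 2)] * len(rows)


@lru_cache(maxsize=32)
def _cached_forecast(rows_key):
    # rows_key is a tuple of tuples, so it is hashable and usable as a cache key
    return tuple(expensive_forecast(rows_key))


def forecast(rows):
    """Return the cached result when the same input rows were seen before."""
    key = tuple(tuple(row) for row in rows)
    return list(_cached_forecast(key))


data = [("2020-06-01", 10.0), ("2020-06-02", 12.0)]
forecast(data)
forecast(data)  # identical inputs: served from the cache
print(CALLS["count"])  # 1
```

Keying on the full input rows means any change in the user's selections produces a new key, so stale results are never returned; the trade-off is that only exact repeats benefit.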

If you have any feedback or enhancement requests for the SSE protocol itself, please do raise them on the Qlik OSS repository here: https://github.com/qlik-oss/server-side-extension/issues

qlikssewarrior · 2020-06-29

@Nabeel_Asif Thanks for your response. I was actually able to set up a multi-column response via the SubField function. I added one measure that calls the SSE to retrieve a pipe-delimited ('|') string, and then added measures that reference the SSE-calling measure and use SubField to parse specific pieces of the SSE response. I confirmed that the SSE is only called once to retrieve all columns. That is a huge performance gain for us, as we wanted to use up to 5 response columns for things like Statistical Process Control charts.

Also, I like your suggestion for error handling: adding conditional statements and data validation checks before sending the request to the SSE.
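The same idea can be applied on the SSE side: check the inputs and fail fast with a descriptive message. Per the reply above, an exception with a custom message raised this way shows up in the load script log when the function is called from the script. A minimal sketch, where `validate_series`, its checks and the `min_points` threshold are hypothetical choices rather than anything the SSE protocol defines:

```python
import math


def validate_series(dates, values, min_points=3):
    """Raise ValueError with a readable message if the inputs are unusable."""
    if len(dates) != len(values):
        raise ValueError(f"Got {len(dates)} dates but {len(values)} values.")
    if len(values) < min_points:
        raise ValueError(f"At least {min_points} data points are required, got {len(values)}.")
    # Treat None and NaN as missing measurements
    bad = [d for d, v in zip(dates, values)
           if v is None or (isinstance(v, float) and math.isnan(v))]
    if bad:
        raise ValueError(f"Missing values for: {', '.join(map(str, bad[:5]))}")
    if len(set(dates)) != len(dates):
        raise ValueError("Duplicate dates found; aggregate to one value per date first.")


try:
    validate_series(["2020-06-01", "2020-06-02"], [1.0, float("nan")], min_points=2)
except ValueError as err:
    print(f"SSE request rejected: {err}")
```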

Nabeel_Asif · 2020-06-29

That's interesting! I will try using SubField again. Last time I did, I think it generated one SSE call per row of data in the visualisation.

Yevhenii_Senko · 2020-06-30

Hi @qlikssewarrior ,

Thanks for sharing the approach with SubField.

As far as I understand, it was done in the UI.
Could you please post an example with an expression?

Thanks!

qlikssewarrior · 2020-06-30

@Yevhenii_Senko Yes, the parsing is done in the UI using the SubField function. Here is some more detail.

Qlik measure that fetches the data from the SSE:

Measure label: SPCTest

Measure expression: DataScienceAPIs.SPC([Dates], [Measures],'ArgumentsList')

SSE response structure (Python code): responseValue = str(spc[i]['lowerControlLimit']) + '|' + str(spc[i]['center']) + '|' + str(spc[i]['upperControlLimit'])
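How `spc[i]` is computed is not shown in the thread. Assuming a simplified Shewhart-style chart where the center line is the mean and the control limits sit three sample standard deviations either side (a textbook individuals chart would instead estimate sigma from the moving range), a sketch of building the pipe-delimited response values could look like this. `spc_limits` and `build_spc_responses` are hypothetical names, not the poster's code.

```python
from statistics import mean, stdev


def spc_limits(values, sigmas=3.0):
    """Center line and control limits from the mean and sample standard deviation."""
    center = mean(values)
    spread = sigmas * stdev(values)  # stdev needs at least two values
    return {
        "lowerControlLimit": center - spread,
        "center": center,
        "upperControlLimit": center + spread,
    }


def build_spc_responses(values):
    """One 'lower|center|upper' string per input row, like responseValue above."""
    limits = spc_limits(values)
    response_value = "|".join(
        str(limits[key]) for key in ("lowerControlLimit", "center", "upperControlLimit")
    )
    # The limits are the same for every row in this sketch, so each row gets the same string
    return [response_value] * len(values)


values = [10.0, 12.0, 11.0, 13.0, 9.0]
print(build_spc_responses(values)[0])
```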

Qlik measures that parse pieces off of the SSE response:

Measure label: lowerControlLimit

Measure expression: subfield(SPCTest, '|',1)

Measure label: Center

Measure expression: subfield(SPCTest, '|',2)

Measure label: upperControlLimit

Measure expression: subfield(SPCTest, '|',3)
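To sanity-check the delimiter handling outside Qlik, a small Python helper can mimic what `SubField(text, '|', n)` returns for a positive, 1-based field number. This is only an approximation for testing: Qlik's SubField also accepts negative field numbers, which are not reproduced here, and returning `None` for an out-of-range index is a choice made for this sketch. `subfield` is a hypothetical helper name.

```python
def subfield(text, delimiter, field_no):
    """Approximate Qlik's SubField for positive, 1-based field numbers.

    Returns None when field_no is out of range (a choice made for this sketch).
    """
    if field_no < 1:
        raise ValueError("this sketch only supports positive field numbers")
    parts = text.split(delimiter)
    return parts[field_no - 1] if field_no <= len(parts) else None


# Hypothetical response string in the 'lower|center|upper' layout described above
response = "6.26|11.0|15.74"
lower, center, upper = (float(subfield(response, "|", n)) for n in (1, 2, 3))
print(lower, center, upper)
```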

prerakdt · 2020-08-24

Hi @Nabeel_Asif ,

Is there any way we can use this extension with Qlik Sense Business? How do I configure this extension in Qlik Sense Business?

Nabeel_Asif · 2020-08-24

Hi @prerakdt, Qlik Sense Business doesn't provide a way to configure Server Side Extensions at the moment. However, you could use Python/R on a local machine with Qlik Sense Desktop authenticated against Qlik Sense Business. In this case, the advanced analytics integration would only work on your local machine.

You can download Qlik Sense Desktop from your user settings page on the Cloud Hub.

saniyabubere06 · 2023-11-20

Hi @Nabeel_Asif, I am getting this error:

(qlik-py-env) C:\ProgramData\Qlik\Sense\qlik-py-tools-8.1-edited\qlik-py-env\core>python __main__.py
Traceback (most recent call last):
File "C:\ProgramData\Qlik\Sense\qlik-py-tools-8.1-edited\qlik-py-env\core\__main__.py", line 18, in <module>
import ServerSideExtension_pb2 as SSE
File "C:\ProgramData\Qlik\Sense\qlik-py-tools-8.1-edited\qlik-py-env\generated\ServerSideExtension_pb2.py", line 33, in <module>
_descriptor.EnumValueDescriptor(
File "C:\ProgramData\Qlik\Sense\qlik-py-tools-8.1-edited\qlik-py-env\Lib\site-packages\google\protobuf\descriptor.py", line 796, in __new__
_message.Message._CheckCalledFromGeneratedFile()
TypeError: Descriptors cannot not be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:
1. Downgrade the protobuf package to 3.20.x or lower.
2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).

More information: https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates

How can I resolve this error? @Nabeel_Asif
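The error message itself lists two workarounds. The second one can be applied from Python by setting the `PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION` environment variable before the generated `ServerSideExtension_pb2` module (and therefore `google.protobuf`) is imported, for example at the very top of `__main__.py`. The sketch below only switches to the slower pure-Python implementation when the installed protobuf is newer than the 3.20.x series. The helper names are hypothetical, and regenerating the stubs with protoc >= 3.19.0 or downgrading protobuf, as the message suggests, remain the cleaner fixes.

```python
import os
import re
from importlib import metadata


def protobuf_too_new(version):
    """True if a protobuf version string is newer than the 3.20.x series."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) > (3, 20)


def use_pure_python_protobuf_if_needed():
    """Apply workaround 2 from the error message; call before importing any _pb2 module."""
    try:
        installed = metadata.version("protobuf")
    except metadata.PackageNotFoundError:
        return False
    if protobuf_too_new(installed):
        # setdefault keeps any value the user has already set explicitly
        os.environ.setdefault("PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION", "python")
        return True
    return False


use_pure_python_protobuf_if_needed()
# import ServerSideExtension_pb2 as SSE  # only import after the call above
```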

