
Data Science algorithms implemented as a Python SSE

Nabeel_Asif
Employee


Last Update: Apr 2, 2021 4:25:34 AM
Updated By: Nabeel_Asif
Created date: Jan 8, 2019 11:06:44 PM

Project page: https://github.com/nabeel-oz/qlik-py-tools

Qlik's advanced analytics integration provides a path to making modern data science algorithms more accessible to the wider business audience. This project is an attempt to show what's possible.

This repository provides a server side extension (SSE) for Qlik Sense built using Python. The intention is to provide a set of functions for data science that can be used as expressions in Qlik.
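To make the idea of "functions usable as expressions" concrete, here is a minimal sketch of the dispatch an SSE performs, assuming a simplified model in which Qlik asks for a function by numeric ID and passes a batch of rows. The registry, the `register` decorator and both example functions are hypothetical; the real SSE declares its functions to Qlik over gRPC and streams rows in bundles, none of which is modelled here.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical registry standing in for the function list an SSE declares to Qlik.
Row = Sequence[float]
FUNCTIONS: Dict[int, Callable[[List[Row]], List[float]]] = {}


def register(function_id: int):
    """Decorator that records a Python function under a numeric ID."""
    def wrap(fn):
        FUNCTIONS[function_id] = fn
        return fn
    return wrap


@register(0)
def sum_of_row(rows: List[Row]) -> List[float]:
    # One output value per input row, like a row-wise Qlik expression.
    return [float(sum(r)) for r in rows]


@register(1)
def scale_to_max(rows: List[Row]) -> List[float]:
    # Needs the whole column at once, which is why functions receive batches of rows.
    values = [r[0] for r in rows]
    peak = max(values) if values else 0.0
    return [v / peak if peak else 0.0 for v in values]


def execute(function_id: int, rows: List[Row]) -> List[float]:
    """Look up the requested function and run it over a batch of rows."""
    return FUNCTIONS[function_id](rows)


print(execute(0, [(1, 2), (3, 4)]))          # [3.0, 7.0]
print(execute(1, [(2.0,), (4.0,), (1.0,)]))  # [0.5, 1.0, 0.25]
```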

Sample Qlik Sense apps are included and explained so that the techniques shown here can be easily replicated.

The implementation includes:

  • Supervised Machine Learning: Implemented using scikit-learn, the go-to machine learning library for Python. This SSE implements the full machine learning flow, from data preparation, model training and evaluation, to making predictions in Qlik. In addition, models can be interpreted using Skater.
  • Unsupervised Machine Learning: Also implemented using scikit-learn. This provides capabilities for dimensionality reduction and clustering.
  • Deep Learning: Implemented using Keras and TensorFlow. This SSE implements the full flow of setting up a neural network, training and evaluating it, and using it to make predictions. Deep Learning models can be used for sequence predictions and complex time series forecasting.
  • Named Entity Recognition: Implemented using spaCy, an excellent Natural Language Processing library that comes with pre-trained neural networks. This SSE allows you to use spaCy's models for Named Entity Recognition or retrain them with your data for even better results.
  • Association rules: Implemented using Efficient-Apriori. Association Rules Analysis is a data mining technique to uncover how items are associated with each other. This technique is best known for Market Basket Analysis, but can be used more generally for finding interesting associations between sets of items that occur together, for example, in a transaction, a paragraph, or a diagnosis.
  • Clustering: Implemented using HDBSCAN, a high-performance algorithm that is great for exploratory data analysis.
  • Time series forecasting: Implemented using Facebook Prophet, a modern library for easily generating good-quality forecasts, now with the ability to use multiple regressors as input.
  • Seasonality and holiday analysis: Also implemented using Facebook Prophet.
  • Linear correlations: Implemented using Pandas.
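Association rules are usually judged by three measures: support (how often the items occur together), confidence (how often the right-hand item appears given the left-hand one) and lift (confidence relative to chance). The sketch below computes them by brute force over item pairs in plain Python to show what the measures mean; it is not Efficient-Apriori's API, it skips the Apriori candidate pruning that lets that library scale, and the baskets are toy data.

```python
from itertools import combinations
from typing import FrozenSet, List, Tuple

Transaction = FrozenSet[str]
Rule = Tuple[str, str, float, float, float]


def support(itemset: FrozenSet[str], transactions: List[Transaction]) -> float:
    """Fraction of transactions that contain every item in the itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def pair_rules(transactions: List[Transaction],
               min_support: float, min_confidence: float) -> List[Rule]:
    """Brute-force rules {lhs} -> {rhs} over item pairs, without Apriori pruning.

    Each rule is (lhs, rhs, support, confidence, lift), rounded to 3 decimals.
    """
    items = sorted(set().union(*transactions))
    rules: List[Rule] = []
    for a, b in combinations(items, 2):
        both = support(frozenset({a, b}), transactions)
        if both < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = both / support(frozenset({lhs}), transactions)
            lift = confidence / support(frozenset({rhs}), transactions)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, round(both, 3), round(confidence, 3), round(lift, 3)))
    return rules


baskets = [frozenset(b) for b in (
    {"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"},
)]
for rule in pair_rules(baskets, min_support=0.5, min_confidence=0.6):
    print(rule)
```

A lift above 1 (butter → bread here, 1.333) means the items co-occur more often than independence would predict; bread → milk has a lift of 0.889, so it is a weaker association despite passing the confidence threshold.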

For more information refer to the project page on GitHub.

For more information on Qlik Server Side Extensions see qlik-oss.

Disclaimer: This project has been started by me in a personal capacity and is not supported by Qlik.

Comments
maxsheva
Creator II

Hi @Nabeel_Asif,

Thanks for the suggestion. I grabbed both the app and the data file, but I still get the same error.

(Screenshot attached: Capture1.JPG)

I suspect a Python library is missing or incorrectly installed, or there is some other issue related to the extension.

Could you please check the Qlik-Py-Start log?

Nabeel_Asif
Employee

If you're still getting the error, it looks like you're not using the latest version of the SSE.

The Qlik load script fails saying that there is no field called 'ds' at the point where the SSE returns the results. There is definitely a field called 'ds' returned in release 4.0 when you pass load_script=true to the Prophet function. This was not the case with release 3.9 and earlier.

maxsheva
Creator II

@Nabeel_Asif, many thanks!

It works with the new version of the SSE.

Let me adapt the script for other data and I will provide feedback.

Much appreciated!

maxsheva
Creator II

Hi @Nabeel_Asif,

I have tried integrating my own data into the solution. I am able to run it and get forecast results.

However, I don't understand how the 'freq' parameter works, e.g. freq=D (W, M, MS, Y).

The yhat forecast is best when freq=D, but it is still about 20% below the actual numbers. Of course I could multiply the result by 1.2, but is there an option to adjust it using built-in Prophet parameters?

dubdev
Contributor

Hi @Nabeel_Asif, I'm new to using analytics with Qlik and Python. Are there functions in your extension for binary classification, like kNN, SVM/SVC and others? Is it possible to do binary classification with the built-in functions of this extension? I'd be grateful for any advice.

Nabeel_Asif
Employee

@dubdev, yes, this SSE has functions that support both classification and regression. Most of the algorithms from the scikit-learn library are supported.

For usage information please head over to the project's GitHub repository: https://github.com/nabeel-oz/qlik-py-tools 
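For readers new to binary classification: in the SSE you would choose a scikit-learn estimator such as KNeighborsClassifier or SVC (check the GitHub docs for the exact list and syntax). To show what k-nearest neighbours does under the hood, here is a minimal pure-Python version; it is an illustration of the algorithm, not how the SSE calls scikit-learn, and the data are toy values.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Point = Sequence[float]


def knn_predict(train: List[Tuple[Point, int]], query: Point, k: int = 3) -> int:
    """Predict a 0/1 label by majority vote among the k nearest training points."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data: class 0 clusters near (0, 0), class 1 near (5, 5).
train = [
    ((0.0, 0.0), 0), ((0.5, 1.0), 0), ((1.0, 0.5), 0),
    ((5.0, 5.0), 1), ((4.5, 5.5), 1), ((5.5, 4.0), 1),
]
print(knn_predict(train, (0.8, 0.8)))  # 0
print(knn_predict(train, (4.8, 5.1)))  # 1
```

An odd k avoids tied votes in a two-class problem. Because kNN is distance-based, features on very different scales usually need scaling first, which is the kind of pre-processing the SSE's pipeline is described as handling further down this thread.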

Nabeel_Asif
Employee

@maxsheva, the freq parameter should match the granularity of your data, so there is only one correct option for a given dataset, e.g. D if you have daily data.
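Since freq describes the spacing of the input dates rather than acting as a tuning knob, it can be read off the data. The helper below is a hypothetical sketch of that idea. The aliases follow pandas conventions, which Prophet uses when it builds future dates; I'm assuming the SSE passes freq through unchanged. Recent pandas releases prefer ME and YE for month end and year end.

```python
from datetime import date
from statistics import median
from typing import List


def guess_freq(dates: List[date]) -> str:
    """Guess a pandas-style frequency alias from the typical gap between dates.

    Hypothetical helper: it only covers the aliases discussed in this thread
    (D, W, M/MS, Y/YS) and assumes a regular series without large gaps.
    """
    ordered = sorted(dates)
    gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
    if not gaps:
        raise ValueError("need at least two dates")
    typical = median(gaps)
    if typical == 1:
        return "D"
    if typical == 7:
        return "W"
    if 28 <= typical <= 31:
        # Month start ("MS") if every date falls on the 1st, otherwise assume month end ("M").
        return "MS" if all(d.day == 1 for d in ordered) else "M"
    if 365 <= typical <= 366:
        # Year start ("YS") if every date is 1 January, otherwise assume year end ("Y").
        return "YS" if all((d.month, d.day) == (1, 1) for d in ordered) else "Y"
    raise ValueError(f"no alias matched a typical gap of {typical} days")


print(guess_freq([date(2021, 1, 1), date(2021, 1, 2), date(2021, 1, 3)]))  # D
print(guess_freq([date(2021, 1, 1), date(2021, 2, 1), date(2021, 3, 1)]))  # MS
```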

The forecast will not align perfectly with historical values as that would be overfitting the model to a sample of data. However, there are a few ways to adjust the output explained here: https://github.com/nabeel-oz/qlik-py-tools/blob/master/docs/Prophet.md
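Before scaling yhat by a fixed factor such as 1.2, it may be worth measuring the gap on a holdout period the model did not see, since a close fit to the training history can be misleading. A minimal sketch, assuming you have exported actual values and yhat for the same holdout dates; the function name is hypothetical and the numbers are illustrative only.

```python
from typing import Dict, Sequence


def forecast_errors(actual: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """Mean bias and MAPE, both in percent, of predictions against actuals."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and the same length")
    if any(a == 0 for a in actual):
        raise ValueError("percentage errors are undefined when an actual value is 0")
    pct = [(p - a) * 100 / a for a, p in zip(actual, predicted)]
    return {
        "bias_pct": sum(pct) / len(pct),                  # negative: forecast runs low
        "mape_pct": sum(abs(x) for x in pct) / len(pct),  # typical size of the miss
    }


# Hypothetical holdout: the last three periods, withheld from training.
print(forecast_errors([100, 200, 300], [80, 160, 240]))
# {'bias_pct': -20.0, 'mape_pct': 20.0}
```

A bias that stays negative across several holdout windows may indicate something systematic worth addressing through the options in Prophet.md; a bias that changes sign between windows looks more like noise.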

evanplancaster
Contributor III

@Nabeel_Asif, first off, this SSE is amazing! Very well documented and very well implemented. Thank you so much for it!

I have two questions:

1) Regarding Prophet forecasting, are there any plans to incorporate additional regressors into the mix? https://facebook.github.io/prophet/docs/seasonality,_holiday_effects,_and_regressors.html#additional...

2) For those who might not feel comfortable enough with Qlik's load scripting to want to go to the trouble of understanding how to feature engineer, train, cross-validate, tune hyper-parameters, and select the best model all inside Qlik, is there a way to do all that "grunt work" outside of Qlik, and then, having deployed the model in the proper location, use the sklearn.Predict function you've developed to pass a dataset in Qlik to that model? This would be a great feature to have for those of us who already have models out in production that we built using other tools and who don't want to rebuild them just so we can see the results in Qlik. (And yes, I know that we could just take our results from a pre-built model and shove them in a table on a database somewhere and pull them into Qlik, but the on-the-fly capabilities of SSEs are what we're really after here.)

Again, great job on this, and I'm so thankful to see that you are actively enhancing it and helping us all get it up and running!

Nabeel_Asif
Employee

Hi @evanplancaster, thanks for the compliments.

I did think about implementing the additional regressors option for Prophet, but felt restricted by a current limitation of SSEs, which is that a function cannot have a variable number of arguments. I guess I could create a new SSE function that allows for just one additional regressor, or come up with a scheme for passing multiple regressors using concatenation. I'll have a think.
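One way the concatenation scheme mentioned above could work, sketched under assumptions: each row's regressors travel as a single string argument of name=value pairs, so the SSE function keeps a fixed argument count. The format, separator and function names are hypothetical. The released SSE does support multiple regressors (see the feature list above); refer to the project documentation for the input format it actually uses.

```python
from typing import Dict


def encode_regressors(values: Dict[str, float], sep: str = ";") -> str:
    """Pack several named regressor values for one row into a single string."""
    return sep.join(f"{name}={value}" for name, value in sorted(values.items()))


def decode_regressors(packed: str, sep: str = ";") -> Dict[str, float]:
    """Unpack a string produced by encode_regressors back into named floats."""
    result: Dict[str, float] = {}
    for part in packed.split(sep):
        if not part:
            continue  # tolerate empty strings and trailing separators
        name, _, value = part.partition("=")
        result[name] = float(value)
    return result


packed = encode_regressors({"temp": 21.5, "promo": 1.0})
print(packed)                     # promo=1.0;temp=21.5
print(decode_regressors(packed))  # {'promo': 1.0, 'temp': 21.5}
```

On the Qlik side, such a string could be built inside the expression with the & concatenation operator. The separator must never appear in the data; values containing "=" or ";" would break this toy format.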

On your second question, the models built using the SSE have a bit more in them than a standard sklearn model. They consist of a sklearn pipeline that needs to handle pre-processing (OHE, scaling, etc.), evaluation metrics from cross-validation, and meta-data to interpret features, their data types and how they need to be pre-processed. As I type this, I realize it should be possible to take an existing sklearn pipeline and add metadata to it so the model becomes easier to use with Qlik. So you've given me two things to think about!
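The idea of wrapping an existing pipeline with metadata might look something like this minimal sketch. The class and field names are hypothetical, not the SSE's model format. The pipeline is duck-typed, so anything with a `predict` method fits. A real sklearn pipeline with a ColumnTransformer may expect a pandas DataFrame with named columns rather than plain lists, so treat the row format here as an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class FeatureSpec:
    """Hypothetical per-feature metadata."""
    data_type: str  # "float", "int" or "str"
    strategy: str   # pre-processing note, e.g. "scaling" or "one hot encoding"


@dataclass
class QlikReadyModel:
    """Hypothetical bundle: an already fitted pipeline plus metadata for Qlik."""
    pipeline: Any                      # anything with .predict(rows), e.g. a fitted sklearn Pipeline
    features: Dict[str, FeatureSpec]   # insertion order defines the column order
    metrics: Dict[str, float] = field(default_factory=dict)  # e.g. cross-validation scores

    def predict(self, records: List[Dict[str, Any]]) -> List[Any]:
        # Check every expected feature is present, then order and type-cast
        # the values before handing them to the wrapped pipeline.
        casts = {"float": float, "int": int, "str": str}
        names = list(self.features)
        rows = []
        for record in records:
            missing = set(names) - set(record)
            if missing:
                raise KeyError(f"missing features: {sorted(missing)}")
            rows.append([casts[self.features[n].data_type](record[n]) for n in names])
        return list(self.pipeline.predict(rows))


class ThresholdStub:
    """Toy stand-in for a fitted pipeline: predicts 1 when the first column exceeds 10."""
    def predict(self, rows):
        return [int(row[0] > 10) for row in rows]


model = QlikReadyModel(
    pipeline=ThresholdStub(),
    features={"amount": FeatureSpec("float", "scaling"),
              "region": FeatureSpec("str", "one hot encoding")},
)
print(model.predict([{"amount": "12.5", "region": "EU"}, {"amount": 3, "region": "US"}]))  # [1, 0]
```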

swarnendu
Creator II

Hi @Nabeel_Asif, I'm getting some errors.

Errors from "Qlik-Py-Init.bat":

1. ephem ... error

2. Building wheel for hdbscan (PEP 517) ... error

3. Running setup.py install for wordcloud ... error

Error from "Qlik-Py-Start.bat":

1. Traceback (most recent call last):
  File "__main__.py", line 16, in <module>
    import ServerSideExtension_pb2 as SSE
  File "C:\Users\nexuser2\Documents\Qlik\Sense\Extensions\qlik-py-tools-master_2\qlik-py-tools-master\qlik-py-env\generated\ServerSideExtension_pb2.py", line 7, in <module>
    from google.protobuf import descriptor as _descriptor
  File "C:\Users\nexuser2\Documents\Qlik\Sense\Extensions\qlik-py-tools-master_2\qlik-py-tools-master\qlik-py-env\lib\site-packages\google\protobuf\descriptor.py", line 47, in <module>
    from google.protobuf.pyext import _message
ImportError: DLL load failed: The specified procedure could not be found.
Press any key to continue . . .

Thanks and Regards,

Swarnendu Haldar.
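A quick way to narrow down errors like these is to try each import on its own with the Python interpreter of the qlik-py-env virtual environment (on Windows typically qlik-py-env\Scripts\python.exe). This is a generic diagnostic sketch, not part of the project; it simply reports which modules load cleanly and which raise an error.

```python
import importlib
from typing import Dict, Iterable


def check_imports(modules: Iterable[str]) -> Dict[str, str]:
    """Try importing each module and record "ok" or the exception raised."""
    report: Dict[str, str] = {}
    for name in modules:
        try:
            importlib.import_module(name)
            report[name] = "ok"
        except Exception as exc:  # ImportError (incl. "DLL load failed") or other load errors
            report[name] = f"{type(exc).__name__}: {exc}"
    return report


if __name__ == "__main__":
    # Modules named in the errors above; google.protobuf.descriptor is the import
    # that triggers the failing "from google.protobuf.pyext import _message".
    suspects = ["google.protobuf.descriptor", "hdbscan", "wordcloud", "ephem"]
    for module, status in check_imports(suspects).items():
        print(f"{module}: {status}")
```

Modules that fail here point to the install steps that need fixing before the SSE can start.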
