Qlik Community

Stankevix
Contributor

PyTools - Data Science for Everyone in Qlik Sense

Hi guys,

I have started a Data Science project at the company I currently work for. While looking for open source solutions and possible integrations with Qlik, I found PyTools. This Server Side Extension (SSE) provides algorithms for advanced analysis in Qlik Sense, making data science algorithms more accessible to business areas.

The SSE is built on a series of Python algorithms and exposes them as a set of functions that can be used as expressions in Qlik Sense. Because the project is open source, anyone can customize the existing algorithms or add new ones as needed.

Along with this project, I am applying the concept of Data Literacy, with a focus on teaching business areas the importance of reading and writing data. This way, company employees can make more confident, data-driven decisions. Improving analytical and statistical skills has been one of the biggest challenges so far.

This release includes the following implementations:

  • Supervised Machine Learning: implemented using scikit-learn (Python library). This SSE implements the full machine learning flow of data preparation, model training and evaluation, and prediction. Models can also be interpreted using Skater.
  • Unsupervised Machine Learning: also implemented using scikit-learn.
  • Segmentation: implemented using HDBSCAN, a high-performance clustering algorithm suited to exploratory data analysis.
  • Forecasting: implemented using Facebook Prophet, a modern library for generating high-quality forecasts with good performance.
  • Seasonality and holiday analysis: also uses Facebook Prophet.
  • Correlation: implemented using Pandas.

About the setup process, development and presentation:

  1. Set up PyTools on a local machine to test the extension and to study and customize the available algorithms. At this step it is important to install Python and package versions that are compatible with each other (pystan, pandas, scipy, prophet, etc.).
  2. Configure PyTools on the local Qlik Sense server, first in the development environment and then in the production environment.
  3. Create relational models and develop metrics, facts and dimensions in SQL Server and Qlik Sense to meet business demands.
  4. Develop dashboards with standard Qlik functionality and the PyTools extension.
  5. Develop a Qlik Mart to optimize data loads in the created apps (backlog).
  6. Use NPrinting to schedule dashboard triggers for user groups (backlog).
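
For step 1, a quick way to see which package versions are actually installed in the Python environment is sketched below. It uses only the standard library. The package list is taken from the names mentioned above and is an assumption; the exact versions PyTools requires are listed in the project's own documentation, not here.

```python
from importlib import metadata

# Package names taken from the post; adjust the list to match your setup (assumption).
PACKAGES = ["pystan", "pandas", "scipy", "prophet"]


def installed_versions(packages):
    """Return {package name: installed version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

Run it inside the same virtual environment the SSE uses, otherwise it reports the versions of a different Python installation.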

Algorithms and their expressions:

Clustering
The clustering algorithm uses the following expression:

PyTools.Cluster([ID],$(vMetrica)& ';' & $(vNMetrica2), 'scaler=quantile,min_cluster_size=3,min_samples=2')

 

clustering.png

(Image does not represent a real scenario due to data confidentiality)
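
To make the shape of the clustering expression easier to read: the second argument is a row of measures joined with ';' and the third is a string of 'key=value' options. The sketch below shows one way such strings could be unpacked on the Python side. It is an illustration only, not the actual PyTools code, and the helper names `parse_kwargs` and `split_features` are hypothetical.

```python
def parse_kwargs(arg_string):
    """Split 'key=value,key=value' into a dict, converting numeric values."""
    params = {}
    for pair in arg_string.split(","):
        pair = pair.strip()
        if not pair:
            continue
        key, _, value = pair.partition("=")
        key, value = key.strip(), value.strip()
        # Try int first, then float, otherwise keep the raw string.
        try:
            params[key] = int(value)
        except ValueError:
            try:
                params[key] = float(value)
            except ValueError:
                params[key] = value
    return params


def split_features(feature_string, sep=";"):
    """Split one ';'-delimited row of measures into a list of floats."""
    return [float(x) for x in feature_string.split(sep)]


params = parse_kwargs("scaler=quantile,min_cluster_size=3,min_samples=2")
row = split_features("12.5;3")
print(params)  # {'scaler': 'quantile', 'min_cluster_size': 3, 'min_samples': 2}
print(row)     # [12.5, 3.0]
```

In HDBSCAN, `min_cluster_size` is the smallest group that counts as a cluster and `min_samples` controls how conservative the clustering is, so these two values are the first ones to tune.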

 

Linear correlation

The correlation algorithm uses the following expression:

Pytools.Pearson($(vMetrica1), $(vNMetrica2))

coeficiente.png
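
For readers new to the Pearson coefficient, here is a minimal pure-Python sketch of what it computes for two series of equal length. PyTools itself uses Pandas for correlation, so this is not the extension's code, only an illustration of the formula; the function name `pearson` is hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))
    print(pearson([1, 2, 3], [3, 2, 1]))
```

The result ranges from -1 (perfect inverse linear relationship) to 1 (perfect direct linear relationship), with values near 0 indicating no linear relationship.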

 

Dashboard - Clustering

Set up a clustering dashboard using HDBSCAN and its parameters.

 PainelQlik.png

 

 

(Image without data due to data confidentiality)

 

Recommendations

  • Maintain a steady pace of study through on-site or online courses (edX, Coursera, etc.).
  • Document your developments, especially when working with Machine Learning and Clustering.
  • Be as committed as the business areas to the project, its value and its gains.
  • Use the Qlik community and communicate with other devs. It has helped me a lot to grow professionally.
  • Listen to Data Science podcasts (Lab Head, Data Skeptic, Data Pizza).


If you have questions about development or need material, I can share a PDF file with some instructions.

I am posting this as an outreach to the Community, to find other data scientists who want to use this SSE or are interested in sharing experiences with this tool.

All SSE development was done by Nabeel Oz. The project's base setup is documented in English on GitHub: https://github.com/nabeel-oz/qlik-py-tools


Remember, using this project as a base is a great way to start a Data Science project. With great base algorithms, you can customize to your needs and work with Data Literacy education within the enterprise environment without a large upfront investment.

Best regards, and Qlik for the win.

2 Replies
wallace0834
Contributor

Nabeel,

 

I am receiving the error message below when I attempt to install PyTools in a multi-node environment. I have installed the prerequisite software mentioned in your document.

 

'activate' is not recognized as an internal or external command,
operable program or batch file.
The system cannot find the path specified.
python: can't open file '__main__.py': [Errno 2] No such file or directory
Press any key to continue . . .

 

CintiaK
Contributor II

Hello! Thanks for this post. I am also starting a new data science project at the company I work for, and after some research I found this extension to be a good way to do data science with Qlik. I am currently trying to implement my own functions within the PyTools extension, but I am having a hard time finding out how to develop new functions. I already have a Python file with some forecasting that I want to use, but I am having trouble getting this script into the right "shape" so that it fits the extension and I can call it directly as Pytools.MyFonction, if you know what I mean. Any ideas?