Werapat_2538
Contributor

How to understand Qlik AutoML: why do the suggested algorithms score so differently (Linear Regression vs. CatBoost Regression)?

I am exploring machine learning in Qlik AutoML and comparing it with ordinary machine learning in Python to understand what is different.

[Screenshot: Qlik AutoML experiment results (Werapat_2538_0-1684202150869.png)]

 

First, I used Qlik AutoML with a regression experiment to predict sales from my dataset. After I uploaded the data, Qlik ran the experiment with many suggested algorithms and ranked CatBoost Regression at the top, with an R2 of around 98%.

Second, I applied the same dataset in Python, using Linear Regression from the scikit-learn library, but the R2 was only 14%.

My question: is the algorithm that Qlik AutoML suggests, CatBoost Regression, the right one to apply? Is it the more reliable choice?

 

 

 


Accepted Solutions
Kyle_Jourdan
Employee

Tree-based algorithms (including gradient-boosted ones, like CatBoost) are very good at predicting non-linear data patterns, whereas linear regression is really only effective if the patterns in your data are linear.

Since CatBoost was selected for having the best R2 score (and, as you can see in your screenshot, linear regression scored 13%), your data likely contains non-linear patterns. Beyond that, CatBoost in particular excels on datasets that have large amounts of categorical data.

With that said, a tree-based/gradient-boosted algorithm is likely the most effective for your dataset, based on the predictors (features) in your data. You can see that the algorithms meeting this definition (CatBoost, XGBoost, LightGBM, and Random Forest) all score very well compared to the linear algorithms (such as Linear Regression and SGD).
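To make the gap concrete, here is a minimal sketch in plain Python. It assumes synthetic data with a purely non-linear pattern (y = x² plus noise), not your sales dataset, and it uses a tiny hand-written regression tree as a stand-in for tree-based models. It is not CatBoost and not what Qlik AutoML runs internally; the helper names (`fit_line`, `fit_tree`, `r2_score`) are hypothetical and written only for this illustration.

```python
import random


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_line(xs, ys):
    """Ordinary least squares on one feature; returns a predict function."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def sse(values):
    """Sum of squared deviations from the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def fit_tree(xs, ys, depth=5, min_leaf=5):
    """Toy regression tree on one feature: greedy splits that minimize squared error."""
    mean_y = sum(ys) / len(ys)
    if depth == 0 or len(xs) < 2 * min_leaf:
        return lambda x: mean_y  # leaf: predict the mean of its training targets
    pairs = sorted(zip(xs, ys))
    sx = [p[0] for p in pairs]
    sy = [p[1] for p in pairs]
    best = None
    for i in range(min_leaf, len(pairs) - min_leaf + 1):
        if sx[i - 1] == sx[i]:
            continue  # cannot place a threshold between identical x values
        cost = sse(sy[:i]) + sse(sy[i:])
        if best is None or cost < best[0]:
            best = (cost, i)
    if best is None:
        return lambda x: mean_y
    i = best[1]
    threshold = (sx[i - 1] + sx[i]) / 2
    left = fit_tree(sx[:i], sy[:i], depth - 1, min_leaf)
    right = fit_tree(sx[i:], sy[i:], depth - 1, min_leaf)
    return lambda x: left(x) if x <= threshold else right(x)


# Synthetic, assumed data: a non-linear (quadratic) relationship plus noise.
rng = random.Random(0)
xs = [rng.uniform(-3, 3) for _ in range(600)]
ys = [x * x + rng.gauss(0, 0.3) for x in xs]
x_train, y_train = xs[:400], ys[:400]
x_test, y_test = xs[400:], ys[400:]

line = fit_line(x_train, y_train)
tree = fit_tree(x_train, y_train)

r2_linear = r2_score(y_test, [line(x) for x in x_test])
r2_tree = r2_score(y_test, [tree(x) for x in x_test])
print(f"linear R2: {r2_linear:.2f}, tree R2: {r2_tree:.2f}")
```

On data like this, the straight line cannot follow the curve, so its R2 on the held-out points stays low, while the tree approximates the curve piece by piece and scores much higher. In scikit-learn, the same comparison could be run with `LinearRegression` and a tree-based model such as `GradientBoostingRegressor`, scoring both on the same held-out split with `r2_score`.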

Werapat_2538
Contributor
Author

Thank you for the explanation. It is now much clearer which algorithms suit my dataset. 😀