I am exploring machine learning in Qlik AutoML and comparing it with ordinary machine learning in Python. What is the difference between them?
First, I used Qlik AutoML with regression to predict sales from my dataset. After I uploaded the data, Qlik ran an experiment, suggested many algorithms, and ranked CatBoost Regression as the top algorithm, with an R2 of around 98%.
Second, I applied the same dataset in Python and used the scikit-learn library for Linear Regression, but the R2 was only 14%.
My question: is the algorithm that Qlik Sense suggests, CatBoost Regression, the right one to apply? Is it the more reliable choice?
Tree-based algorithms (including gradient-boosted ones, like CatBoost) are very good at predicting non-linear data patterns, whereas linear regression is really only effective if the patterns in your data are linear.
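To make the difference concrete, here is a minimal, dependency-free sketch. It is not your sales data and it is not CatBoost: the data is synthetic (a hypothetical curved relationship, y = x² plus noise), and `fit_tree` is a tiny hand-written regression tree standing in for the tree-based family. It only illustrates why a straight line can score a very low R2 on a pattern that a tree captures well.

```python
import random
from statistics import mean

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def fit_line(xs, ys):
    """Ordinary least squares for a single feature."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return lambda x: slope * x + intercept

def sse(values):
    """Sum of squared errors around the mean of the values."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def fit_tree(xs, ys, depth=4):
    """Tiny regression tree (illustrative only): split where squared error is lowest."""
    if depth == 0 or len(set(xs)) < 2:
        leaf = mean(ys)
        return lambda x: leaf
    best_cost, best_t = None, None
    for t in sorted(set(xs))[1:]:  # every threshold leaves both sides non-empty
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        cost = sse(left) + sse(right)
        if best_cost is None or cost < best_cost:
            best_cost, best_t = cost, t
    left_pairs = [(x, y) for x, y in zip(xs, ys) if x < best_t]
    right_pairs = [(x, y) for x, y in zip(xs, ys) if x >= best_t]
    left_model = fit_tree([x for x, _ in left_pairs], [y for _, y in left_pairs], depth - 1)
    right_model = fit_tree([x for x, _ in right_pairs], [y for _, y in right_pairs], depth - 1)
    return lambda x: left_model(x) if x < best_t else right_model(x)

# Synthetic, non-linear data (an assumption for illustration): y = x^2 + noise
random.seed(0)
xs = [random.uniform(-3, 3) for _ in range(300)]
ys = [x ** 2 + random.gauss(0, 0.3) for x in xs]
x_train, y_train = xs[:200], ys[:200]
x_test, y_test = xs[200:], ys[200:]

line = fit_line(x_train, y_train)
tree = fit_tree(x_train, y_train, depth=4)

r2_line = r2_score(y_test, [line(x) for x in x_test])
r2_tree = r2_score(y_test, [tree(x) for x in x_test])
print(f"linear R2: {r2_line:.2f}   tree R2: {r2_tree:.2f}")
```

On this kind of data the line should land near an R2 of 0 while the tree lands much closer to 1. In a real Python comparison you would more likely use `LinearRegression` from scikit-learn against a gradient-boosted model such as scikit-learn's `GradientBoostingRegressor` or CatBoost's `CatBoostRegressor`, scored on the same held-out split.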
Since CatBoost achieved the best R2 score (and, as you can see in your screenshot, linear regression scored 13%), your data likely contains non-linear patterns. Beyond that, CatBoost in particular excels on datasets that have large amounts of categorical data.
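One reason CatBoost copes well with categorical columns is that it can turn each category into a number based on the target values seen so far, an idea its documentation calls ordered target statistics. The sketch below is a simplified illustration of that idea only, not CatBoost's actual implementation: the function name `ordered_target_encode`, the `prior_weight` parameter, the use of the global mean as the prior, and the sample regions and sales figures are all assumptions made for the example.

```python
from collections import defaultdict

def ordered_target_encode(categories, targets, prior_weight=1.0):
    """Encode each row from the targets of EARLIER rows in the same category.

    Simplified illustration: the prior is the global target mean, and rows
    are processed in the order given (no random permutation).
    """
    prior = sum(targets) / len(targets)
    sums = defaultdict(float)   # running sum of targets per category
    counts = defaultdict(int)   # running row count per category
    encoded = []
    for cat, y in zip(categories, targets):
        # Uses only rows seen before this one, so a row never sees its own target
        encoded.append((sums[cat] + prior_weight * prior) / (counts[cat] + prior_weight))
        sums[cat] += y
        counts[cat] += 1
    return encoded

# Hypothetical example: a sales region column and a sales target
regions = ["north", "south", "north", "south", "north"]
sales = [100, 20, 120, 30, 110]

encoded = ordered_target_encode(regions, sales)
for region, value in zip(regions, encoded):
    print(f"{region:>5} -> {value:.2f}")
```

With plain scikit-learn Linear Regression, by contrast, categorical columns usually have to be encoded by hand first (for example with one-hot encoding), which is one more place where the two setups can differ.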
With that said, a tree-based/gradient-boosted algorithm is likely the most effective for your dataset, based on the predictors (features) in your data. You can see that all the algorithms that meet this definition (CatBoost, XGBoost, LightGBM, and Random Forest) score very well compared to linear-based algorithms (such as Linear Regression and SGD).
Thank you for the clear explanation of which algorithms suit my dataset. 😀