
Qlik AutoML: Training and Prediction data guidelines

KellyHobson

Last Update:

Mar 22, 2023 11:06:43 AM

Updated By:

Sonja_Bauernfeind

Created date:

Mar 22, 2023 10:09:35 AM

Qlik AutoML is a tool within Qlik Cloud where you can quickly train and deploy models, and then make predictions against those models. This article covers best practices for preparing training datasets for ML experiments and apply datasets for generating predictions.

 

Guidelines

  1. Use camelCase for column names, or otherwise avoid spaces and punctuation.
  2. Do not use special characters in column names.
  3. When working with Excel data, remove all formatting such as bold/italics, borders, currency, or color formats.
  4. Make sure data types are consistent between training and prediction datasets. It may be worth checking in Qlik Catalog to confirm that each column is profiled with the data type you expect.
  5. Qlik AutoML only works with structured, tabular data. Any flat file that can be uploaded and profiled in Qlik Cloud can be used by AutoML. In our experience, the CSV, QVD, and XLSX formats give the best results.
  6. AutoML does not support sentiment analysis. This would require a third-party service (such as Amazon Comprehend) to generate structured data points for AutoML to use. You can use Comprehend directly in Qlik Sense through our connector and then use that output as a feature in AutoML.
  7. For multi-table files such as Excel, only the first sheet is used as the table.
  8. Date columns are currently treated as a categorical feature type. Engineer date columns into numeric features before using the dataset in an ML experiment.
  9. Be aware of dataset size limits based on your tenant type.
  10. For null data, if a column contains more than 50% null values it will be dropped.

    If 50% or fewer of the values are null, the nulls are imputed:
    For numeric type: the column mean is used.
    For categorical type: the value 'other' is used.
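
Guidelines 1, 2, and 8 can be applied before the file is uploaded. The following is a minimal Python sketch using only the standard library, not a Qlik API. The function names, the derived column suffixes (year, month, day, dayOfWeek), and the assumed date format `%Y-%m-%d` are illustrative choices; adjust them to your data.

```python
import re
from datetime import datetime

# Numeric parts derived from each date; the names are illustrative choices.
DATE_PARTS = ("year", "month", "day", "dayOfWeek")


def to_camel_case(name: str) -> str:
    """Turn a header such as 'Order Date ($)' into 'orderDate'."""
    # Split on every run of characters that is not an ASCII letter or digit,
    # which drops spaces, punctuation, and special characters.
    words = [w for w in re.split(r"[^A-Za-z0-9]+", name) if w]
    if not words:
        return "column"  # fallback for headers with no usable characters
    first = words[0][0].lower() + words[0][1:]
    return first + "".join(w[0].upper() + w[1:] for w in words[1:])


def expand_date(value: str, fmt: str = "%Y-%m-%d") -> dict:
    """Split one date string into numeric parts (Monday is 0 for dayOfWeek)."""
    d = datetime.strptime(value, fmt)
    return {"year": d.year, "month": d.month, "day": d.day, "dayOfWeek": d.weekday()}


def prepare_row(row: dict, date_columns: set, fmt: str = "%Y-%m-%d") -> dict:
    """Rename every column and replace each date column with numeric columns."""
    out = {}
    for key, value in row.items():
        clean = to_camel_case(key)
        if key in date_columns:
            # Blank dates become None in every derived column.
            parts = expand_date(value, fmt) if value else dict.fromkeys(DATE_PARTS)
            for part, number in parts.items():
                out[clean + part[0].upper() + part[1:]] = number
        else:
            out[clean] = value
    return out


row = {"Order Date": "2023-03-22", "Total Sales ($)": "19.99"}
print(prepare_row(row, date_columns={"Order Date"}))
```

Rows read with `csv.DictReader` can be passed through `prepare_row` one at a time and written back out with `csv.DictWriter`. Two different headers can map to the same cleaned name (for example "Total Sales" and "total_sales"), so check for collisions before writing the file.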

We will continue to update this list as we encounter other issues related to data used in Qlik AutoML.
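
Guidelines 4 and 10 can also be checked locally before upload. The sketch below is a rough pre-check in plain Python, not a reproduction of how Qlik Cloud profiles data: the helper names are hypothetical, and the numeric/categorical test is a simple heuristic (a column counts as numeric when every non-blank value parses as a float).

```python
def is_missing(value) -> bool:
    """Treat None and blank strings as null."""
    return value is None or str(value).strip() == ""


def null_share(rows, column) -> float:
    """Fraction of rows in which the column is null."""
    if not rows:
        return 0.0
    return sum(is_missing(row.get(column)) for row in rows) / len(rows)


def infer_type(rows, column) -> str:
    """Heuristic type: 'numeric', 'categorical', or 'empty'."""
    values = [row.get(column) for row in rows if not is_missing(row.get(column))]
    if not values:
        return "empty"
    try:
        for v in values:
            float(v)
    except (TypeError, ValueError):
        return "categorical"
    return "numeric"


def check_datasets(train_rows, apply_rows, threshold=0.5):
    """Return (columns above the null threshold, columns whose types differ)."""
    columns = sorted({key for row in train_rows for key in row})
    mostly_null = [c for c in columns if null_share(train_rows, c) > threshold]
    apply_columns = {key for row in apply_rows for key in row}
    mismatched = {}
    for c in columns:
        if c in apply_columns:
            pair = (infer_type(train_rows, c), infer_type(apply_rows, c))
            if pair[0] != pair[1]:
                mismatched[c] = pair  # (training type, apply type)
    return mostly_null, mismatched


train = [
    {"age": "34", "plan": "basic", "notes": ""},
    {"age": "", "plan": "pro", "notes": ""},
    {"age": "51", "plan": "basic", "notes": "vip"},
]
apply = [{"age": "unknown", "plan": "pro", "notes": "vip"}]
print(check_datasets(train, apply))
```

In this sample, `notes` is null in two of three training rows, so it is flagged as exceeding the 50% threshold, and `age` is flagged because it looks numeric in the training rows but categorical in the apply rows.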

 

Environment

Qlik AutoML 

 

The information in this article is provided as-is and is to be used at your own discretion. Depending on the tool(s) used, customization(s), and/or other factors, ongoing support on the solution below may not be provided by Qlik Support.

Related Content

How To Get Started with Qlik AutoML
