Troy_Raney
Digital Support

STT - Using AutoML in Qlik Sense

Last Update:

Apr 14, 2023 6:58:21 AM

Updated By:

Troy_Raney

Created date:

Apr 14, 2023 6:58:21 AM


 

Environment

  • Qlik Cloud
  • Qlik AutoML

 

Transcript

Hello, and welcome to the April edition of Techspert Talks. I'm Troy Raney and I'll be your host for today's session. Today's Techspert Talks session is Using AutoML in Qlik Sense with our own Kelly Hobson. Kelly, why don't you tell us a little bit about yourself?
Hey Troy, thanks for having me. My name is Kelly Hobson. I'm a Senior Technical Support Engineer at Qlik and I currently support Qlik Replicate which is one of our Data Integration tools, and also Qlik AutoML. I've been at Qlik for just about 2 years now.
Great, thanks. So, today we're going to take a look at what Qlik AutoML can do; how to set it up; and some important details to keep in mind. Now Kelly, for those of us who are kind of new to the product, could you explain what AutoML is?
Sure. AutoML is a no-code Automated Machine Learning tool where you can easily create predictive analytics. So, Qlik AutoML generates Machine Learning models from historic or training data.
Okay.
Which can then be used to make predictions on current data.
Nice.
With the integration with Qlik Cloud, it allows you to explore the data and get real-time predictions within Qlik Sense apps.
Well, it all sounds very cool; and now, you've already set up a demo. Can you explain the kind of demo you're going to be showcasing for us today?
Today I'll be talking about the Iris data set. This is a data set that is well known from data science classes and demos; it can be found on Kaggle, or even with just a Google search, to download. The data is four measures: petal length, petal width, sepal length, and sepal width, plus three types of Iris flowers that it's predicting. So, based on those four measures, can we classify the type of Iris?
Right.
It's fairly simple; there's just four measures; one Target variable. So, it's a very good data set to explain and to learn more about how to use the AutoML tool.
Right. I like this data set as a model to apply in AutoML, because normally when I think of predictive analysis, it's always trying to determine between A or B. Like, is it going to work or is it not? And this is more complicated than that as a data set. It's 3 options instead of just 2. I like that concept.
That's right. So, we'll see that AutoML will distinguish that it's multi-class, and then be able to pick the correct models for that type of scenario.
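For readers who want to poke at the same data outside of Qlik, here is a minimal Python sketch that loads the file and confirms the four measures and the balanced classes Kelly describes. It assumes a local iris.csv using the column names from the common Kaggle download; adjust the names to match your copy.

```python
import pandas as pd

# Hypothetical local copy of the training data; the common Kaggle download uses
# columns sepal.length, sepal.width, petal.length, petal.width, and variety.
df = pd.read_csv("iris.csv")

print(df.dtypes)                       # four numeric measures plus the string target
print(df["variety"].value_counts())    # expect 50 records for each of the 3 classes
```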
So, let's jump in. Can you show us what the app looks like that you've put together?
Sure. So, you'll see this is all Qlik Sense on Qlik Cloud, all browser-based. On the right-hand side, we have 2 graphs. The top one is Petal-Measures, the lower is Sepal-Measures; the x-axis is the Petal-length, and Petal-width is on the y-axis; and the same for the bottom. And then the data points are the records for the 3 types of flower.
Okay. Could you explain what those slider tools are in the middle?
Sure. Here in the middle are some slider tools that will take input for the measures. So, you can provide fresh or live data, which then produces predicted output, where we're seeing the predicted Setosa, predicted Versicolor, and predicted Virginica values, based on these 4 inputs.
Okay. Just so I understand, you’ve got the historical data that the predictive analysis is based on (on the right), you input current data with those sliders. Basically, you're trying to measure a flower you're looking at, and letting the app guess what type of flower it is; and that's making that prediction and that's the percentage?
That's correct.
Okay.
Yep, that's a great explanation. And for example, our current prediction is high for the type Setosa; which makes sense based on those inputs. But if we do start changing our values a little bit higher for length and width, and then also bump up the Sepal-length, we can see that it's now predicting a high probability for the Versicolor type. So, the nice thing about having it side by side is that users who may not know much about how the model was generated, or haven't used the tool, can still explore the data with AutoML. So, that's something I think is a big differentiator with this tool.
Okay. Now, that's really cool. What are the steps to build an app like this that uses AutoML?
To get to this place, there are some steps we have to take to stage the data in Qlik Cloud.
Okay. So, we're here on your Hub in the Analytics Service section. Where do we go from here?
The first place to start is having a good training data set.
Okay.
And in this case, we skipped some of that process, because the Iris data set is something that's out there; it's already clean. In the real world, there's some work to be done to get a good, solid training data set. And I'll switch over to the data. The Iris data set is a CSV file and I've uploaded it to Qlik Cloud already. I'll just show what it looks like profiled in Qlik Catalog. We have the Sepal-length, Sepal-width, Petal-length, Petal-width. The nice thing about this training data set is it's very balanced: 50 records for each type.
Okay.
You can also see in the data that we have these numeric inputs, and then the variety is a string value.
Right. So, it's a pre-built data set; they basically have gone and measured 50 different flowers of each type, and that's your data. It's very even but it's also simple; like we've just measured four things on each of those flowers, and we've got this set?
That's right. So, next we would come here to this Add New button; and drop it down to the New ML Experiment.
And ML is Machine Learning?
That’s correct. Now, I'll give it a name. This will take you to the staging area where you can add a training data set with this button, or you can search for existing data sets: Iris. Now it's asking you to select the Target you want to predict.
Right. That's just the type of Iris that we're looking at?
That's right.
Okay.
And then all of these have been profiled. It gives us some more information on distinct values.
But what does that term ’One-hot encoded’ mean?
So, behind the scenes it's actually going to encode these numerically; with one-hot encoding, each category value becomes its own 0/1 column, so that it's easier to do the computation. AutoML is set up to handle that in the background, so the user doesn't have to do any of that work.
That's pretty cool that it does it automatically for you. It's all built in. I love that.
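As an aside, one-hot encoding is a standard technique, and a minimal Python sketch of the general idea (not Qlik's internal code) looks like this: each category value becomes its own 0/1 indicator column, so the algorithms only ever see numbers.

```python
import pandas as pd

df = pd.DataFrame({"variety": ["Setosa", "Versicolor", "Virginica", "Setosa"]})

# One-hot encode: each category value becomes its own 0/1 indicator column.
encoded = pd.get_dummies(df["variety"], prefix="variety", dtype=int)
print(encoded)
# columns: variety_Setosa, variety_Versicolor, variety_Virginica
```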
That’s right. This is just more information about the Target, the features that we've selected, and then it's been able to look at the Target we picked, and it knows that it's a Multiclass classification problem.
Okay.
And then it's gonna give you the algorithms available that work with that particular problem.
So, this is a bunch of pre-built algorithms that AutoML has identified as the best to use with this data set?
That's right. So, you can kind of think of it - it's like your shopping cart. You want to try all of these when you're running the experiment.
And you'll get to decide which one you want to run with, right?
Correct. The tool will also evaluate and pick the champion.
Okay.
So, that's the other nice thing is it will show you what's the top dog.
Cool.
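Conceptually, what the experiment run automates is something like the following Python sketch: fit several candidate algorithms on the same training data, score them all the same way, and keep the best performer as the champion. This is only an illustration of the idea, not Qlik's actual implementation, and it assumes the hypothetical iris.csv from the earlier sketch.

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

df = pd.read_csv("iris.csv")                 # hypothetical local copy of the training data
X, y = df.drop(columns="variety"), df["variety"]

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

# Score every candidate the same way (F1 macro suits the 3-class target),
# then keep the best performer as the "champion".
scores = {name: cross_val_score(model, X, y, cv=5, scoring="f1_macro").mean()
          for name, model in candidates.items()}
champion = max(scores, key=scores.get)
print(scores, "-> champion:", champion)
```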
And then one more point I'll show here is Hyperparameter Optimization is a very powerful tool option that is available with AutoML. You can enable it. This is something we don't recommend doing on the first run of your experiment. As you get more confident in your model, this is where you can turn on HPO or Hyperparameter Optimization to run the model over many different iterations.
It sounds like you can set it to continue to calculate for as long as you say, to just analyze more and more potential data. Is that the idea?
That’s right. Yeah, it's changing the input parameters to give you more iterations of the model and more opportunities to pick a champion out of a set.
Okay.
So, to set the maximum time: these are in hours. So, you can set it to run for 5 hours or 4 hours; it's going to continue to produce models during that time frame, and then once it gets cut off, it will make a decision on the best one.
Wow, that sounds very powerful to sit there and continue to calculate and churn away more and more models over hours.
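To make the idea concrete, hyperparameter optimization in general looks something like this Python sketch: re-fit the same algorithm many times with different parameter settings and keep the best-scoring combination. This illustrates the general technique with an iteration budget rather than a time budget; it is not Qlik's implementation, and it reuses the hypothetical iris.csv from earlier.

```python
import pandas as pd
from scipy.stats import randint
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV

df = pd.read_csv("iris.csv")                 # hypothetical local copy of the training data
X, y = df.drop(columns="variety"), df["variety"]

# Try many parameter combinations for the same algorithm and keep the best one.
search = RandomizedSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_distributions={
        "n_estimators": randint(50, 500),
        "max_depth": randint(2, 6),
        "learning_rate": [0.01, 0.05, 0.1, 0.2],
    },
    n_iter=50,                # an iteration budget rather than a time budget
    scoring="f1_macro",
    cv=5,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```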
That’s right. Just for demo purposes, we're going to keep that disabled, and now we're ready to click on Run Experiment.
So, it's currently calculating, against each of those individual algorithms and the historical data, which algorithm would predict the most accurately?
Correct. This chart that's generated here will show metrics like F1 Macro, F1 Micro, F1 Weighted, and the Accuracy. The Gradient Boost Classification was the top performing model. And the other chart is the Permutation Importance.
Okay.
This is a bar chart. It gives us a background here: how much does the model rely on each feature? And you can see that Petal-length and Petal-width are our top. So, if you shuffle around the Petal-length on a particular record, how much does it affect the model performance?
Oh okay. So, basically those two Fields, Petal-length and Petal-width, affected the outcome the most?
That's right.
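Permutation importance is itself a standard technique, and a minimal Python sketch of it (again, not Qlik's internal code, and assuming the hypothetical iris.csv) looks like this: fit a model, then shuffle one feature at a time and measure how much the score drops. The bigger the drop, the more the model relies on that feature.

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

df = pd.read_csv("iris.csv")                 # hypothetical local copy of the training data
X, y = df.drop(columns="variety"), df["variety"]
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

# Shuffle each feature in turn and measure how much the F1 macro score drops.
result = permutation_importance(model, X_test, y_test, scoring="f1_macro",
                                n_repeats=10, random_state=0)
for name, score in sorted(zip(X.columns, result.importances_mean), key=lambda t: -t[1]):
    print(f"{name}: {score:.3f}")            # petal measures typically rank highest
```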
Okay. And the top algorithm is the XGBoost Classification with 97% accuracy. So, where do you go from here?
So, from here in this case, we would go ahead and click on Deploy. I'll leave the name as it is and click on Deploy. And here at the top, you can see that it's been created; I'll click on Open, and it will take us to Machine Learning Model Management. In the Deployment Overview, we have information about the model we just deployed. We also have data set predictions; that's going to be blank because we haven't done anything so far. And then it also has some information about API access, if you're accessing via another tool such as Python or Postman.
Okay.
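Outside of Qlik Sense, a deployed model can also be called over the REST API using an API key. The exact endpoint URL and request payload are shown on the deployment's API page in ML Model Management, so treat the URL, field names, and payload shape in this rough Python sketch as placeholders rather than the real contract.

```python
import requests

# Placeholders only: copy the real endpoint URL, payload shape, and field names
# from the deployment's API page in ML Model Management, and use your own API key.
PREDICT_URL = "https://your-tenant.us.qlikcloud.com/<path-from-deployment-api-page>"
API_KEY = "<your-api-key>"

rows = [{"sepal.length": 5.1, "sepal.width": 3.5,
         "petal.length": 1.4, "petal.width": 0.2}]

response = requests.post(
    PREDICT_URL,
    headers={"Authorization": f"Bearer {API_KEY}",
             "Content-Type": "application/json"},
    json={"rows": rows},                     # placeholder payload shape
)
print(response.status_code, response.json())
```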
You can create a prediction and Select Apply Data Set. So, it can be done manually within the tool, but then there's also some options to create those predictions within a Qlik Sense app.
So, that would be like in addition to having those slider tools of manual inputting data, you could have a data set with a whole bunch of new values or current data and that would apply that instead?
Mm-hm.
Okay.
And you can do it as one big batch. You can also schedule it.
That's really cool. So, you could put it on a schedule like a task that updates whenever the new data is available and refreshes?
Mm-hm. I'm just picking all of the defaults, but here's this Create Schedule, if you did want to have it all configured within the Model Management tool.
That's really cool.
So, I'll click Cancel, because we're not going to run the prediction from this.
So, how do we bring this into a Qlik Sense app?
I'm going to click on Iris Dev which is a copy of where we were on that original sheet, but I want to show you how we got there.
Okay. This is an app where you've pre-built a lot of things, but you still have a few final steps before it actually works?
The first thing: I'll switch over to the Data Load Editor. For this app, we started by loading the Iris data set, just to have that historic data.
Right. So, you could show those two Scatter Plots with historical data on the sheet.
To be able to connect it back to AutoML, on the right-hand side, there's Data Connections, Create New Connection. We have several different types and then there's also Analytics Sources. This one here is the Qlik AutoML connection.
Okay.
It's going to prompt you for an ML Deployment. We'll pick the model that we created.
Right.
And then, if we wanted to create a batch of predictions, it will ask us for a Return Table. So, we can name it Iris_Predictions. In this case, we don't have Shapley values generated; we won't include the apply data or errors; and then there's this Association Field. If the data set doesn't have a unique identifier, AutoML can generate a value for that.
Okay. So, that is an auto-built index field you could use to help connect it to the rest of your data in the app?
Yes. That’s right. It's very handy; and –
Yeah.
I'll name this just Iris2, because I already have one called Iris; and then I'll click on Test Connection.
Okay.
It connected successfully; and then I'll click on Create. This is the new AutoML connection. If I go here to the select data icon, resident table; in this example, it's a little bit of a not-best practice. In most cases, it would be a new data set, a test data set, or an apply data set, but in this case, we've just recycled and used the same data set that we trained with.
Sure.
When I input Iris now, it's going to pop up with Iris Predictions, and then give me a prediction script that I can Insert. And now we'll see that it's giving us this ScriptEval function against this connection, Iris2, with this data set, Iris. If you run that, then you'll get a whole new data set in the app.
Okay.
To differentiate, this is a slightly different variation of running that prediction.
Okay. But we did need to create that AutoML connection to the predictive model to make this work, right?
That’s right. This is a Sense KPI. We can set an expression using a ScriptEvalStr function to connect to Iris, which is that connection that I had before; and then I can predict against a particular column, variety Setosa, based on these 4 variables.
Okay. This is a function that takes a look at your predictive model and the data; and it's guessing based on those 4 input variables, how likely it is to be variety Setosa?
And you can think of these as slider inputs; this is where -
Okay.
We're changing, we're making variations; it's picking up those dynamic inputs and then using them to create a predicted probability. I'll click Apply, and now you can see it pop back up (the probability value). I'll go ahead and do this for the others; they've just been previously commented out.
Okay. And is the big difference between doing it this way versus doing it in the script; within the script you can kind of batch apply a large data set, and here, our current data set is whatever we input into those sliders? Is that the big difference?
Correct. So, the big difference here is that we're looking at this on a 1-record basis.
1 flower at a time, 1 measurement at a time.
Yeah, 1 flower at a time, whereas the other way, we get an entire batch run of predictions.
Okay.
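The same distinction holds outside of Qlik: with any trained classifier you can either score one record at a time (like the sliders) or score a whole table in one batch (like the load-script prediction). A small illustrative Python sketch of the two modes, reusing the hypothetical iris.csv:

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

df = pd.read_csv("iris.csv")                 # hypothetical local copy of the training data
X, y = df.drop(columns="variety"), df["variety"]
model = GradientBoostingClassifier(random_state=0).fit(X, y)

# Single record, like the sliders: one row in, one set of class probabilities out.
one_flower = pd.DataFrame([{"sepal.length": 5.0, "sepal.width": 3.4,
                            "petal.length": 1.5, "petal.width": 0.2}])
print(dict(zip(model.classes_, model.predict_proba(one_flower)[0])))

# Batch, like the load-script prediction: score the whole apply data set in one call.
print(pd.Series(model.predict(X)).value_counts())
```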
Here you can see this row index; variety is the original historic value.
Yeah.
And then this is the predicted value. So, you can see where it missed or where it got the wrong value. If we select by color, then we can really drill into how many it predicted as Setosa and potentially got wrong.
Yeah, that's interesting. I mean, it is all just a Machine Learning prediction experiment, right? So, it's cool being able to analyze and see how accurate it is. I guess that's why it wasn't 100%?
That's right. So, yeah, if it misses, it's most likely to be Virginica.
Interesting.
But if we go to another type, Versicolor, of the ones that it did miss, there was a little bit more variety. And lastly, on Virginica, there's much more overlap with Versicolor; it predicted thinking it was one thing, but it was actually Virginica.
I guess those two types of Irises have very similar color.
That’s right. And those 2, if you go back to the scatter.
You can see that those 2 are very closely aligned.
Yes, a lot of overlap there. Do you have any other examples on how this could be applied?
Sure. I have another example. This is another model that I was playing around with. It's like an HR analytics data set. Again, I believe I found this on Kaggle and it's a binary classification. So, it's either the person is getting promoted or they didn't get promoted.
Okay.
And then we can toggle with different inputs such as Department, how well you did on training or certifications, your previous year rating, and then any awards won.
Okay. But similarly, it's got 4 variables; kind of like the flower model, 4 variables and you get to see how those variables affect the chance of someone actually getting promoted, right?
That’s correct. And in this case, there were more inputs, but I narrowed it down on my second or third iteration of the ML experiment. I only selected the top five.
Okay.
The kind of interesting thing is, based on department, does that make a difference in your probability? So, with R&D it's a lot higher than it was in Analytics; and how does that compare with something like HR?
That's interesting.
This could be for business analysts or even leadership to try to get a better idea of a potential model that's being used in production.
Yeah, I mean, that's the whole point, right? It's all predictive analysis; it opens up a whole lot of possibilities. It seems pretty powerful. Well Kelly, this is great. If somebody wants to know more, what are some resources that are available for diving into AutoML a little bit more?
Sure. So, this is a Community article about getting started with Qlik AutoML. It gives you an overview. It also has some data sets; this one is multiclass, this one is a binary customer churn, and then this last one is a regression problem. So, you'll get the different types. This is another article with more sample data and also a very nice video; just another example of the predictive power within a Qlik Sense app. And there's our Continuous Classroom Introduction to Qlik AutoML for getting people started and up to speed on working with AutoML.
Great. There are already a lot of resources out there: educational material, some sample data (that looked like a lot of fun), and this course for learning more. Alright, well, now it's time for Q&A. Please submit your questions to the Q&A panel on the left side of your On24 console.
Okay. First question: is it possible to update the data set used in an AutoML experiment? And if so, how?
No, at this time you cannot update the data set used in an AutoML experiment. Another note is that once you do select the Target, you're not able to change that either. This may change in the future, but right now, if you want to refresh your model, you have to make a new experiment and go through the process of selecting the Target again; then you can iterate through, but you'll be using the same data set.
Okay. And that's that historical data you're basing the experiment on, right?
That's correct.
Okay.
On the other hand, if you want to run a deployed model against new records, the prediction or apply data set can be refreshed; we touched on that a little bit with the schedule. Let's say your apply or prediction data set is getting refreshed at a particular time; then you can run the model against it and have new prediction outputs.
Okay. I mean that makes sense. Next question: is AutoML free?
AutoML is included with any Qlik Sense Enterprise SaaS license and it comes under the included tier.
Okay.
And the included tier has a limit, but if you would like to increase that and go to a higher tier, that's something our Sales team can certainly help with, getting you in the right place depending on your data set needs.
Okay. That's good to know. It's pretty cool that it's included with SaaS, but for people doing a lot of it, that might need some expansion. All right, moving on. What's the limit to the size of data an AutoML experiment can use?
So, within the included tier, which is what we were referencing in the previous question, what's included with Qlik Sense Enterprise SaaS is 100,000 cells. So, for example, that could be a table with 10,000 records and 10 columns. That's currently the limit where we want folks to train and experiment with models; it will be capped at that particular value of 100,000 cells. And as you scale up to our Premiere tier, that's when the doors really open up on those limits that are currently set.
All right, next question: can I choose my own index column? We saw that it pre-generated one for you that was available, but can they choose their own?
Yes, if you have an ID field or an index, unique-identifier column, this can be chosen when you're doing your prediction; whether that be in the AutoML connection or in the UI where we saw that ML Model Management interface. I will say, we don't recommend using ID Fields in your training data set, because that's a value that's unique for each particular record; it's not going to add anything to your model. Save this field for when you're doing the prediction on the apply data set.
Makes sense. Next question: is there a Windows version of AutoML or is this only for use with Qlik Cloud?
AutoML is only available with Qlik Cloud. So, there's not a desktop or Windows version available.
Okay. That's simple enough. What types of data sources can be used for AutoML?
AutoML currently supports record-wise data that's in a tabular format; and we support CSV files, Excel files, and QVDs. It must be able to be staged in Qlik Catalog. So, if you have an issue uploading and it's not able to surface in Qlik Catalog, then you're not going to be able to use it in Qlik AutoML.
That makes it simple enough; if it's in Catalog it can be used. Okay. Does AutoML work with Snowflake? I guess a follow-up to that previous question.
So, there's not a direct connection with Snowflake within AutoML, but yes, if you're able to connect to a Snowflake source to get the data into Qlik Catalog, and it's in a tabular format, then you'd be able to use it in AutoML. But within the AutoML interface, there's not currently a button or a tool that says 'connect to Snowflake'; it's something that has to be set up beforehand within Qlik Cloud.
Yeah, there's some different ways to connect to Snowflake, but that's a separate topic. Next question: (it's very specific) I'm getting a failed precondition error; what are the requirements again or any suggestions on how to troubleshoot this type of error?
If you do run across an error that you don't know how to get around, the best course of action is to either reach out on our AutoML Community page or open a support case. And we ask that you be prepared to share your tenant information, such as your tenant ID, hostname, and subscription ID; and if you are experiencing an error code, copy and paste that particular error or take a screenshot, so that we're able to continue to troubleshoot that issue.
That's great advice. The next question: is there a way to create a task to run the experiment on a schedule? I think you demoed that. I guess, could you review that one more time?
Sure. So, the ML experiments cannot be scheduled, but the predictions can be scheduled within the ML Model Management user interface; that's the distinction there.
Okay. Last question: where can we find documentation or more instruction on how to set this up ourselves?
Sure. So, we have further documentation and instructions. We'll give you links to Hands-On tutorials, the Introduction to AutoML, a Community article about how to get started with Qlik AutoML, and a direct link to our documentation, which is currently part of the Qlik Cloud documentation.
Perfect. Yeah, and we'll include all those links you've already shown and the ones you're mentioning there. Well Kelly, thank you very much. I think this will help a lot of people get introduced to AutoML and start using it, and it definitely seems like a really powerful tool.
Yes, thank you so much, Troy, again for having me present. It is; it's really exciting to have this as part of the Qlik ecosystem, and we hope that customers get excited and are able to use the capabilities of AutoML in their day-to-day activities.
Okay great. Thank you everyone! We hope you enjoyed this session; and thank you to Kelly for presenting. We always appreciate getting experts like Kelly to share with us. Here is our legal disclaimer. And thank you once again. Have a great rest of your day.

Comments
keithyowell1
Contributor II

I'm having a pretty basic issue that really makes me question the price of the license my job is paying for. I use zip code as a feature, which Qlik interprets as numeric. This is not correct, so before I try to train the model I tell Qlik that the field is categorical. In another case, Qlik interprets percent of purchases made in-person as a categorical field. This is also not correct, so I respecify it as a numeric field. When I ask to run the experiment, it tells me "AutoML could not convert PCT_IN_PERSON to numeric. Please start the experiment over or create a new experiment from the hub." How do I get around this?

Sonja_Bauernfeind
Digital Support

Hello @keithyowell1 

Please post your question directly in our Qlik AutoML forum, including as much detail as possible on what you are looking to achieve. There, your active Qlik peers and our support agents are better equipped to help you.

All the best,
Sonja 
