3 Replies Latest reply: Nov 14, 2017 3:18 AM by Simone Trabattoni

# Qlik AAI: Clustering in R, kmeans

Hi community,

I'm playing with R and Qlik Sense Desktop, using the excellent kmeans example app provided by deh. The original app and data are linked below; a modified version of the app is attached to this post.

https://community.qlik.com/docs/DOC-18787#comment-63411

The app uses the famous iris data to perform a kmeans clustering of the observations in Qlik Sense. Each observation has four continuous quantitative variables and one qualitative variable, the species of the iris flower. The result is a nice scatterplot with a pair of the quantitative variables on the axes, the observations as dots, and the cluster as colour.

My goal is to show the iris species (3 species) as the dots, with a reasonable number of clusters (1). As you can tell, my goal is not a meaningful analysis but a test of how flexible the system is.

So I simply put the species in the dimension panel in Qlik and the result is an error, something like "the client has not a valid argument".

Looking at the SSEtoRServe:

I decided to reproduce the problem in R.

Working on the data in R, without using the observation as a measure (as is done in Qlik Sense), gives the same result. Here is my code:

```r
Iris <- read.csv('yourpath\\Iris.csv', sep = ',')

# Quick look at the data
nrow(Iris)
ncol(Iris)
head(Iris)

# Cluster on the four numeric columns only (columns 2 to 5)
clusters <- kmeans(Iris[, 2:5], 3, nstart = 20)$cluster
head(Iris[, 2:5])

Iris2 <- cbind(Iris, clusters)

library(ggplot2)

# Colour the points by cluster and label each one with its observation
p <- ggplot(Iris2, aes(petal.length, petal.width))
p + geom_point(aes(colour = factor(clusters)), size = 3) + geom_text(aes(label = observation), size = 3)
```

And that's great!
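To check how the clusters line up with the species without passing the species to kmeans, a cross-tabulation is enough. This is only a minimal sketch: it assumes R's built-in `iris` data frame as a stand-in for Iris.csv (its column names, such as `Petal.Length` and `Species`, differ from the ones in my file), and the seed value is arbitrary.

```r
# Sketch only: uses R's built-in `iris` data frame, whose column names
# (Sepal.Length, ..., Species) differ from the Iris.csv used in this thread.
set.seed(42)  # kmeans starts from random centres; fix the seed for repeatability

num_cols <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
fit <- kmeans(iris[, num_cols], centers = 3, nstart = 20)

# Species is never passed to kmeans; it is only used afterwards as a label.
comparison <- table(cluster = fit$cluster, species = iris$Species)
print(comparison)
```

The species column stays out of the clustering and only comes back afterwards as a label, which is the same split the ggplot call makes between `colour` and `label`.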

So I tried to reproduce the error I get in Qlik, using the species as the dimension (I cleared the workspace etc. in RStudio first):

```r
data <- read.table("yourpath\\Iris.csv", sep = ",", header = TRUE)
head(data)

# Keep columns 2 to 6: the four measurements and the species
data <- data[, 2:6]
head(data)

library(plyr)

# Grouping: one row per species, with the mean of each measurement
data <- ddply(data,
              ~iris.species,
              summarise,
              sepal.length = mean(sepal.length),
              sepal.width = mean(sepal.width),
              petal.length = mean(petal.length),
              petal.width = mean(petal.width)
)

# It does not work
kmeans(data, 1)

# It works!
kmeans(data[, 2:5], 1)
```

So R also feeds the iris species into kmeans if you do not remove it explicitly.
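The R side of this can be seen in isolation. The sketch below is not the Qlik setup: it assumes R's built-in `iris` data frame instead of Iris.csv and uses base R's `aggregate()` in place of `ddply`. It shows kmeans failing when a non-numeric column is present, then working once only the numeric columns are passed and the species is attached afterwards as a label.

```r
# Sketch only: built-in `iris` stands in for Iris.csv (column names differ).
# One row per species, holding the mean of each measurement.
means <- aggregate(. ~ Species, data = iris, FUN = mean)

# Passing the whole data frame fails: as.matrix() on mixed columns yields a
# character matrix, and the species names cannot be converted to numbers.
failed <- tryCatch(
  suppressWarnings(kmeans(means, centers = 1)),
  error = function(e) e
)
print(inherits(failed, "error"))

# Keep only the numeric columns for the clustering itself...
is_num <- sapply(means, is.numeric)
fit <- kmeans(means[, is_num], centers = 1)

# ...and attach the result to the species afterwards, as a label.
result <- data.frame(species = means$Species, cluster = fit$cluster)
print(result)
```

Selecting columns with `is.numeric` rather than by position (`2:5`) avoids depending on where the species column happens to sit.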

The question is: how can I make it work with the species as the dimension, i.e. a non-numeric dimension? I cannot understand why it does not work.

Thanks in advance.

I've attached the modified app (I also modified the ER, nothing drastic).

EDIT: Added the app with the aggr()

• ###### Re: Qlik AAI: Clustering in R, kmeans

I've gone a step further.

Using

```
AGGR(R.ScriptEval(
  'kmeans(cbind(q$petLen, q$petWid, q$sepLen, q$sepWid), 1, nstart = 20)$cluster',
  [petal length] as petLen,
  [petal width] as petWid,
  [sepal length] as sepLen,
  [sepal width] as sepWid
),[iris species])
```

as measure, you can visualize the points, but they are not coloured, because the error is still the same. Hope it helps to solve the mystery.
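For reference, the R script inside the expression above can be run on its own. This is a rough sketch under assumptions: `q` here is a hand-built data frame that only imitates what the expression refers to as `q` (its real shape inside Qlik is not something I have verified), with one row per species and values taken from the species means of R's built-in `iris` data.

```r
# Sketch only: `q` imitates the data frame that the expression above refers to.
# Its real shape inside Qlik is an assumption here; the values are species
# means computed from R's built-in `iris` data.
means <- aggregate(. ~ Species, data = iris, FUN = mean)
q <- data.frame(
  petLen = means$Petal.Length,
  petWid = means$Petal.Width,
  sepLen = means$Sepal.Length,
  sepWid = means$Sepal.Width
)

# Same R script as in the Qlik expression, run locally.
cluster_ids <- kmeans(cbind(q$petLen, q$petWid, q$sepLen, q$sepWid), 1, nstart = 20)$cluster

# One cluster id per row of q, and with a single cluster every id is 1.
print(length(cluster_ids) == nrow(q))
print(unique(cluster_ids))
```

Run this way, the script returns one cluster id per row it receives, so on the R side the numeric-only input is not the problem.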

• ###### Re: Qlik AAI: Clustering in R, kmeans

Hi Simone,

The labeling of the bubbles in your scatter chart is not something you control in the R expression. It is an option of the chart type itself. Look under Presentation -> Labels.

Regards,

Bas

• ###### Re: Qlik AAI: Clustering in R, kmeans

Hi Bas,

sorry, but I've tried to look at

presentation -> labels

but I haven't found anything that makes the chart work. Maybe I'm wrong; could you please explain how I can make it work?

I've added the app with the aggr() mod, hope it helps (if you do not see it, it is because it is still under moderation; in any case, in the last chart I replaced the original formula with the one from my previous comment).

Thanks