Qlik Community

Qlik Server Side Extensions Discussions

simotrab
Contributor II

Qlik AAI: Clustering in R, kmeans

Hi community,

I'm experimenting with R and Qlik Sense Desktop, using an excellent example, the kmeans app provided by deh. The original app and the data are at the link below; a modified app is also attached to my post.

https://community.qlik.com/docs/DOC-18787#comment-63411

The app uses the famous iris data to perform a kmeans clustering of the observations in Qlik Sense. Each observation has four continuous quantitative variables and one qualitative variable, the species of the iris flower. The result is a nice scatterplot with a pair of the quantitative variables on the axes, the observations as dots, and the cluster as colour.

My goal is to have the iris species (3 species) as the dots instead of the observations, with a correspondingly reasonable number of clusters (1). As you can see, my goal is not a meaningful analysis but a test of the system, to see how flexible it is.

So I simply put the species in the dimension panel in Qlik, and the result is an error, something like "the client has not a valid argument".

Looking at SSEtoRServe:

clust.PNG

I decided to reproduce the problem in R.

Working on the data in R, without using the observation as a measure (as is done in Qlik Sense), the result is the same. Here is my code:


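# Load the iris data (adjust the path to your copy of Iris.csv)
Iris <- read.csv('yourpath\\Iris.csv', sep = ',')

# Quick checks on the data
nrow(Iris)
ncol(Iris)
head(Iris)

# Cluster on the four numeric measures (columns 2 to 5) and keep the cluster ids
clusters <- kmeans(Iris[, 2:5], 3, nstart = 20)$cluster
head(Iris[, 2:5])

# Append the cluster ids to the data
Iris2 <- cbind(Iris, clusters)

# Scatterplot of petal length vs. petal width, coloured by cluster, labelled by observation
library(ggplot2)
p <- ggplot(Iris2, aes(petal.length, petal.width))
p + geom_point(aes(colour = factor(clusters)), size = 3) + geom_text(aes(label = observation), size = 3)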

clusteR.PNG

And that's great!
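A quick way to see how the three clusters line up with the three species is to cross-tabulate them. This is only a sketch: it assumes R's built-in `iris` data frame (columns `Sepal.Length`, `Sepal.Width`, `Petal.Length`, `Petal.Width`, `Species`) as a stand-in for Iris.csv, and the seed value is an arbitrary choice to make the run repeatable.

```r
# Assumption: built-in `iris` replaces Iris.csv, so the four measures are columns 1 to 4.
set.seed(42)  # kmeans starts from random centres; the seed value itself is arbitrary

clusters <- kmeans(iris[, 1:4], centers = 3, nstart = 20)$cluster

# Rows are species, columns are cluster ids, cells are observation counts.
comparison <- table(species = iris$Species, cluster = clusters)
print(comparison)
```

The cluster numbers themselves carry no meaning, so only the pattern of counts in the table is worth reading.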

So I tried to reproduce the error I get, using the species as the dimension (I cleared the workspace etc. in RStudio first):

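# Load the data and keep the four measures plus the species (columns 2 to 6)
data <- read.table("yourpath\\Iris.csv", sep = ",", header = TRUE)
head(data)
data <- data[, 2:6]
head(data)

library(plyr)

# Group by species and take the mean of each measure
data <- ddply(data,
        ~iris.species,
        summarise,
        sepal.length = mean(sepal.length),
        sepal.width = mean(sepal.width),
        petal.length = mean(petal.length),
        petal.width = mean(petal.width)
            )

# it does not work: data still contains the non-numeric iris.species column
kmeans(data, 1)

# it works: only the four numeric columns are passed
kmeans(data[, 2:5], 1)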

So R also passes the iris species column to kmeans if you do not remove it explicitly.

The question is: how can I make it work with the species as the dimension, i.e. a non-numeric dimension? I cannot understand why it does not work.
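To illustrate the point in plain R, here is a minimal sketch that keeps the species out of the clustering step and reattaches it afterwards as a label. It assumes the built-in `iris` data frame in place of Iris.csv, and base R's `aggregate()` in place of `ddply()`; the names `species_means`, `numeric_cols` and `result` are made up for this example.

```r
# Assumption: built-in `iris` stands in for Iris.csv (label column is `Species`).
species_means <- aggregate(. ~ Species, data = iris, FUN = mean)

# Passing the whole data frame fails: as.matrix() on mixed columns gives a
# character matrix, which kmeans cannot treat as numeric data.
attempt <- suppressWarnings(try(kmeans(species_means, 1), silent = TRUE))
failed <- inherits(attempt, "try-error")

# Select the numeric columns programmatically instead of by position.
numeric_cols <- sapply(species_means, is.numeric)
fit <- kmeans(species_means[, numeric_cols], centers = 1)

# Reattach the species so it can still be used as the label of each dot.
result <- data.frame(species = species_means$Species, cluster = fit$cluster)
print(result)
```

This only covers the R side; it does not show how Qlik hands the dimension over to R.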

Thanks in advance.

I've attached the modified app (I also modified the ER, nothing drastic).

EDIT: Added the app with the aggr()

3 Replies
simotrab
Contributor II

Re: Qlik AAI: Clustering in R, kmeans

I've gone a step further.

Using

AGGR(R.ScriptEval(

'kmeans(cbind(q$petLen, q$petWid, q$sepLen, q$sepWid), 1, nstart = 20)$cluster',

[petal length] as petLen,

[petal width] as petWid,

[sepal length] as sepLen,

[sepal width] as sepWid

),[iris species])

as a measure, you can see the points, but they are not coloured, because the error is still the same. I hope this helps to solve the mystery.
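For reference, the R expression inside the measure can be run on its own in R. The sketch below is not a reproduction of what Qlik sends: the data frame `q` here is a hypothetical stand-in built from the built-in `iris` data, with one row per species holding the mean of each measure. How SSEtoRServe actually fills `q` for each AGGR() group is not shown in this thread.

```r
# Hypothetical stand-in for `q`: one row per species, mean of each measure.
means <- aggregate(. ~ Species, data = iris, FUN = mean)
q <- data.frame(
  petLen = means$Petal.Length,
  petWid = means$Petal.Width,
  sepLen = means$Sepal.Length,
  sepWid = means$Sepal.Width
)

# The same R expression as in the Qlik measure, evaluated directly.
cluster_ids <- kmeans(cbind(q$petLen, q$petWid, q$sepLen, q$sepWid), 1, nstart = 20)$cluster

# One cluster id per input row; with a single cluster every id is 1.
print(cluster_ids)
```

Under these assumptions the expression itself runs cleanly on numeric input, which suggests looking at what is passed to R rather than at the kmeans call.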

bvk
Contributor II

Re: Qlik AAI: Clustering in R, kmeans

Hi Simone,

The labeling of the bubbles in your scatter chart is not something you control in the R expression. It is an option in the chart type itself. Look under Presentation -> Labels.

Regards,

Bas

simotrab
Contributor II

Re: Qlik AAI: Clustering in R, kmeans

Hi Bas,

sorry, but I've tried to look at

presentation -> labels

but I haven't found anything that makes the chart work. Maybe I'm wrong; could you please explain how I can make it work?

I've added the app with the aggr() mod, I hope it helps. If you do not see it, that's because it is under moderation; in the last chart I've replaced the original formula with the one from my previous comment.

Thanks

iris_labels.PNG
