Do not input private or sensitive data. View Qlik Privacy & Cookie Policy.
Skip to main content
Announcements
See why Qlik was named a Leader in the 2025 Gartner® Magic Quadrant™ for Augmented Data Quality Solutions: GET THE REPORT
cancel
Showing results for 
Search instead for 
Did you mean: 
johnnyjohn
Creator
Creator

Apply ZScore Normalisation to Clustering (KMeansND)

Hello, 

I'm trying to solve a relatively difficult problem that I've hit a wall on. It's a bit of tricky one and would appreciate any help. 

I'm looking to apply ZScore normalisation using the native KMeansND clustering function. 

KMeans2D has the ability to pass this as an argument, KMeansND does not. johnnyjohn_0-1734505842830.png

Let me try to break this down as much as I can. 

This is KMeans2D without normalisation. This is achieved by colouring the scatter by dimension and using this expression: 

 

=Aggr(KMeans2D(5, n2_sprd_historical_bps, n2_value_EUR), n2_orderref)

 

johnnyjohn_1-1734506169686.png

This is KMeans2D with normalisation. You can see significant differences in how datapoints are clustered (more accurate for what I need). The expression for this is: 

 

=Aggr(KMeans2D(5, n2_sprd_historical_bps, n2_value_EUR, 'zscore'), n2_orderref)

 

johnnyjohn_2-1734506183197.png

So as a starting point I'm trying to replace with KMeansND. I can achieve a similar result as KMeans2D without normalisation by using the expression below. The output is the exact same as no normalisation KMeans2D (as expected):

 

=Aggr(KMeansND(5, n2_sprd_historical_bps, n2_value_EUR), n2_orderref)

 

johnnyjohn_4-1734507271624.png

I can successfully generate ZScore calculation on a table by using this expression as a measure

 

=(n2_value_EUR - avg(total n2_value_EUR)) / stdev(total n2_value_EUR)

 

johnnyjohn_3-1734506627672.png

The idea is to plug this into KMeansND, as below. 

 

=Aggr(KMeansND(5, (n2_sprd_historical_bps - avg(total n2_sprd_historical_bps)) / stdev(total n2_sprd_historical_bps), (n2_value_EUR - avg(total n2_value_EUR)) / stdev(total n2_value_EUR)), n2_orderref)
            

 

The expression throws no errors and scatter plot generates fine, but the output looks the same as the non-normalised version, showing the zscore measure hasn't been applied. 

I suspect this may have something to do with the AGGR and TOTAL function in the KMeansND, but I'm unsure. 

I've tried nesting the Aggr elsewhere but no luck

 

=KMeansND(5, (Aggr(n2_sprd_historical_bps, n2_orderref) - avg(total n2_sprd_historical_bps)) / stdev(total n2_sprd_historical_bps), (Aggr(n2_value_EUR, n2_orderref) - avg(total n2_value_EUR)) / stdev(total n2_value_EUR))
=KMeansND(5, (n2_sprd_historical_bps - Aggr(avg(total n2_sprd_historical_bps), n2_orderref)) / Aggr(stdev(total n2_sprd_historical_bps), n2_orderref), (n2_value_EUR - Aggr(avg(total n2_value_EUR), n2_orderref)) / Aggr(stdev(total n2_value_EUR), n2_orderref))

 

Any help or ideas would be much appreciated. Thanks. 

Labels (3)
0 Replies