Skip to main content
Announcements
Have questions about Qlik Connect? Join us live on April 10th, at 11 AM ET: SIGN UP NOW
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

Generating random numbers with Normal (Gaussian) distribution

Hey Everyone,

I’m not sure about you, but sometimes I need to generate random data, and occasionally I need random data that actually looks like something. In my foray into attempting to create random data with a normal distribution for some sample dashboards I was building I discovered the Box-Muller transformation.

Box-Muller uses some clever trigonometry, and properties of the unit circle to transform two random numbers between 0 and 1 (such as those generated by the RAND() function) into one (or two) random numbers that conform to a normal distribution with a mean value of 0, and a standard deviation of 1.

Details on the math of the transformation can be found here: http://en.wikipedia.org/wiki/Box%E2%80%93Muller_transform

There are two main forms of the Box-Muller transformation; I’ve used the basic form with the trig functions to good effect.

The function, in QlikView syntax looks like this:

sqrt(-2*log(rand())) * cos(2* pi() * rand())

For my purposes, I just discarded the other half of the transformation using sin(), since they both accomplish the same thing.

Now with the handy AUTOGENERATE function in QlikView, I’ve generated a bunch of normal data, but how often are you going to need numbers with a normal distribution with a mean of 0 and a standard deviation of 1? Probably not very often.

The good news is, the box-muller transformation function can be shifted, just like function transformations you learned in high school math. All you need to do is multiply the number generated by the desired standard deviation, and add the desired mean.

So:

set fxGaussianRandom = sqrt(-2*log(rand())) * cos(2* pi() * rand());

LOAD

($(fxGaussianRandom) * 15) + 35 AS AGE

AUTOGENERATE(1000);

Creates 1000 records with a normal distribution whose average is 35, and standard deviation is 15.

Super handy, and simpler than I thought.

Happy Randoming!

2 Replies
danielrozental
Master II
Master II

Thanks Fry very interesting.

You could also use QlikView built in function norminv to get random values with a normal distribution.

Something like norminv( rand(), mean, dev).

evanplancaster
Contributor III
Contributor III

Great tips, both of you! One suggestion is to remember to add boundaries for things where negatives or overly large boundaries may become nonsensical, like for age (negative ages wouldn't make sense). If you don't exactly care how many records you end up with, then you can just do a preceding load that includes a WHERE clause to exclude values below the boundary, like so:

LOAD *
WHERE Age >= 0;
LOAD 
  norminv(rand(), 35, 15) as Age
Autogenerate (10000);

Of course, this will reduce the number of records you have (because some will be excluded thanks to the WHERE clause), so if you want to make sure you get back exactly the number of records you want (in this case, 10000), you'll have to increase the value of the argument of the autogenerate function to compensate for that (how much depends on how close your boundary is to the mean and the size of your standard deviation).

Thanks again! This was just what I was looking for.