Random numbers

In this lecture, you will implement random numbers in Python and learn how to make computations with them. First we characterize the statistical properties of random numbers by calculating the mean and standard deviation. Second, we perform numerical integration with random numbers.

Despite the fact that random numbers are not well defined, they vary between realizations, you can make calculations with them just as 'exact' as with normal numbers.

Learning objectives: After finishing this lecture, you should be able to:

  1. Make random numbers and know how to analyze them
  2. Implement Monte Carlo integration
In [ ]:
# Initialisation code for the notebook
import numpy as np
import matplotlib.pyplot as plt
import sys
sys.path.append('resource/asnlib/public/')
sys.path.append('../answer_checking/') 
from validate_answers import *

Random numbers

A computer can be of good aid to calculate a variety of random numbers for you. You can generate them as single numbers, as arrays, and with different distributions. Here we will generate random numbers for a few of the most common distributions. Numpy provides a variety of random numbers with different distributions.

The uniform distribution

Random numbers generated with the uniform distribution have an equal probability to fall within an interval defined by the bounds $[a, b]$. The probability density function for the uniform distribution is

\begin{equation} p(x)=\left\{ \begin{array}{ll}\frac{1}{b-a} & \, \, \text{for} \, a \leq x \leq b\\0 & \, \, \text{else} \end{array}\right. \, . \end{equation}

The uniform distribution has a mean \begin{equation} \mu = \frac{1}{2}(a+b) \end{equation} and a variance \begin{equation} \sigma^2 = \frac{1}{12}(b-a)^2 \end{equation}

Exercise 1

Create a uniformly distributed random variable between bounds 5 and 8 and draw $N$=100 samples from it, as an array.

  • Determine the sample mean of the generated samples
  • Determine the variance.
  • Check that the mean and variance numbers approximately match the theoretical values as shown above, increase $N$ to see convergence to the theoretical values.

Note that you can generate the uniformly distributed random numbers in two ways: from the uniform function and from the standard rand function (with uniform distribution between 0 and 1). The latter has to be shifted and scaled to match the interval $a, b$. Try them both!

In [ ]:
###
### Your code here
###

Gaussian random numbers

The Gaussian distribution, or normal distribution, is a distribution that occurs a lot in physics. This is due to the fact that many physical observables are the result of a sum of a large number of random steps. In the limit of infinitely many of these steps, the distribution of this sum converges to the Gaussian distribution. The Gaussian probability density function is

\begin{equation} p(x)=\frac{1}{\sqrt{2 \pi \sigma^2}} \text{e}^{-(x-\mu)^2/2\sigma^2} \end{equation}

with $\mu$ the mean and $\sigma^2$ the variance.

Exercise 2

Create a Gaussian distributed random variable with mean 10 and standard deviation of 7 and take $N$=100 samples from it, as an array.

  • Determine the sample mean of the generated samples
  • Determine the variance.
  • Check that the mean and variance numbers match the theoretical values , increase $N$ to see convergence to the theoretical values.

Note that you can generate the Gaussian distributed random numbers with mean $\mu$ and variance $\sigma^2$ in two ways: from the normal function and from the standard normal distributionrandn function (with zero mean and variance 1). The latter has to be shifted and scaled to match the arbitrary Gaussian. Try them both!

In [ ]:
###
### Your code here
###

A simple example of the use of random numbers is in the calculation of $\pi$. The method is quite simple. You generate a lot of points in a known area, in this case a square, that encloses the object for which you want to know the area, in this case the unit circle. Next, you count the number of points inside the object and divide that by the total number of points you generated. The ratio is an estimate of the fraction of area that the object covers. Finally, you multiply the ratio by the area of the known area around the object and you have an estimate for the to be integrated area. Since the area of a circle is $\pi r^2$, the estimated area is $\pi$.

Exercise 3

Generate $N=100$ uniformly distributed random numbers between -1 and 1 for the $x$-coordinates. Do the same for the $y$-coordinates.

  • Plot the random numbers in the $x$-$y$ plane, add a circle according to the given code
  • Make a for loop and iterate over all $N$ points
  • Check for every point whether it is within the unit circle, if so add 1 to a variable that counts the number of points in the unit circle
  • Calculate $\pi$ by dividing the number of points inside the circle by $N$ and multiply by the area of the square surrounding the unit circle
In [ ]:
# Plot circle

angle = np.linspace(0, 2*np.pi, 100)
plt.plot(np.cos(angle), np.sin(angle), '--r')

###
### Your code here
###

How random are random numbers?

True random numbers, like the ones you use in statistics, are challenging to obtain. For example, something like a perfect unbiased dice does not exist. Also on a computer generated random numbers are not purely random since they are created by deterministic algorithms. Hence, a computer generates pseudorandom numbers that for the majority of problems are sufficiently random.

A common method to generate random numbers is through the use of the equation

\begin{equation} x_{n+1}=(a x_n + b) \, \text{mod }m \end{equation}

where $a$, $b$, and $m$ are integer parameters and $x$ is an integer random number. Starting with an initial number $x_n$ the equation makes new numbers that are seemingly random (for the proper choice of $a$, $b$ and $m$). Obviously, for fixed $x_n$ and $a$, $b$, and $m$ the generated numbers are random but always the same sequence. The number $x_n$ is called the seed, and when calling the random number generator with a fixed seed the random numbers are always the same, if not then the random number generator generates a new (sequence of) random numbers for every time you call the function.

Seeding can be performed by calling the function seed() from the numpy random library with in between the () an fixed number variable. Seeding can be handy for debugging and to compare your code to the code of somebody else.

Exercise 4

Use a fixed seed for exercise 2 and 3 check that you get the same numbers every time you run your code.

Monte Carlo integration

Monte Carlo integration not only can be used to estimate surface areas or volumes, it can be used to integrate any function in any multi-dimensional space. Note that in principle the dimension of the integrated volume can be greater than or equal to 1.

Assume that we want to integrate over a (hyper-) volume $S$. The way this works is through the mean value method by

\begin{equation}\label{eq:mcint} I=\frac{V}{N}\sum_{i=1}^{N} f({\vec{r}_i}) \chi({\vec{r}_i}) \, . \end{equation}

Now, have a good look at this formula.

  • $V$ represents a (hyper-) volume that encloses the to be integrated volume $S$, $V$ is usually a rectangle or (hyper-) cube.
  • $N$ is the number of random points in $V$
  • $f$ the function to be integrated. If you just want to calculate an surface area or (hyper-) volume $f$=1
  • The function $\chi$ is an index function that is 1 if $\vec{r}_i$ is inside $S$ and 0 if $\vec{r}_i$ is outside of $S$.

So if $f$=1, the (hyper-) volume is just the sum of the number of points in $S$ divided by the total number of points in $V$. For example, for the case of calculating $pi$, the 'volume' $V$ is the square area surrounding the circle $S$. The value of $pi$ is then calculated by summing up all numbers in the circle and dividing them by the area of the square. If we want to integrate the function $f$, we calculate $f(\vec{r}_i)$ for all random points. Multiplication with $\chi$ makes sure that we do not integrate the function outside $f$ of $S$.

So in general, we have to calculate $f(\vec{r}_i)$ and $\chi({\vec{r}_i})$ as fast as possible. Next we multiply them and sum them. A quick way to do this is is by using array and Boolean indexing, a nice feature of many programming languages. Try it first for yourself.

Boolean or “mask” index arrays

Integer-type arrays can be used for advanced indexing to access the contents of an array. Consider the simple example below

In [ ]:
y = np.array([ 0, 2, 4, 8, 16, 32])
index = np.array([2, 3, 4])
print(y[index])

Now if we would like to check for an array whether a condition is met for all elements, it would be nice to make an array with the indices where this statement is true. You can do this by using Boolean variables.

Booleans are variables that have two values true or false. You can use Python to check whether a statement is true or false and Python then generates a boolean variable. For example, for a variable a

In [ ]:
a=2>3
print(a)

results in a being of type bool and having a value of False. Boolean indexing works in the following way: if x is an and y is a boolean-value array of the same shape as x. Then x[y] returns an array that is formed by from x and contains the values where y is True. Try it out for yourself.

Generate an array x of 100 random numbers between 0 and 10 and count (without using a for loop) how many numbers are larger than 5.

Method 1:

  • Make a Boolean variable y for the condition x>5
  • Print the variable y, the values are Boolean, i.e.,True or False
  • Extract from array x the elements that match the condition with x[y]
  • Calculate the length of the extracted array
In [ ]:
###
### Your code here
###

Just as you can extract a series of numbers from an array with advanced indexing Python recognizes that if you use a Boolean array you to convert it to an array with values equal to the locations that are True.

Method 2

  • Make a Boolean variable y for the condition x>5
  • Take the sum of y
In [ ]:
### BEGIN SOLUTION
N=20
x=10*np.random.rand(N)

# Method 2
y=(x>5)
z=sum(y)
print(z)

### END SOLUTION

Obviously, method 2 is much faster to implement. But wait a minute, a sum over an array of Booleans creates an integer number, something weird must be going on here?!

In fact, Python is sufficiently clever to understand that if you take a sum that True counts as 1 and False as zero. Another way of seeing is this is that 1*y, with y a Boolean array, generates a sequence of ones and zeros that can be summed. So when performing mathematical operations with Booleans these may be converted to numeric values.

Multi-dimensional integration

Using the method of Boolean indexing, the speed of calculating Monte-Carlo integrals through the mean value method can be improved. When the mean value method is used to calculate areas or volumes $f(\vec{r}_i)=1$. In the most general case $f(\vec{r}_i)$ is the function that is integrated over an area or volume.

Exercise 5

Change your code for calculating $\pi$ using Boolean indexing, i.e. replace the for loop with two lines of code: one for Boolean indexing the points within the circle, the second for calculating the sum.

The beauty of Monte-Carlo integration is that extension to higher dimensionality integrals is straightforward. Just generate the random numbers in the multi-dimensional space enclosing the to be integrated volume $S$, apply Boolean indexing, and calculate the mean value.

Exercise 6

In this exercise we are going to calculate the mass of a 3D sphere with linearly increasing density. This problem is quite challenging to do analytically, but with Monte Carlo integration it is relatively easy.

Let's first start with the simple problem of determining the mass of a sphere with uniform density $\rho_0=1.5$ kg/m$^3$. The sphere has a radius of 1 meter, and we initially integrate it with $N=100$ points.

  • Generate an array of $N$ random numbers for the $x$, $y$, and $z$ coordinates. Make sure that all points lie in a cube enclosing the sphere
  • Generate an index function. This is an array that is equal to 1 for points inside the sphere and 0 outside the sphere. Use the xx technique you learned above to do this without a for loop.
  • Calculate the mass by applying the mean value method
  • Do you get a mass of 2$\pi$ kg? And if you increase $N$?
In [ ]:
###
### Your code here
###

Exercise 7

Now we make it a bit more difficult by adjusting the code. Consider a density that is a linear function of the radius according to $\rho(r)=\rho_0 r$.

Calculate the mass of the sphere for the case of the linear increasing density using the methodology developed in exercise 6. With some effort we can calculate the analytical answer as 1.5$\pi$ to check our answer.

In [ ]:
###
### Your code here
###

Monte-Carlo integration accuracy

The Monte Carlo integration error scales as $N^{-1/2}$ no matter what the dimensionality of the problem is. If we would have performed the integration with the simple sum rule the error would be $O(h)\propto O(1/N)$ , which on first look may indicate that this is more accurate than Monte Carlo integration. However, this accuracy is only for a single dimension.

For a cubic volume we have only $N^{1/3}$ points along every direction. Hence, the simple sum integration error in three dimensions scales as $N^{-1/3}$, which is worse than with Monte Carlo integration. Especially for problems with high dimensionality (D) the use of Monte Carlo integration is more advantageous than conventional numerical integration as, for simple sum integration, the accuracy scales as $N^{-1/D}$, whereas Monte-Carlo integration always scales as $N^{-1/2}$.

These integrations occur regularly in statistical mechanics.