In this lecture, you will implement random numbers in Python and learn how to make computations with them. First we characterize the statistical properties of random numbers by calculating the mean and standard deviation. Second, we perform numerical integration with random numbers.
Despite the fact that random numbers are not well defined, they vary between realizations, you can make calculations with them just as 'exact' as with normal numbers.
Learning objectives: After finishing this lecture, you should be able to:
# Initialisation code for the notebook
import numpy as np
import matplotlib.pyplot as plt
import sys
sys.path.append('resource/asnlib/public/')
sys.path.append('../answer_checking/')
from validate_answers import *
A computer can be of good aid to calculate a variety of random numbers for you. You can generate them as single numbers, as arrays, and with different distributions. Here we will generate random numbers for a few of the most common distributions. Numpy provides a variety of random numbers with different distributions.
Random numbers generated with the uniform distribution have an equal probability to fall within an interval defined by the bounds $[a, b]$. The probability density function for the uniform distribution is
\begin{equation} p(x)=\left\{ \begin{array}{ll}\frac{1}{b-a} & \, \, \text{for} \, a \leq x \leq b\\0 & \, \, \text{else} \end{array}\right. \, . \end{equation}The uniform distribution has a mean \begin{equation} \mu = \frac{1}{2}(a+b) \end{equation} and a variance \begin{equation} \sigma^2 = \frac{1}{12}(b-a)^2 \end{equation}
Exercise 1
Create a uniformly distributed random variable between bounds 5 and 8 and draw $N$=100 samples from it, as an array.
Note that you can generate the uniformly distributed random numbers in two ways: from the uniform
function and from the standard rand
function (with uniform distribution between 0 and 1). The latter has to be shifted and scaled to match the interval $a, b$. Try them both!
###
### Your code here
###
The Gaussian distribution, or normal distribution, is a distribution that occurs a lot in physics. This is due to the fact that many physical observables are the result of a sum of a large number of random steps. In the limit of infinitely many of these steps, the distribution of this sum converges to the Gaussian distribution. The Gaussian probability density function is
\begin{equation} p(x)=\frac{1}{\sqrt{2 \pi \sigma^2}} \text{e}^{-(x-\mu)^2/2\sigma^2} \end{equation}with $\mu$ the mean and $\sigma^2$ the variance.
Exercise 2
Create a Gaussian distributed random variable with mean 10 and standard deviation of 7 and take $N$=100 samples from it, as an array.
Note that you can generate the Gaussian distributed random numbers with mean $\mu$ and variance $\sigma^2$ in two ways: from the normal
function and from the standard normal distributionrandn
function (with zero mean and variance 1). The latter has to be shifted and scaled to match the arbitrary Gaussian. Try them both!
###
### Your code here
###
A simple example of the use of random numbers is in the calculation of $\pi$. The method is quite simple. You generate a lot of points in a known area, in this case a square, that encloses the object for which you want to know the area, in this case the unit circle. Next, you count the number of points inside the object and divide that by the total number of points you generated. The ratio is an estimate of the fraction of area that the object covers. Finally, you multiply the ratio by the area of the known area around the object and you have an estimate for the to be integrated area. Since the area of a circle is $\pi r^2$, the estimated area is $\pi$.
Exercise 3
Generate $N=100$ uniformly distributed random numbers between -1 and 1 for the $x$-coordinates. Do the same for the $y$-coordinates.
# Plot circle
angle = np.linspace(0, 2*np.pi, 100)
plt.plot(np.cos(angle), np.sin(angle), '--r')
###
### Your code here
###
True random numbers, like the ones you use in statistics, are challenging to obtain. For example, something like a perfect unbiased dice does not exist. Also on a computer generated random numbers are not purely random since they are created by deterministic algorithms. Hence, a computer generates pseudorandom numbers that for the majority of problems are sufficiently random.
A common method to generate random numbers is through the use of the equation
\begin{equation} x_{n+1}=(a x_n + b) \, \text{mod }m \end{equation}where $a$, $b$, and $m$ are integer parameters and $x$ is an integer random number. Starting with an initial number $x_n$ the equation makes new numbers that are seemingly random (for the proper choice of $a$, $b$ and $m$). Obviously, for fixed $x_n$ and $a$, $b$, and $m$ the generated numbers are random but always the same sequence. The number $x_n$ is called the seed, and when calling the random number generator with a fixed seed the random numbers are always the same, if not then the random number generator generates a new (sequence of) random numbers for every time you call the function.
Seeding can be performed by calling the function seed()
from the numpy random library with in between the ()
an fixed number variable. Seeding can be handy for debugging and to compare your code to the code of somebody else.
Exercise 4
Use a fixed seed for exercise 2 and 3 check that you get the same numbers every time you run your code.
Monte Carlo integration not only can be used to estimate surface areas or volumes, it can be used to integrate any function in any multi-dimensional space. Note that in principle the dimension of the integrated volume can be greater than or equal to 1.
Assume that we want to integrate over a (hyper-) volume $S$. The way this works is through the mean value method by
\begin{equation}\label{eq:mcint} I=\frac{V}{N}\sum_{i=1}^{N} f({\vec{r}_i}) \chi({\vec{r}_i}) \, . \end{equation}Now, have a good look at this formula.
So if $f$=1, the (hyper-) volume is just the sum of the number of points in $S$ divided by the total number of points in $V$. For example, for the case of calculating $pi$, the 'volume' $V$ is the square area surrounding the circle $S$. The value of $pi$ is then calculated by summing up all numbers in the circle and dividing them by the area of the square. If we want to integrate the function $f$, we calculate $f(\vec{r}_i)$ for all random points. Multiplication with $\chi$ makes sure that we do not integrate the function outside $f$ of $S$.
So in general, we have to calculate $f(\vec{r}_i)$ and $\chi({\vec{r}_i})$ as fast as possible. Next we multiply them and sum them. A quick way to do this is is by using array and Boolean indexing, a nice feature of many programming languages. Try it first for yourself.
Integer-type arrays can be used for advanced indexing to access the contents of an array. Consider the simple example below
y = np.array([ 0, 2, 4, 8, 16, 32])
index = np.array([2, 3, 4])
print(y[index])
Now if we would like to check for an array whether a condition is met for all elements, it would be nice to make an array with the indices where this statement is true. You can do this by using Boolean variables.
Booleans are variables that have two values true or false. You can use Python to check whether a statement is true or false and Python then generates a boolean variable. For example, for a variable a
a=2>3
print(a)
results in a
being of type bool
and having a value of False
. Boolean indexing works in the following way: if x
is an and y
is a boolean-value array of the same shape as x
. Then x[y]
returns an array that is formed by from x
and contains the values where y
is True
. Try it out for yourself.
Generate an array x
of 100 random numbers between 0 and 10 and count (without using a for loop) how many numbers are larger than 5.
Method 1:
y
for the condition x>5
y
, the values are Boolean, i.e.,True
or False
x
the elements that match the condition with x[y]
###
### Your code here
###
Just as you can extract a series of numbers from an array with advanced indexing Python recognizes that if you use a Boolean array you to convert it to an array with values equal to the locations that are True.
Method 2
y
for the condition x>5
y
### BEGIN SOLUTION
N=20
x=10*np.random.rand(N)
# Method 2
y=(x>5)
z=sum(y)
print(z)
### END SOLUTION
Obviously, method 2 is much faster to implement. But wait a minute, a sum over an array of Booleans creates an integer number, something weird must be going on here?!
In fact, Python is sufficiently clever to understand that if you take a sum that True
counts as 1 and False
as zero. Another way of seeing is this is that 1*y
, with y
a Boolean array, generates a sequence of ones and zeros that can be summed. So when performing mathematical operations with Booleans these may be converted to numeric values.
Using the method of Boolean indexing, the speed of calculating Monte-Carlo integrals through the mean value method can be improved. When the mean value method is used to calculate areas or volumes $f(\vec{r}_i)=1$. In the most general case $f(\vec{r}_i)$ is the function that is integrated over an area or volume.
Exercise 5
Change your code for calculating $\pi$ using Boolean indexing, i.e. replace the for
loop with two lines of code: one for Boolean indexing the points within the circle, the second for calculating the sum.
The beauty of Monte-Carlo integration is that extension to higher dimensionality integrals is straightforward. Just generate the random numbers in the multi-dimensional space enclosing the to be integrated volume $S$, apply Boolean indexing, and calculate the mean value.
Exercise 6
In this exercise we are going to calculate the mass of a 3D sphere with linearly increasing density. This problem is quite challenging to do analytically, but with Monte Carlo integration it is relatively easy.
Let's first start with the simple problem of determining the mass of a sphere with uniform density $\rho_0=1.5$ kg/m$^3$. The sphere has a radius of 1 meter, and we initially integrate it with $N=100$ points.
###
### Your code here
###
Exercise 7
Now we make it a bit more difficult by adjusting the code. Consider a density that is a linear function of the radius according to $\rho(r)=\rho_0 r$.
Calculate the mass of the sphere for the case of the linear increasing density using the methodology developed in exercise 6. With some effort we can calculate the analytical answer as 1.5$\pi$ to check our answer.
###
### Your code here
###
The Monte Carlo integration error scales as $N^{-1/2}$ no matter what the dimensionality of the problem is. If we would have performed the integration with the simple sum rule the error would be $O(h)\propto O(1/N)$ , which on first look may indicate that this is more accurate than Monte Carlo integration. However, this accuracy is only for a single dimension.
For a cubic volume we have only $N^{1/3}$ points along every direction. Hence, the simple sum integration error in three dimensions scales as $N^{-1/3}$, which is worse than with Monte Carlo integration. Especially for problems with high dimensionality (D) the use of Monte Carlo integration is more advantageous than conventional numerical integration as, for simple sum integration, the accuracy scales as $N^{-1/D}$, whereas Monte-Carlo integration always scales as $N^{-1/2}$.
These integrations occur regularly in statistical mechanics.