Numerical Integration¶

In this notebook, you will implement the different techniques for the numerical computation of integrals in python and explore aspects of numerical integration.

Learning objectives: After finishing this notebook, you should be able to:

Implement integration of a function with the trapezoidal rule
Implement integration of a function with Simpson's rule
Calculate the integral of a numerical dataset

# Initialisation code for the notebook
import numpy as np
import matplotlib.pyplot as plt
import sys
sys.path.append('resource/asnlib/public/')
sys.path.append('../answer_checking/')
from validate_answers import *

plt.rcParams['figure.dpi'] = 100

Numerical Integration¶

Often in physics, we will want to calculate the integral of a function, which can be interpreted at the area under the curve of the function between a starting point and and ending point, as illustrated here:

Unlike taking a derivative, calculating the analytical form of a derivative is not always easy, and is not even guaranteed to have an analytical closed form! This is one of the reasons it is useful to calculate integrals numerically, know as Numerical Integration:

https://en.wikipedia.org/wiki/Numerical_integration

Discretisation¶

As is the case for any calculation in a computer, the first step is to break our continuous variable $x$ into list of numbers x[i] separated by (typically) fixed steps (know as discretising), along with corresponding set of $y$ values $y[i]$:

We will name the spacing between the steps in $\Delta x = h$. The example above would correspond to a total number of points $n = 4$, and following the python convention of counting from zero, the last point is then y[3].

From there, we can use several different approximations to calculate the integrated area, a few of which we will describe here.

Simple sum¶

The simplest is to simply sum up all the y[i] values except the last and then multiply this by the spacing $h$. For a total number of points $n$, this results in an estimate of the integral $I_{sum}$ given by:

$$ I_{\rm sum} = h \sum_{i=0}^{n-2} y_i $$

This is equivalent to the following area graphically as an approximation of the continuous integral:

Note that the sum limit in the formula above does not include the last point ($i = n-1$), but instead stops at second-to-last point ($i=n-2$). This is because for $n$ points, there are $n-1$ blocks ("slices") between the first and the last points.

Trapezoidal rule¶

An obvious improvement over the sum rule above is to replace the "flat tops" of the boxes above with sloped lines that connect one point to the next:

This is known as the Trapezoidal rule:

https://en.wikipedia.org/wiki/Trapezoidal_rule

We can already see that this is giving a much better approximation! The formula for the estimated integral using the trapezoidal rule is given by:

$$ I_{\rm trap} = h \left( \frac{y_0 + y_{n-1}}{2} + \sum_{i=1}^{n-2} y_i \right) $$

The formula is in fact nearly identical to the simple sum, except that instead of using the first point and skipping the last point, we replace the first point with average of the first and last points.

Simpson's rule¶

While the trapezoidal rule is clearly much better, we can do even better if we use a quadratic interpolation instead of a linear interpolation between the points. This technique is known as Simpson's rule:

https://en.wikipedia.org/wiki/Simpson%27s_rule

In this case, the estimate of the integral is given by a more complicated sum involving different coefficients for the odd and even points:

$$ I_{\rm simp} = \frac{h}{3}\left( y_0 + 2\sum_{i=1}^{n/2-1}y_{2i} + 4\sum_{i=1}^{n/2}y_{2i-1} + y_n \right) $$

In the exercises below, you will explore implementing these different types of numerical intergration techniques in python.

Exercises¶

We will start by calculating the following integral:

\begin{equation} \int_0^2 (x^4 - 2x +1) dx \end{equation}

using two different techniques.

This integral is actually one that you can calculate yourself and has the exact value of 4.4. This will be handy as we can use this exact value to compare with the values that our numerical calculations give.

(The idea, of course though, is that numerical integration is useful for calculating the integrals of functions that are not easy to calculate! But we will do this later, and for now, it is handy to know what the answer is.)

Before we start, it is useful already to make a plot of the function we are integrating.

To do this, we will create an array x ranging from 0 to 2 with 100 points (just to get a bit of a smooth plot). We will then create second array y which will be our integrand, and then we will plot y versus x.

Exercise 1: Modify the code below to produce a plot of the integrand in the range of 0 to 2 with 100 points.

# Define the function f(x) 
###
### Your code here
###


# First, make the arrays of x and y values for the plot
# npts = ______
# x = np.linspace(____,____,npts)
# y = _____

###
### Your code here
###


# Now plot it and add x and y labels

# plt.plot(___,___)
# ...

###
### Your code here
###


# A zero axis line is handy. "ls" is a shortcut for "linestyle" and "lw" is a shortcut for "linewidth"
plt.axhline(0,color='grey', linestyle=':')

answer_4_01_1 = np.copy(x)
answer_4_01_2 = np.copy(y)

question = "answer_4_01"
num = 2

to_check = [question + "_%d" % (n+1) for n in range(num)]
feedback = ""
passed = True
for var in to_check:
    res, msg = check_answer(eval(var), var)
    passed = passed and res
    print(msg); feedback += msg + "n"

assert passed == True, feedback

Self check:

Does your plot have axis labels?
Does the function look reasonably like you might think?

The function looks pretty reasonable: it starts going down, it crosses through zero, and then shoots up. It has some "positive" area above the x-axis, and in the middle, it has a little bit of "negative" area that decreases the value of the integral.

The Trapezoidal Rule¶

A simple method of calculating an integral is using the trapezoidal rule.

Exercise 2: Write code to calculate the integral using the trapezoidal rule with $n = 11$ points in the range (10 "steps").

###
### Your code here
###


# Your answer should be stored in a variable trapezoidal_integral
answer_4_02_1 = trapezoidal_integral

question = "answer_4_02"
num = 1

to_check = [question + "_%d" % (n+1) for n in range(num)]
feedback = ""
passed = True
for var in to_check:
    res, msg = check_answer(eval(var), var)
    passed = passed and res
    print(msg); feedback += msg + "n"

assert passed == True, feedback

Using the code, you can play with $n$ to see how many points we need to get a reasonable accuracy.

Benchmarking the Trapezoidal Rule¶

In general, you will often be using your code to calculate an integral for which you do not know the answer. However, in this case, since we do know the answer, you can get a feeling for how accurate your code is by comparing your calculated value to the correct answer.

This sort of "benchmarking" is an integral part of developing numerical simulation software. In particular, you can always find a way for your code to give you "an answer". But how can you trust that it is the correct answer? How do you know how much influence the approximations you have made have on the calculated result?

For this, it is always a good idea to use your code to perform a calculation for a case where you know the answer. This will allow you to play around to see how to optimize your code to be more accurate (and also to find bugs!). Of course, getting an accurate answer for one type of calculation does not guarantee that your code will be accurate for all calculations, but by trying it out for different types of calculations, you can start to get a feeling for how it performs and build up some trust that it will give you an accurate answer.

Here, we will explore such a "benchmarking" of the Trapezoidal rule for calculating integrals, and explore the number of steps needed to achieve a given accuracy.

Exercise 3: Use a while loop to find the minimum value of $N$ you need to get the correct answer to relative error of less than $10^{-6}$ = one part per million (ppm).

The definition of relative error is as follows: if $v$ is the correct answer and $v_{\rm approx}$ is the approximate answer, the relative error $\eta$ is defined as:

\begin{equation} \eta = \left| \frac{v - v_{\rm approx}}{v} \right| \end{equation}

Your while loop should have an "emergency exit" test that stops the loop with a break statement if $N$ exceeds 10,000.

Tip: If you have trouble with your exit condition on your while loop, it might be handy to include a code line print("N %d eta %e" % (N,eta)) to keep track of what is going on in your code. This is an elementary form of debugging.

###
### Your code here
###


answer_4_03_1 = N
answer_4_03_2 = integral
answer_4_03_3 = eta

question = "answer_4_03"
num = 3

to_check = [question + "_%d" % (n+1) for n in range(num)]
feedback = ""
passed = True
for var in to_check:
    res, msg = check_answer(eval(var), var)
    passed = passed and res
    print(msg); feedback += msg + "n"

assert passed == True, feedback

Simpson's Rule¶

Simpson's rule is a numerical integration technique that replaces the linear approximation of the integrand in the trapizoidal technique with a a "best estimate" quadratic approximation.

Exercise 4: Write code to implement Simpson's rule for the integral in section 1.1 using 11 points (10 steps).

# Write your Simpson's rule code here

###
### Your code here
###


print("Integral with Simpson's rule is %f" % integral_simpson)

answer_4_04_1 = integral_simpson

question = "answer_4_04"
num = 1

to_check = [question + "_%d" % (n+1) for n in range(num)]
feedback = ""
passed = True
for var in to_check:
    res, msg = check_answer(eval(var), var)
    passed = passed and res
    print(msg); feedback += msg + "n"

assert passed == True, feedback

Which technique (Simpson vs. Trapezoidal) is more accurate for the same number of points?

Integrating numerical data¶

In the above, we have focussed on integrating analytical functions. However, the same techniques we have introduced can also be used to integrate numerical data.

One difference, however, is that we will not evaluate the values of the integrand by a function call, as we did above, but instead by looking up it's value in an array.

For this you will load data from a file resource/asnlib/public/velocities.dat included already on your notebook server. The file contains two columns of numbers, the first representing time $t$ in seconds and the second the $x$-velocity in meters per second of a particle, measured once every second from time $t=0$ to $t=100$ seconds.

Exercise 5: Read in the data and, using the trapezoidal rule, calculate from them the approximate distance traveled by the particle in the $x$ direction as a function of time.

Since this exercise is also about learning to program, you are forbidden from using built-in scipy or numpy functions!

Hint This is a cumulative integral, a bit different than the definite integrals handled above. Your integral code should produce an array not a number! If f(t) is the function describing the velocity as a function of time, then the answer g(t) is given by:

$$ g(t) = \int_0^t h(\tau) d\tau $$

Every element in your output array is then conceptually defined by computing an integral from $\tau=0$ to $\tau = t$.

(Of course, there may be cleaver and more computationally efficient ways to do it as well, but that is not the focus right now...)

Recommendation "Modularize" your code by creating a function that does the trapezoidal integration of an array, this will make your code easier to write and use.

# Download the resource files.
import urllib
urllib.request.urlretrieve("http://www-personal.umich.edu/~mejn/cp/data/velocities.txt", 
                           filename="velocities.txt")

# Load the data
###
### Your code here
###



# A function for calculating the trapezoidal integral of an array:
def trapezoid(x):
    ###
### Your code here
###


#y=[v[0],v[1]] 
#z=trapezoid(y)

 
# Now calculate the cumulative integral: 
###
### Your code here
###




answer_4_05_1 = np.copy(d)

question = "answer_4_05"
num = 1

to_check = [question + "_%d" % (n+1) for n in range(num)]
feedback = ""
passed = True
for var in to_check:
    res, msg = check_answer(eval(var), var)
    passed = passed and res
    print(msg); feedback += msg + "n"

assert passed == True, feedback

Exercise 6: Make a plot with velocity on the left-hand y-axis and distance travelled in the right-hand y-axis.

To plot data with a second y-axis, you need a bit more matplotlib code:

# First, initialize your plot with the plt.subplots() command, which returns an axis object you will need
fig, ax1 = plt.subplots()

# You then use the twinx() function to make a second axis object with it's x-axis linked to the original x-axis
ax2 = ax1.twinx()

# Now you can use the plot() commands of each axis objects: ax.plot(...). 
ax1.plot(t,d, 'g-', label="d")
ax1.set_ylabel("my axis (my units)", color='g')
ax2.plot(...)

You can also use the axhline() command to add horizontal lines. For example, this will add a grey dashed horizontal line:

ax2.axhline(0, color='grey', linestyle=":")

###
### Your code here
###

Self check:

Does your plot have axis labels?
Are the correct functions plotted?
Is it clear which line belongs to which axis? (Either with a legend or with color axis labels? Or both?)

Numpy and Scipy integration functions¶

Now that we have some understanding of how numerical integration works, it is also useful to note that there are several numpy functions that can calculate such integrals for you!

Simple sum:

https://docs.scipy.org/doc/numpy/reference/generated/numpy.sum.html

Trapezoidal rule:

https://docs.scipy.org/doc/numpy/reference/generated/numpy.trapz.html

Cumulative sum:

https://docs.scipy.org/doc/numpy/reference/generated/numpy.cumsum.html

Numpy does not include Simpson's rule (since for small $h$ the trapezoidal rule is often accurate enough), but in case you want to use it, it is available in the scipy package:

Simpson's rule:

https://docs.scipy.org/doc/scipy/reference/generated/scipy.integrate.simps.html

Because all of the functions above work with vectorization (see notebook 2b), they can be hundreds of times faster for very large arrays, so it is useful to use them for large calculations.