Good coding practices

In this notebook, we outline some examples of "good coding practices": things you should do to make your code robust and understandable to others.

Learning objectives:

  • Student is able to write code that has a clear and understandable structure
  • Student is able to write code with descriptive variable names
  • Student is able to write code with explanatory comments
  • Student is able to write code that avoids "hard coding"

Good coding practice: Writing understandable code

Once you become an expert at coding, it will become very easy to write code that Python can understand.

What is actually a big challenge in computer science is writing code that works, that is efficient, and, most importantly, that other people will understand!

You might think: but I'm writing this code only for myself, so I don't really care whether my code is easy to understand, right?

There are many reasons why this is incorrect:

  1. In this course, your teachers and your TAs will need to understand your code in order to grade it. If we can't quickly understand what you have done, this will affect your grade!!! (One of our grading criteria will be the clarity and readability of your code.)

  2. You might, in the future, want to share your code with someone else. If they have to spend a lot of time figuring out what you've done, then your code is useless to them. (In fact, this is the power of open languages like python, and is why python is so fantastically successful: there is a huge amount of code that other people have written that you can reuse!)

  3. Weeks, months, or even years later, you might want to go back and re-use your own code. If your code is not written in a clear and understandable way, you yourself will waste a lot of time trying to figure out what you did!

In preparing the lecture notebooks for this course, I personally experienced point 3 myself: I went back to some simple code that I wrote a few weeks earlier, and I realized I had no idea what the code did!

So how do you make sure that your code is understandable? There are two practices that help with this:

  • Meaningful variable names and logical structure
  • Comments explaining what you're doing

As a concrete example, I will take the code below and show how, using these two techniques, an incomprehensible piece of code can be modified in two steps to become very easy to understand.

BAD code

Here is the "bad" code:

In [ ]:
foo3 = 2.0; foo1 = 10; foo2 = 0.0
def foo6(foo1):
    return foo1**4 - 2*foo1 + 1
foo5 = 0.5*foo6(foo2) + 0.5*foo6(foo3); foo4 = (foo3-foo2)/foo1
for foo7 in range(1,foo1):
    foo5 += foo6(foo2+foo7*foo4)
foo83 = (foo5*foo4)

Can you figure out what this code does? It would take me personally a lot of time...

This code is a mess, but is technically correct: it will give the right answer. But correct code can also be terrible code, and for me, the code above is really terrible!

Better variable names and logical structure

In [ ]:
def f(x):
    return x**4 - 2*x + 1

N = 10
a = 0.0
b = 2.0
h = (b-a)/N

s = 0.5*f(a) + 0.5*f(b)
for k in range(1,N):
    s += f(a+k*h)

answer2 = (h*s)

Why is this better?

  • Not all variables and functions are named foo, but have somewhat meaningful names like f(x), a, b, N
  • The definition of the variables is in a logical order (a is defined before b, for example)
  • The functions are defined at the top (a common convention to make your code understandable)
  • There are blank lines separating different logical parts of the code:
    • function definitions at the top
    • then the definitions of the input variables
    • then the loop that does the actual work
    • then the definition of the answer / output

When you work on the Integration notebook, you may recognize this code as the trapezoidal rule. The variable names match the formulas in a textbook we have used in the past, so if you have just read the textbook section on integration, you will probably recognize the variable names from the derivation almost instantly, and also see what the code does.

However, if you come back at a later time when the integration section of the textbook is not so fresh in your head (as I did recently while updating this notebook), you may immediately think "what was I doing here?"
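For reference, here is a sketch of the trapezoidal rule written with the same symbols as the code above (a and b are the integration limits, N is the number of slices, h is the slice width, and k is the loop index). This is the standard textbook form, reconstructed from the code rather than quoted from a particular book:

$$h = \frac{b-a}{N}, \qquad \int_a^b f(x)\,dx \approx h\left[\tfrac{1}{2}f(a) + \tfrac{1}{2}f(b) + \sum_{k=1}^{N-1} f(a+kh)\right]$$

The term in square brackets is what the code accumulates in s, and the final multiplication by h gives the answer.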

Even better: Descriptive variable names

The above code is already better. And if I am looking at a textbook where all the things above are clearly explained, then maybe the meaning of the parameters a, b, N, etc. is clear to me. However, if you haven't seen the code before, it may not be immediately obvious. Are a and b the slope and intercept of a line $y=ax+b$? Or are they something else? Is N the number of points in my discretisation, or is it the number of slices in my integral?

For this reason, it is better to use descriptive variable names that themselves already describe what the meaning of the variable is:

In [ ]:
def f(x):
    return x**4 - 2*x + 1

N_slices = 10
start = 0.0
stop = 2.0
step_size = (stop-start)/N_slices

running_sum = 0.5*f(start) + 0.5*f(stop)

for i in range(1,N_slices):
    running_sum += f(start+i*step_size)

answer2 = (step_size*running_sum)

This is a bit more typing, but you can also use "tab completion" to save typing: just type the first few letters, push the "tab" key, and the Jupyter notebook will automatically type the rest out for you.

Commenting your code (Extreme example, probably too much commenting!)

By taking some time to explain what you're doing and why, this code can now become instantly understandable:

In [ ]:
# Code for integration with the Trapezoidal rule

# The function we want to integrate
def f(x):
    return x**4 - 2*x + 1

# The number of slices we will use
# Note that this is NOT the number of points for our discretisation: that is N_slices+1.
N_slices = 10

# The starting and stopping points of our integration range
start = 0.0
stop = 2.0

# The step size
step_size = (stop-start)/N_slices

# A running sum we will use
# We start with half of the function values at the start and end points (trapezoidal rule)
running_sum = 0.5*f(start) + 0.5*f(stop)

# Now a for loop that does the sum.
# range(1,N_slices) will give us N_slices-1 numbers running from 1 to N_slices-1
# For N_slices=10, we will get 9 numbers: 1,2,3,4,5,6,7,8,9
# Note that we do not need i = 0 because in the trapezoidal rule, we add it
# separately above with a factor of 0.5, together with the end point
for i in range(1,N_slices):
    running_sum += f(start+i*step_size)

# The integral is then given by the product of the sum and the step size
answer2 = (running_sum*step_size)

Here, we explain what we are doing and why! This code is likely understandable by anyone who reads it, including myself again in a year's time.

In particular, we can point out some of the sneaky things, such as the difference between number of slices and number of points, and also why our range() function starts at 1 and not 0.
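The two "sneaky" points can be checked directly with a few lines of code. This is only a small sketch; the names interior_indices and N_points are introduced here for illustration and do not appear in the code above.

```python
N_slices = 10

# range(1, N_slices) yields the interior indices 1, 2, ..., N_slices-1
# (the start and end points are handled separately with a factor of 0.5)
interior_indices = list(range(1, N_slices))
print(interior_indices)

# N_slices slices are bounded by N_slices+1 points: the interior points plus the two end points
N_points = len(interior_indices) + 2
print("Number of slices:", N_slices, " Number of points:", N_points)
```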

A balanced level of commenting

It is always a good idea to add comments explaining what you do. But, of course, you don't want to write code that is more comments than actual code! Where do I draw the line? When do I decide if I should add a comment or if I think it is already clear enough without it?

Good guideline: if there is something that you had to fiddle around with for a while to get the code correct, add a comment in your code so that you remember this and others get to learn from your hard work!

Using descriptive variable names also helps, as then you may not have to add a comment where you otherwise would.

Below is a reasonable example where comments are used to point out some of the things that might not be immediately obvious, but the variable names are well chosen such that the code is mostly understandable directly by reading it. Note also that I have separated "logical blocks" of the code (the initial setting of parameters, the first part of the sum, then the for loop for the rest) by blank lines, which makes it easier to visually see their different purposes.

In [ ]:
# Code for integration with the Trapezoidal rule

def f(x):
    return x**4 - 2*x + 1

N_slices = 10
start = 0.0
stop = 2.0
step_size = (stop-start)/N_slices

# First: half of the start and endpoints of the integration range
running_sum = 0.5*f(start) + 0.5*f(stop)

# Note, sneaky: range(i,j) starts counting at i and ends at j-1
# (the stop value j itself is not included), which is conveniently what we
# want for the trapezoidal rule
for i in range(1,N_slices):
    running_sum += f(start+i*step_size)

answer2 = (running_sum*step_size)
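One way to gain confidence that readable code is also correct code is to compare its result with a known answer. The sketch below is not part of the original notebook: it wraps the same steps in a function with the hypothetical name trapezoidal_integral, and compares the result with the exact integral of $x^4 - 2x + 1$ from 0 to 2, which is $32/5 - 4 + 2 = 4.4$.

```python
def f(x):
    return x**4 - 2*x + 1

def trapezoidal_integral(func, start, stop, N_slices):
    """Approximate the integral of func from start to stop using N_slices trapezoids."""
    step_size = (stop - start) / N_slices

    # Half of the function values at the start and end points
    running_sum = 0.5*func(start) + 0.5*func(stop)

    # Interior points 1, 2, ..., N_slices-1
    for i in range(1, N_slices):
        running_sum += func(start + i*step_size)

    return running_sum * step_size

# Exact value from the antiderivative x**5/5 - x**2 + x evaluated between 0 and 2
exact_answer = 4.4

coarse_answer = trapezoidal_integral(f, 0.0, 2.0, 10)
fine_answer = trapezoidal_integral(f, 0.0, 2.0, 1000)

print("Error with 10 slices:  ", abs(coarse_answer - exact_answer))
print("Error with 1000 slices:", abs(fine_answer - exact_answer))
```

The error should shrink as the number of slices grows. Because N_slices is an argument and not a number typed into the body, changing the resolution needs no edits inside the function.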

Summary: Writing understandable code

In summary, you can make your code much more understandable for others if you:

  1. Use descriptive variable names
  2. Explain things in comments

Good coding practice: Avoid hard-coding

"Hard coding" is one of the coding practices we will be discouraging in this course, and the use of "hard coding" can lose you points in your final exam.

What is "hard coding"?

Hard coding is when you type the same literal value by hand at multiple places in your code. For example, say you are asked to make an array x that has 1000 points and runs from 0 to 10, calculate the array $y = \sin(x)$, and then print out the percentage of points in y that have a value of less than 0.3. In the code snippets below, we will take a look at a BAD way of doing this with hard-coding, and then also proper ways of doing this that do not involve hard-coding.

BAD code with hard-coding

Here is an example of a BAD piece of code that does this using hard-coding of the number of points:

In [ ]:
import numpy as np

# An example of hard coding (DO NOT DO!!!!)
x = np.linspace(0,10,1000)
y = np.sin(x)

num = 0
for i in range(1000):
    if (y[i]<0.3):
        num += 1

print("The percentage of points in y less than 0.3 is %.2f %%" % (num/1000*100))

In this example code, we say that we have "hard coded" the number of points in the array. Say we wanted to then change the number of points in array x from 1000 to 1500: we would then have to go through all of our code by hand and change 1000 each time into 1500.

This time that is not so difficult since it is a short piece of code, but when you build more complex programs, this will become a lot more work, and the chance of missing one of the values and introducing a bug will increase quickly!
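To make the risk concrete, here is a sketch of what can happen when only one of the hard-coded values is updated. The variable names stale_percentage and true_percentage are introduced here for illustration. The code runs without any error, but the printed percentage only describes the first 1000 of the 1500 points.

```python
import numpy as np

# The number of points was changed from 1000 to 1500 here...
x = np.linspace(0, 10, 1500)
y = np.sin(x)

# ...but the two other occurrences of 1000 below were forgotten
num = 0
for i in range(1000):
    if y[i] < 0.3:
        num += 1
stale_percentage = num/1000*100

# For comparison: the percentage calculated over all of the points in y
true_percentage = np.count_nonzero(y < 0.3)/len(y)*100

print("Result with the forgotten 1000: %.2f %%" % stale_percentage)
print("Result using all points:        %.2f %%" % true_percentage)
```

Python gives no warning here, which is what makes this kind of bug hard to spot.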

GOOD example with no hard-coding: replacing hard coded numbers with a variable

Below, instead of typing in 1000 each time manually, we will define a variable at the top that will define how many points x has. This way, if we want to change the size of x, all we have to do is change this variable and everything will still work.

In [ ]:
# An example of THE PROPER WAY, using a variable
npts = 1000
x = np.linspace(0,10,npts)
y = np.sin(x)
maxval=0.3

num = 0
for i in range(npts):
    if (y[i]<maxval):
        num += 1

print("The percentage of points in y less than %g is %.2f %%" % (maxval, num/npts*100))

The big advantage here is that you can just change npts and all the code works immediately with no further changes!

Another GOOD option: replace hard coded numbers with automatically calculated values

Here is another option: instead of using the variable npts everywhere, we can use the len() function to automatically determine the length of array y:

In [ ]:
# An example using len()
npts=1000
x = np.linspace(0,10,npts)
y = np.sin(x)
maxval=0.3
num = 0

# In for loops, this is quite handy and easy to read
for i in range(len(y)):
    if (y[i]<maxval):
        num += 1

# However, for this line, I find the example above with npts a bit more readable: it is very obvious
# why I would divide by npts, but maybe not so immediately obvious why dividing by len(y)
# is the right thing to do...but it's fine, in particular if you add a comment explaining
# yourself.
print("The percentage of points in y less than %g is %.2f %%" % (maxval, num/len(y)*100))

As mentioned in the code, this is quite common and probably always a good idea for the for loop: this way, you never accidentally loop past the end of the array!

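(In Python, indexing past the end of an array raises an IndexError. A slice that runs past the end does not raise an error but is silently shortened to the elements that exist, and negative indices count backwards from the end of the array, so out-of-range mistakes can give strange results instead of an error...)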
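A short sketch of how NumPy arrays behave at the end of the array (the variable names index_error_raised and tail are introduced here for illustration):

```python
import numpy as np

y = np.sin(np.linspace(0, 10, 1000))

# Indexing one element past the end raises an IndexError
try:
    y[len(y)]
    index_error_raised = False
except IndexError:
    index_error_raised = True
print("IndexError raised:", index_error_raised)

# A slice that runs past the end gives no error: it is silently shortened
tail = y[995:1005]
print("Asked for 10 elements, got", len(tail))

# Negative indices count backwards from the end of the array
print("y[-1] is the last element:", y[-1] == y[len(y)-1])
```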

In the case of calculating the percentage, it is maybe a bit less obvious to a non-trained programmer that this is the right thing to do, in which case it is a good idea to add a comment to your code explaining what you're doing.

Summary: How to avoid hard coding

You will make your code much more robust and maintainable by avoiding hard coding of values if you:

  • Replace hard-coded parameter values with variables
  • Use functions that can automatically determine the appropriate value to use
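As a further sketch of the second point, NumPy itself can do the counting, so that neither the loop nor the array length has to be written out by hand. This variant is not part of the examples above; it assumes the NumPy function np.count_nonzero and the array attribute size, and the names num_below and percentage are introduced here for illustration.

```python
import numpy as np

npts = 1000
maxval = 0.3
x = np.linspace(0, 10, npts)
y = np.sin(x)

# (y < maxval) is an array of True/False values; count_nonzero counts the True entries
num_below = np.count_nonzero(y < maxval)

# y.size is the number of elements in y, so we divide by the actual array length
percentage = num_below/y.size*100

print("The percentage of points in y less than %g is %.2f %%" % (maxval, percentage))
```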