42. 📥 Exercise: Jupyter Notebooks and Python
Try it yourself with the Jupyter notebook. Exercises such as this one correspond to a Jupyter Notebook that you can run yourself, rather than just looking at the static version shown here. You are highly encouraged to download the source files of the lecture notes so that you can perform the exercises interactively.
A Jupyter Notebook is a powerful tool for interactively developing and presenting data and computational science projects. Jupyter Notebooks integrate Markdown text notes, mathematical equations, interactive code cells, and their output into a single document. They can be displayed in a web browser running on a computer, on a tablet (e.g., an iPad), or even on your smartphone.
This step-by-step workflow promotes fast, iterative development, since the output of each code cell is displayed right away. That’s why notebooks have become increasingly popular in data science and for exploratory computational projects.
For larger software development projects, however, the notebook format has serious drawbacks (for example, cells can be run out of order, which leaves hidden state, and notebook files are awkward to version-control and test), so it should be avoided there.
This exercise offers a tour of the use of Python in Jupyter notebooks to get started doing data analysis. You’ll learn more features and details as you proceed.
The Jupyter notebook Help menu. You can find valuable documentation under the Jupyter notebook Help menu. The “User Interface Tour” and “Keyboard Shortcuts” are useful places to start, but there are also many other links to documentation there.
42.1. Code and Markdown cells
The notebook consists of cells, of which two types are relevant for us:
Markdown cells: These contain headings, text, and mathematical formulas in \(\LaTeX\), written in Markdown, a lightweight markup language that is rendered as HTML.
Code cells: These have Python code (or other languages, but we’ll stick to Python).
Either type of cell can be selected with your cursor and will be highlighted in color when active. You evaluate an active cell with shift-return (as with Mathematica) or by pressing Run on the toolbar. Some notes:
When a new cell is inserted, it is a Code cell by default and has [ ]: in front. You can type Python expressions or entire programs in a cell. How you break up code between cells is your choice, and you can always put Markdown cells in between. When you evaluate a cell it gets an evaluation number, e.g., [5]:.
The output of an evaluated cell is presented in an Output cell that is also preceded by, e.g., [5]:, where 5 is the evaluation number. This cell cannot be edited. It will be updated if the cell is evaluated again.
On the menu bar is a pulldown menu that lets you switch back and forth between Code and Markdown cells. Once you evaluate a Markdown cell, it gets formatted (and has a blue border). To edit the Markdown cell, double-click in it.
Try double-clicking on this cell and then shift-return. You will see that a bullet list is created just with an asterisk and a space at the beginning of lines (without the space you get italics and with two asterisks you get bold).
Double click on the title header at the start of this exercise Notebook and you’ll see it starts with a single #. Headings of subsections are made using ## or ###.
The Markdown language. See this Markdown cheatsheet for a quick tour of the Markdown language (including how to add links!).
Now try turning the next (empty) cell into a Markdown cell, type: Einstein says $E=mc^2$ and then evaluate it. This is \(\LaTeX\)! (If you forget to convert to Markdown and get SyntaxError: invalid syntax, just select the cell and convert it to Markdown with the menu.)
42.3. Python expressions and strings
We can use the Jupyter notebook as a super calculator much like Mathematica and Matlab. Try some basic operations, modifying and evaluating the following cells, noting that exponentiation is with ** and not ^.
1 + 1 # Everything after a number sign (pound sign, hashtag)
# in a code cell is a comment
3.2 * 4.713
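To see why the ** versus ^ distinction matters, here is a small sketch: on integers, ^ is not an error but Python’s bitwise exclusive OR (XOR), so a mistyped power silently gives a different number.

```python
power = 2**3      # exponentiation: 8
xor = 2 ^ 3       # bitwise XOR: binary 10 XOR 11 = 01, i.e. 1
print(power, xor)

# On floats, ^ is not defined at all and raises a TypeError
try:
    2. ^ 3
except TypeError as err:
    print('TypeError:', err)
```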
Note that when we want a floating-point number (the same as a double in C++), we always include a decimal point (even when we don’t strictly have to), while a number without a decimal point is an integer.
3.**2
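The integer/float distinction shows up as different types. This short check, using the built-in type function, also shows that in Python 3 the / operator always returns a float, while // performs floor division:

```python
print(type(3), type(3.))   # int and float

half = 3 / 2      # true division always gives a float: 1.5
floor = 3 // 2    # floor division: 1
print(half, floor)

# Python ints have arbitrary precision; floats are 64-bit doubles
big_int = 2**100
big_float = 2.**100
print(big_int, big_float)
```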
We can define integer, floating point, and string variables, perform operations on them, and print them. Note that we don’t have to predefine the type of a variable and we can use underscores in the names (unlike Mathematica). Evaluate the following cells and then try your own versions.
x = 5.
print(x)
x # If the last line of a cell returns a value, it is printed.
y = 3.*x**2 - 2.*x + 7.
print('y = ', y) # Strings delimited by ' 's
There are several ways to print strings that include variables from your code. We recommend the f-string, which was added in Python 3.6. See, e.g., this blog for examples.
print(f'y = {y:.0f}') # Just a preview: more on format later
print(f'y = {y:.2f}') # ...showing more decimals
The f-string will be used predominantly in this course, but you might also encounter older formatting syntax, such as %-formatting and the str.format method. The three lines below print the same result:
print('x = %.2f y = %.2f' %(x,y))
print('x = {0:.2f} y = {1:.2f}'.format(x, y))
print(f'x = {x:.2f} y = {y:.2f}')
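The part after the colon inside the braces is a format specification. A few common variants are sketched below (the variable names are just for illustration); the {x=} form is a debugging shortcut that requires Python 3.8 or newer:

```python
x = 5.
y = 3.*x**2 - 2.*x + 7.   # same as above, so y = 72.0

fixed = f'{y:.2f}'        # fixed-point with 2 decimals
sci = f'{y:.3e}'          # scientific notation with 3 decimals
padded = f'[{y:8.2f}]'    # right-aligned in a field 8 characters wide
debug = f'{x=}'           # prints the name and the value

for s in (fixed, sci, padded, debug):
    print(s)
```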
first_name = 'Emilia' # Strings delimited by ' 's
last_name = 'Student'
full_name = first_name + ' ' + last_name # you can concatenate strings
print(full_name)
# or
print(f'{first_name} {last_name}')
42.4. Mathematical functions with numpy
Ok, how about square roots and trigonometric functions and …
(Note: the next two cells will give error messages — keep reading to see how to fix them.)
sqrt(2)
sin(pi)
We need to import these functions from the NumPy library. There are other choices (e.g., the standard-library math module), but NumPy works with the arrays that we will often use.
Import libraries into their own namespace.
Never use
from numpy import *
as it will eventually lead to conflicts between function names in your namespace.
Instead, import libraries into their own namespace with
import numpy as np
Here np is just an abbreviation for numpy (we could choose any name, but np is the convention).
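To see why separate namespaces matter, note that both the standard-library math module and NumPy provide a function called sqrt, and they behave differently. With star imports, whichever module was imported last would silently decide which sqrt you get; with prefixes the choice is explicit. A minimal sketch:

```python
import math
import numpy as np

values = np.array([1., 4., 9.])

root_np = np.sqrt(values)     # element-wise on the whole array
root_math = math.sqrt(4.)     # scalars only

try:
    math.sqrt(values)         # a 3-element array is not a single number
    math_accepts_arrays = True
except TypeError:
    math_accepts_arrays = False

print(root_np, root_math, math_accepts_arrays)
```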
import numpy as np
print(np.cos(0.))
Now functions and constants like np.sqrt and np.pi will work. Go back and fix the square root and sine.
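Once you have fixed the two cells, you may notice that np.sin(np.pi) is not exactly zero. This is floating-point round-off rather than a bug: np.pi is only the closest double to π. A sketch of the usual remedy, comparing with a tolerance via np.isclose:

```python
import numpy as np

print(np.sqrt(2))
s = np.sin(np.pi)
print(s)                   # a tiny number, not exactly 0
print(s == 0.)             # exact comparison fails
print(np.isclose(s, 0.))   # comparison with a tolerance succeeds
```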
Debugging aside …
Suppose you try to import a package and it fails.
Go ahead and evaluate the cell:
import numpie
When you get a ModuleNotFoundError, the first thing to check is whether you have misspelled the name. Try using Google, e.g., search for “python numpie”. In this case (and in most others), Google will suggest the correct name (here it is numpy). If the name does exist, check whether it sounds like the package you wanted.
If you have the correct spelling, check whether you have installed the relevant package. If you installed Python with conda, then use conda list, e.g., conda list numpy in a Terminal window (on a Mac or Linux box) or in an Anaconda Prompt window (on a Windows PC).
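You can also check from within Python whether a module can be found, without triggering an error, using importlib.util.find_spec from the standard library. The helper is_installed below is a hypothetical convenience wrapper, not part of any library:

```python
import importlib.util

def is_installed(name):
    """Hypothetical helper: True if a module called `name` can be found."""
    return importlib.util.find_spec(name) is not None

for name in ['numpy', 'numpie']:
    print(name, is_installed(name))

# Catching the error explicitly instead of letting the cell fail
try:
    import numpie  # deliberately misspelled
except ModuleNotFoundError as err:
    print('Caught:', err)
```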
numpy arrays
The NumPy library is extremely useful, and its linear algebra features will be covered in more depth in a separate exercise Notebook. Here, however, we already introduce numpy arrays, which will be featured extensively. They are similar to Python lists but much more powerful for scientific computations. As an example, we construct them with np.arange(min, max, step), which gives an array starting at min and increasing in steps of step up to, but not including, max. Examples:
t_pts = np.arange(0., 10., .1)
t_pts
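Because np.arange excludes the endpoint max, it is easy to be off by one point. When you want a fixed number of points including both endpoints, np.linspace(min, max, num), which appears later in these notes, is often the safer choice; the NumPy documentation also recommends it for non-integer steps. A quick comparison:

```python
import numpy as np

a = np.arange(0., 1., 0.25)   # 0., 0.25, 0.5, 0.75 (stop value 1. excluded)
b = np.linspace(0., 1., 5)    # 0., 0.25, 0.5, 0.75, 1. (both endpoints)
print(a)
print(b)
```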
If we apply an arithmetic operation or a NumPy function to a numpy array, it acts on each element of the array:
x = np.arange(1., 5., 1.)
print(x)
print(x**2)
print(np.sqrt(x))
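The difference from plain Python lists is easy to see in a quick comparison: the same * operator repeats a list but multiplies each element of an array.

```python
import numpy as np

py_list = [1., 2., 3., 4.]
arr = np.array(py_list)

print(py_list * 2)   # list repetition: the four entries appear twice
print(arr * 2)       # element-wise: each entry is doubled
# py_list**2 would raise a TypeError; arr**2 squares each entry
print(arr**2)
```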
We can pick out individual elements of the array. Why does the last one fail?
print(x[0])
print(x[3])
print(x[4])
Zero-based numbering. Note the use of zero-based numbering in Python.
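A few more indexing patterns follow from zero-based numbering. In this sketch, negative indices count back from the end, and a slice x[start:stop] stops just before stop, in the same way that np.arange excludes its endpoint:

```python
import numpy as np

x = np.arange(1., 5., 1.)   # four elements: 1., 2., 3., 4.
print(len(x))               # valid indices are 0 to len(x) - 1
print(x[-1])                # last element
print(x[1:3])               # elements at indices 1 and 2

try:
    x[len(x)]               # one past the last valid index
except IndexError as err:
    print('IndexError:', err)
```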
42.5. Getting help
You will often need help identifying the appropriate Python (or NumPy or SciPy or …) command or you will need an example of how to do something or you may get an error message you can’t figure out. In all of these cases, Google (or equivalent) is your friend. Always include “python” in the search string (or “numpy” or “matplotlib” or …) to avoid getting results for a different language. You will usually get an online manual as one of the first responses if you ask about a function; these usually have examples if you scroll down. Otherwise, responses to Stack Overflow queries are your best bet to find a useful answer.
42.6. Functions
There are many Python language features that we will use eventually. We will definitely need to be able to create functions. You may also hear them called methods; strictly speaking, in Python a method is a function that belongs to an object (such as my_ax.plot below), but we will use both names (functions and methods) in these notes. Note that a function need not correspond to a mathematical operation that returns output computed from some input. Sometimes it just performs a task, such as creating and saving a figure.
When defining functions, note the role of indentation in Python, as opposed to the {}s used in other programming languages such as C++. We’ll always indent with four spaces (never tabs!). We know a function definition is complete when the indentation stops. The Jupyter Notebook also works as a lightweight IDE (integrated development environment), as it helps you with proper indentation and formatting.
Shift-Tab help text.
To read the manual (docstring) of a Python function, put your cursor on the function name and hit Shift+Tab or Shift+Tab+Tab. Go back and try it on np.arange.
# Use "def" to create new functions.
# Note the colon and indentation (4 spaces).
def my_function(x):
    """This function squares the input. Always include a brief description
    at the top between three starting and three ending quotes. We will
    talk more about proper documentation later.
    Try shift+Tab+Tab after you have evaluated this function.
    """
    return x**2
print(my_function(5.))
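The docstring you see with Shift+Tab is stored on the function object itself, so it is also available outside Jupyter, via the __doc__ attribute or the built-in help function. A short sketch, using a trimmed copy of my_function:

```python
def my_function(x):
    """This function squares the input."""
    return x**2

print(my_function.__doc__)   # the raw docstring
help(my_function)            # signature plus docstring
```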
# We can pass an array to the function and it is evaluated term-by-term.
x_pts = np.arange(1.,10.,1.)
print(my_function(x_pts))
# Two variables, with a default for the second
def add(x, y=4.):
    """Add two numbers."""
    print("x is {} and y is {}".format(x, y))
    return x + y  # Return values with a return statement
# Calling functions with parameters
print('The sum is ', add(5, 6)) # => prints out "x is 5 and y is 6" and returns 11
# Another way to call functions is with keyword arguments
add(y=6, x=5) # Keyword arguments can arrive in any order.
How do you explain the following result?
add(2)
Debugging aside …
There are two bugs in the following function. Note the line where an error is first reported and fix the bugs sequentially (so you see the different error messages).
def hello_function()
msg = "hello, world!"
print(msg)
return msg
42.7. Plotting with Matplotlib
Matplotlib is the plotting library we’ll use. We’ll follow convention and abbreviate the module we need as plt.
The command %matplotlib inline is an IPython magic command, available in Jupyter Notebooks, that produces inline plots, i.e., the plot is shown in the Output cell.
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
Procedure we’ll use to make the skeleton plot:
1. Generate some data to plot in the form of arrays.
2. Create a figure.
3. Add one or more subplots.
4. Make a plot and display it.
t_pts = np.arange(0., 10., .1) # step 1.
x_pts = np.sin(t_pts) # More often this would be from a function
# *we* write.
my_fig = plt.figure() # step 2.
my_ax = my_fig.add_subplot(1,1,1) # step 3: rows=1, cols=1, 1st subplot
my_ax.plot(t_pts, x_pts) # step 4: plot x vs. t
NOTE: When making just a single plot, you will often see steps 2 to 4 compressed into plt.plot(t_pts, np.sin(t_pts)). Don’t do this. It saves a couple of lines but makes it harder to extend the plot later, which is exactly what we want to keep easy.
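If you want something more compact while still keeping a handle on the axis object, a common alternative (not used elsewhere in these notes) is plt.subplots, which performs steps 2 and 3 in a single call and returns both the figure and the axis. A minimal sketch:

```python
import numpy as np
import matplotlib.pyplot as plt

t_pts = np.arange(0., 10., .1)

# plt.subplots creates the figure and one axis in one call
fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(t_pts, np.sin(t_pts), label='sine')

# Because we kept the axis object, extending the plot is still easy
ax.plot(t_pts, np.cos(t_pts), label='cosine')
ax.legend()
```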
We can always go back and dress up the plot:
my_fig = plt.figure()
my_ax = my_fig.add_subplot(1,1,1) # nrows=1, ncols=1, first plot
my_ax.plot(t_pts, x_pts, color='blue', linestyle='--', label='sine')
my_ax.set_xlabel('t')
my_ax.set_ylabel(r'$\sin(t)$') # $...$ gives LaTeX math; the r prefix makes a raw string so backslashes are kept
my_ax.set_title('Sine wave')
# here we'll put the function in the call to plot!
my_ax.plot(t_pts, np.cos(t_pts), label='cosine') # just label the plot
my_ax.legend(); # turn on legend
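The r prefix in r'$\sin(t)$' matters because, in an ordinary Python string, a backslash starts an escape sequence. Since \s is not a recognized escape, '$\sin(t)$' happens to work without r (recent Python versions do warn about such invalid escapes), but a label like $\theta$ would silently contain a TAB character. A quick check:

```python
plain = '$\theta$'   # \t is read as a TAB character
raw = r'$\theta$'    # raw string: the backslash is kept for LaTeX

print(len(plain), len(raw))
print(repr(plain), repr(raw))
```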
Now make two subplots:
y_pts = np.exp(t_pts) # another function for a separate plot
fig = plt.figure(figsize=(10,5)) # allow more room for two subplots
# call the first axis ax1
ax1 = fig.add_subplot(1,2,1) # one row, two columns, first plot
ax1.plot(t_pts, x_pts, color='blue', linestyle='--', label='sine')
ax1.plot(t_pts, np.cos(t_pts), label='cosine') # just label the plot
ax1.legend()
ax2 = fig.add_subplot(1,2,2) # one row, two columns, second plot
ax2.plot(t_pts, y_pts, label='exponential')
ax2.legend();
Saving a figure
Saving a figure to disk is as simple as calling savefig with the name of the file (or a file object). The available image formats depend on the graphics backend you use.
Let us save the figure (named ‘fig’) from the previous cell
fig.savefig("sine_and_exp.png")
# and a transparent version:
fig.savefig("sine_and_exp_transparent.png", transparent=True)
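A slightly extended sketch: savefig infers the output format from the file extension, and its dpi and bbox_inches keyword arguments control the resolution of raster images and trim surrounding whitespace. The file names here are only examples:

```python
from pathlib import Path

import numpy as np
import matplotlib.pyplot as plt

t_pts = np.arange(0., 10., .1)
fig = plt.figure(figsize=(10, 5))
ax = fig.add_subplot(1, 1, 1)
ax.plot(t_pts, np.sin(t_pts), label='sine')
ax.legend()

# Vector format chosen from the .pdf extension
fig.savefig("sine.pdf")
# Higher-resolution PNG with the surrounding whitespace trimmed
fig.savefig("sine_hires.png", dpi=300, bbox_inches='tight')

print(Path("sine.pdf").exists(), Path("sine_hires.png").exists())
```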
Further examples with matplotlib. The matplotlib gallery is a good resource for learning by working examples.
42.8. Advanced feature: Widgets for graphical exploration
A widget is an object such as a slider, a check box, or a pulldown menu. We can use widgets to make it easy to explore different parameter values in a problem we’re solving, which is useful for building intuition. Each widget controls an argument of a function.
The set of widgets we’ll use here (there are others!) is from ipywidgets; by convention we import the module as import ipywidgets as widgets, and we’ll also often use display from IPython.display.
import numpy as np
import matplotlib.pyplot as plt
import ipywidgets as widgets
from IPython.display import display
%matplotlib inline
The simplest form is to use interact, to which we pass a function and the variables with their ranges. By default this makes a slider, which takes on integer or floating-point values depending on whether you put decimal points in the range. Try it! Then modify the function and try again.
# We can do this to any function
def test_f(x=5.):
    """Test function that prints the passed value and its square.
    Note that there is no return value in this case."""
    print('x = ', x, ' and x^2 = ', x**2)
widgets.interact(test_f, x=(0.,10.));
# Explicit declaration of the widget (here FloatSlider) and details
def test_f(x=5.):
    """Test function that prints the passed value and its square.
    Note that there is no return value in this case."""
    print('x = ', x, ' and x^2 = ', x**2)
widgets.interact(test_f,
                 x=widgets.FloatSlider(min=-10, max=30, step=1, value=10));
Here’s an example with some bells and whistles for a plot. Try making changes!
def plot_it(freq=1., color='blue', lw=2, grid=True, xlabel='x',
            function='sin'):
    """ Make a simple plot of a trig function but allow the plot style
    to be changed as well as the function and frequency."""
    t = np.linspace(-1., +1., 1000)  # linspace(min, max, total #)
    fig = plt.figure(figsize=(8,6))
    ax = fig.add_subplot(1,1,1)
    if function=='sin':
        ax.plot(t, np.sin(2*np.pi*freq*t), lw=lw, color=color)
    elif function=='cos':
        ax.plot(t, np.cos(2*np.pi*freq*t), lw=lw, color=color)
    elif function=='tan':
        ax.plot(t, np.tan(2*np.pi*freq*t), lw=lw, color=color)
    ax.grid(grid)
    ax.set_xlabel(xlabel)
widgets.interact(plot_it,
                 freq=(0.1, 2.), color=['blue', 'red', 'green'],
                 lw=(1, 10), xlabel=['x', 't', 'dog'],
                 function=['sin', 'cos', 'tan']);