43.5. 📥 Exercise: Linear algebra operations with NumPy#

NumPy is the library that gives Python its ability to work with numerical data at speed. This exercise will demonstrate how to use NumPy for common tasks such as the creation and manipulation of arrays, linear algebra operations, and statistical operations. Some of the content is inspired by one (out of many) cheat sheets found on github.

The structure of this note:

  1. N-dimensional arrays

  2. Array shape manipulations

  3. Linear algebra and numerical operations

  4. Array manipulations routines ( select and split)

  5. Statistical operations

import numpy as np

N dimensional arrays#

What are arrays?#

Arrays are a data structure for storing elements of the same type. Each item stored in an array is called an element. Each location of an element in an array has a numerical index which is used to identify the element. Arrays (and the indices) can be of more than one dimension.

1D vs 2D Array#

1D array (i.e., single dimensional array) stores a list of variables of the same data type. It is possible to access each variable using the index.

2D array (i.e, multi-dimensional array) stores data in a format consisting of rows and columns. Each index will then be two numbers.

Arrays vs Lists#

  • Arrays use less memory than lists

  • Arrays have significantly more functionality

  • Arrays require data to be homogeneous; lists do not

  • Arithmetic on arrays operates like matrix multiplication

  • NumPy is used to work with arrays. The array object in NumPy is called ndarray.

Create Arrays#

# Create 1 dimensional array (vector)
np.array([1,2,3]) # Create vector as a row
np.array([[1],[2],[3]]) #Create vector as a column. This is now a two-dimensional array.
np.array([(1,2,3),(4,5,6)]) # Two-dimensional array mith multiple rows and columns.

Zero-based indexing

Note that elements in ndarrays are accessed with zero-based indexing, meaning that the numbering starts with zero.

Row-major order

Note that elements in ndarrays are stored in memory using row-major order.

In computing, row-major order and column-major order are methods for storing multidimensional arrays in linear storage such as random access memory.

The difference between the orders lies in which elements of an array are contiguous in memory. In row-major order, the consecutive elements of a row reside next to each other, whereas the same holds true for consecutive elements of a column in column-major order.

Data layout is critical for correctly passing arrays between programs written in different programming languages. It is also important for performance when traversing an array because modern CPUs process sequential data more efficiently than nonsequential data.

Creating a Sparse Matrix#

A sparse matrix is a matrix in which most of the elements are zero. Sparse matrix data formats only store nonzero elements and assume all other values will be zero, leading to significant computational savings.

Let’ say, we want to create a NumPy array with two nonzero values, then converted it into a sparse matrix. If we view the sparse matrix, we can see that only the nonzero values are stored:

from scipy import sparse
matrix_large = np.array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                         [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
                         [3, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
# Create compressed sparse row (CSR) matrix
matrix_large_sparse = sparse.csr_matrix(matrix_large)
print(matrix_large_sparse)

In the example above, (1, 1) and (2, 0) represent the indices of the non-zero values 1 and 3, respectively. For example, the element 1 is in the second row and second column (remember that numbering starts at zero!).

Create special ndarray#

Here are some examples of numpy methods that can be used to create special arrays.

# Create  1d array of zeros, length 3
np.zeros(3) 
# 2x3 array of zeros
np.zeros((2,3))
# Create array of ones, 3x4 (3 rows, 4 columns)
np.ones((3,4))
#  5x5 array of 0 with 1 on diagonal (Identity matrix)
np.eye(5)
# Array of 6 evenly divided values from 0 to 100
np.linspace(0,100,6)
# Array of values from 0 to less than 10 with step 3 
np.arange(0,10,3)
# 2x3 array with all values 5
np.full((2,3),5)
# 2x3 array of random floats between 0–1
np.random.rand(2,3)
# 2x3 array of random floats between 0–100
np.random.rand(2,3)*100
# 2x3 array with random ints between 0–4
np.random.randint(5,size=(2,3))

Remember that useful shift-Tab-Tab shortcut in Jupyter notebooks to quickly find the docstring of a function. You can try it on any of the functions shown above.

You can also use the np.info function.

np.info(np.eye) # View documentation for np.eye

Array shape manipulations#

Shape, size, length and data type#

It will be valuable to check the shape and size of an array both for further calculations and simply as a sanity check after some operation.

NumPy arrays have an attribute called shape that returns a tuple with each index having the number of corresponding elements.

arr = np.array([(1,2,3),(4,5,6)])
arr.shape # Returns dimensions of arr (rows,columns)

In the example above, (2, 3) means that the array has 2 dimensions, and each dimension has 3 elements.

arr.size # Return number of elements in arr
len(arr) # Number of array dimension
arr.dtype # Return type of elements in arr
arr.dtype.name # Name of data type
arr_int = arr.astype(float) # Convert an array to a different type
arr_int.dtype

Reshape#

It is important to know how to reshape the NumPy arrays so that our data meets the expectation of specific Python libraries. For example, Scikit- learn requires a one-dimensional array of output variables to be shaped like a two-dimensional array with one column and outcomes for each row.

Other algorithms might require input to be specified as a three-dimensional array. One example is the Long Short-Term Memory recurrent neural network in Keras that expects a three-dimensional input of samples, timesteps, and features.

np.reshape() allows us to restructure an array so that we maintain the same data but it is organized as a different number of rows and columns.

Note: The shape of the original and new matrix contains the same number of elements (i.e, same size).

# Reshaping
arr1 = np.arange(1, 11)  # numbers 1 to 10
print(f'arr1 = {arr1}\n')
print(f'The shape of array `arr1` is: {arr1.shape}\n') # Prints a tuple for the one dimension.

arr1_2d = arr1.reshape(2, 5)
print(f'Reshaping the array into:\n{arr1_2d}')
print(f'The shape is now: {arr1_2d.shape}\n')

The reshape method has a useful option of specifying -1 as one of the shape elements. This will then automatically become the required number to maintain the array size.

arr1.reshape(2, 5)
arr1.reshape(-1, 5)  # same as above: arr1.reshape(2, 5)
arr1.reshape(2, -1)  # same as above: arr1.reshape(2, 5)

Transposing a matrix (rows <=> columns) or flattening into a one-dimensional array.

# Transpose
arr1_2d.T
# Flattening a Matrix
arr1_2d.flatten()

Note that the default of flatten is to use the row-major order. Can you find the argument to flatten with column-major instead?

# Resize arr1_2d to 3 rows, 4 columns 
np.resize(arr1_2d, (3,4))

Array to/from List conversion#

Arrays need to be declared whereas lists do not need declaration because they are a part of Python’s syntax. This is the reason lists are more often used than arrays. But in case of performing some arithmetic function to our list, we should go with arrays instead.

It is possible to convert between lists and arrays (although list to array conversion requires the object to be of a single type).

arr = np.array([(1,2,3),(4,5,6)])
print(arr)
type(arr)
arr_to_list = arr.tolist() # Convert arr to a Python list
print(arr_to_list)
type(arr_to_list)
arr_from_list = np.array(arr_to_list) # Convert list to array
print(arr_from_list)
type(arr_from_list)

The following operation will not work with a list, but does work with an array.

arr_to_list/4
arr_from_list/4

Linear algebra and numerical operations#

Let’s explore some of the numerical operations that can be performed with matrices. We will often use methods from the np.linalg module. Note that these NumPy linear algebra functions rely on BLAS and LAPACK to provide efficient low level implementations of standard linear algebra algorithms.

Trace#

The trace is the sum of all the diagonal elements of a square matrix.

arr = np.array([[2, 0, 0], [0, 2, 0], [0, 0, 2]])
np.trace(arr)

Inverse#

matrix = np.array([[1, 2],
                   [3, 4]])
# Calculate inverse of matrix
np.linalg.inv(matrix)

Determinant#

# Determinant
matrix = np.array([[1, 2, 3],
                   [2, 4, 6],
                   [3, 8, 9]])
# Return determinant of matrix
np.linalg.det(matrix)

Rank#

# Rank of a matrix
matrix = np.array([[1, 1, 3],
                   [1, 2, 4],
                   [1, 3, 0]])
# Return matrix rank
np.linalg.matrix_rank(matrix)

Question: Why is the rank of the 3x3 matrix below not equal to 3?

# Rank of a matrix
matrix = np.array([[1, 7, 3],
                   [1, 9, 4],
                   [1, 1, 0]])
# Return matrix rank
np.linalg.matrix_rank(matrix)

Eigenvalues and eigenvectors#

# Eigenvalues and Eigenvectors
matrix = np.array([[0, 1, 2],
                   [3, 4, 5],
                   [6, 7, 8]])
# Calculate eigenvalues and eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(matrix)
print(f'Eigenvalues: {eigenvalues}')
print(f'Eigenvectors: {eigenvectors}')

Operating on matrices with special properties

Note that there are methods that are targeted to work on matrices with certain properties, such as np.linalg.eigh for complex Hermitian (conjugate symmetric) or a real symmetric matrix. Be careful, however, as it can produce incorrect results if applied to an object that does not have the required property.

Scalar operations#

When we add, subtract, multiply or divide a matrix by a number, this is called the scalar operation. During scalar operations, the scalar value is applied to each element in the array, therefore, the function returns a new matrix with the same number of rows and columns.

# Scalar Operations
new_arr = np.arange(1,10)

# Add 1 to each array element
np.add(new_arr,1)

np.subtract(arr,2) # Subtract 2 from each array element
np.multiply(arr,3) # Multiply each array element by 3
np.divide(arr,4) # Divide each array element by 4 (returns np.nan for division by zero)
np.power(arr,5) # Raise each array element to the 5th power

Matrix Operations#

A matrix can only be added to (or subtracted from) another matrix if the two matrices have the same dimensions, that is, they must have the same number of rows and columns.

When multiplying matrices, we take rows of the first matrix and multiply them by the corresponding columns of the second matrix.

arr1 = np.array([[1,2],[3,4]])
arr2 = np.ones_like(arr1)*2
print('arr1 =\n',arr1)
print('arr2 =\n',arr2)
print('Elementwise add arr2 to arr1:\n',np.add(arr1,arr2)) 
print('Elementwise subtract arr2 from arr1:\n',np.subtract(arr1,arr2)) 
print('Elementwise multiply arr1 by arr2:\n',np.multiply(arr1,arr2))
print('Elementwise divide arr1 by arr2:\n',np.divide(arr1,arr2))
print('Elementwise raise arr1 to the power of arr2:\n',np.power(arr1,arr2))
print('Return True if the arrays have the same elements and shape:',np.array_equal(arr1,arr2))

Other math operations

np.sqrt(arr) # Square root of each element in the array
np.sin(arr) # Sine of each element in the array
np.log(arr) # Natural log of each element in the array
np.abs(arr) # Absolute value of each element in the array
np.ceil(arr) # Rounds up to the nearest int
np.floor(arr) # Rounds down to the nearest int
np.round(arr) # Rounds to the nearest int

Dot product versus elementwise multiplication#

The dot product of two matrices can be computed with the np.dot function. Note that it is not the same as the arithmetic ∗ operation that performs elementwise multiplication (Hadamard product).

A = np.array([[1,2],[3,4]])
print('A =\n',A)
print(r'The dot product A \cdot A:')
AA = np.dot(A,A)
print(AA)
print(r'Element-wise product A * A:')
print(A*A)

The dot product of a matrix by its inverse returns the identity matrix (with small floating point errors). Verify that this is true.

Further examples with numpy

The NumPy tutorial is a good resource.

Array Manipulation Routines#

Adding/removing elements#

The np.append() function is used to append values to the end of a given array.

# Adding/removing Elements
np.append ([0, 1, 2], [[3, 4, 5], [6, 7, 8]])
np.append([[0, 1, 2], [3, 4, 5]],[[6, 7, 8]], axis=0)

The axis along which values are appended. If the axis is not given, both array and values are flattened before use.

np.insert()' is used to insert the element before the given index of the array.

np.insert(arr,2,10) # Inserts 10 into arr before index 2

While np.delete() can be used to delete any row and column from the ndarray.

arr = np.arange(12).reshape(3, 4)
print('arr =\n',arr)
np.delete(arr,2,axis=0) # Deletes row (axis=0) on index 2 of arr
np.delete(arr,3,axis=1) # Deletes column (axis=1) on index 3 of arr

Sorting#

The np.sort() function can be used to sort the list in both ascending and descending order.

# Sort
oned_arr = np.array([3,8,5,1])
np.sort(oned_arr)
arr = np.array([[5, 4, 6, 8],
                [1, 2, 4, 8],
                [1, 5, 2, 4]])
# sort each column of arr
np.sort(arr, axis=0)

# sort each row of X
np.sort(arr, axis=1)

Concatenate arrays#

Concatenating means putting contents of two or more arrays in a single array. In NumPy, we join arrays by axes. We pass a sequence of arrays that we want to join to the np.concatenate() function, along with the axis. If the axis is not explicitly passed, it is taken as 0.

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
# Adds arr2 as rows to the end of arr1
np.concatenate((arr1, arr2), axis=0)
arr1 = np.array([[1, 2, 3],[4, 5, 6]])
arr2 = np.array([[7, 8, 9],[10, 11, 12]])
# Adds arr2 as columns to end of arr1
np.concatenate((arr1,arr2),axis=1)

Splitting arrays#

We use np.array_split() for splitting arrays. We pass it the array we want to split and the number of splits.

# Splitting NumPy Arrays
# Splits arr into 4 sub-arrays
arr = np.array([1, 2, 3, 4, 5, 6])
newarr = np.array_split(arr, 4) 
print(newarr)

or np.hsplit() to split horizontally.

arr = np.array([1, 2, 3, 4, 5, 6])
# Splits arr horizontally on the 2nd index
newarr = np.hsplit(arr, 2)
print(newarr)

Select element(s)#

NumPy offers a wide variety of methods for selecting and slicing elements or groups of elements in arrays.

user_name = np.array(['Katie','Bob','Scott','Liz','Sam'])
articles = np.array([100, 38, 91, 7, 25])
user_name[4] # Returns the element at index 4
articles[3] = 17 # Assigns array element on index 1 the value 4
user_name[0:3] # Returns the elements at indices 0,1,2
user_name[:2] # Returns the elements at indices 0,1
articles<50 # Returns an array with boolean values
articles[articles < 50] # Returns the element values
# Returns the user_name that read more than 50 articles but less than 100 articles
user_name[(articles < 100 ) & (articles >50)]
# Multi-dimensinal arrays
arr = np.arange(36).reshape(6,6)
print(arr)

Exercise: Try to perform the following selection or assignments of elements:

  • Returns the 2D array element on index [2][5]

  • Assigns array element on index [1][3] the value 10

  • Returns rows 0,1,2

  • Returns the elements on rows 0,1,2 at column 4

  • Returns returns rows 0,1

  • Returns the elements at index 1 on all rows

  • Returns all elements that are divisible by 3.

Solutions are found below

arr[2,5] # Returns the 2D array element on index [2][5]
arr[1,3]=10 # Assigns array element on index [1][3] the value 10
print(arr)
arr[0:3] # Returns rows 0,1,2
arr[0:3,4] # Returns the elements on rows 0,1,2 at column 4
arr[:2] # Returns returns rows 0,1
arr[:,1] # Returns the elements at index 1 on all rows
arr[np.mod(arr,3)==0] # returns all elements that are divisible by 3.

Statistical Operations#

Find the Maximum and Minimum Values#

Often we want to know the maximum and minimum value in an array or subset of an array. This can be accomplished with the np.max and np.min methods. Using the axis parameter we can also apply the operation along a certain axis.

articles = np.array([[10, 23, 17],
                   [41, 54, 65],
                   [71, 18, 89]])
# Return maximum element
np.max(articles)
np.max(articles, axis=0) # Find maximum element in each column
np.max(articles, axis=1) # Find maximum element in each row

Average, Variance, and Standard Deviation#

We can easily get descriptive statistics about the whole matrix or do calculations along a single axis.

# 4x6 array of random floats between 0–100
arr=np.random.rand(4,6)*100
print(arr)
np.mean(arr) # Returns the mean of all elements
arr.sum() # Returns the sum of all elements
np.mean(arr,axis=0) # Returns the mean along a specific axis
arr.cumsum(axis=1) #Cumulative sum of the elements
np.var(arr) # Returns the variance
np.std(arr,axis=1) # Returns the standard deviation along specific axis
np.corrcoef(arr) # Returns correlation coefficient of array