Singular value decomposition (SVD) and Principal value analysis (PCA)

34.1. Singular value decomposition (SVD) and Principal value analysis (PCA)#

PCA belongs to a general class of methods that perform dimensionality reduction on data, where data is defined broadly. This can be thought of as reducing redundancy in characterizing the data. In the context of reduced basis emulators, the “proper orthogonal decomposition” or POD approach is closely analogous to PCA. SVD is a key tool in both PCA and POD methods.

A good reference for SVD is A Singularly Valuable Decomposition: The SVD of a Matrix.

Some parts of this section were adapted from Computational-statistics-with-Python.ipynb, which is from a course taught at Duke University.

As a warm-up: a reminder and a checkpoint question.

Reminder of linear algebra using the index form

Recall that

\[ (AB)^\intercal = B^\intercal A^\intercal \quad\mbox{and}\quad (A^\intercal)_{ji} = A_{ij} \]
\[ \chi^2 = [Y - A\thetavec]^\intercal\, \Sigmavec^{-1}\, [Y - A\thetavec] = (Y_i - A_{ij}\thetavec_j)(\Sigma^{-1})_{ii'}(Y_{i'}- A_{i'j'}\thetavec_{j'}) , \]

where \(i,i'\) run from \(1\) to \(N\) and \(j,j'\) run from one to \(p\), where the highest power term is \(x^{p-1}\).

  • Summed over \(i,i',j,j'\) because they each appear (exactly) twice.

  • \((\Sigma^{-1})_{ii'} \neq (\Sigma_{i,i'})^{-1}\)

  • Be sure you understand the indices on the leftmost term, remembering that the matrix expression has this term transposed.

  • We know \(\chi^2\) is a scalar because there are no free indices.

Checkpoint Question

Prove that the Maximum Likelihood Estimate (MLE) for \(\chi^2\) defined by

\[ \chi^2 = (\Yvec - \Amat\thetavec)^{\mathbf{\top}} \Sigmamat^{-1} (\Yvec - \Amat\thetavec) \]

is

\[ \thetavec_{\mathrm{MLE}} = (\AmatT \Sigmamat^{-1} \Amat)^{-1} (\AmatT \Sigmamat^{-1} \Yvec) \;. \]

Here \(\thetavec\) is a \(m\times 1\) matrix of parameters (i.e., there are \(m\) parameters), \(\Sigmamat\) is the \(m\times m\) covariance matrix, \(\Yvec\) is a \(N\times 1\) matrix of observations (data), and \(\Amat\) is an \(N\times m\) matrix

\[\begin{split} \Amat = \left( \begin{array}{cccc} 1 & x_1 & x_1^2 & \cdots \\ 1 & x_2 & x_2^2 & \cdots \\ \vdots & \vdots & \vdots &\cdots \\ 1 & x_N & x_N^2 & \cdots \end{array} \right) \end{split}\]

where \(N\) is the number of observations. The idea is to do this with explicit indices for vectors and matrices, using the Einstein summation convention.