Singular value decomposition (SVD) and Principal value analysis (PCA)

34.1. Singular value decomposition (SVD) and Principal value analysis (PCA)#

PCA belongs to a general class of methods that perform dimensionality reduction on data, where data is defined broadly. This can be thought of as reducing redundancy in characterizing the data. In the context of reduced basis emulators, the “proper orthogonal decomposition” or POD approach is closely analogous to PCA. SVD is a key tool in both PCA and POD methods.

A good reference for SVD is A Singularly Valuable Decomposition: The SVD of a Matrix.

Some parts of this section were adapted from Computational-statistics-with-Python.ipynb, which is from a course taught at Duke University.

As a warm-up: a reminder and a checkpoint question.

Reminder of linear algebra using the index form

Recall that

\[ (AB)^\intercal = B^\intercal A^\intercal \quad\mbox{and}\quad (A^\intercal)_{ji} = A_{ij} \]

\[ \chi^2 = [Y - A\thetavec]^\intercal\, \Sigmavec^{-1}\, [Y - A\thetavec] = (Y_i - A_{ij}\thetavec_j)(\Sigma^{-1})_{ii'}(Y_{i'}- A_{i'j'}\thetavec_{j'}) , \]

where \(i,i'\) run from \(1\) to \(N\) and \(j,j'\) run from one to \(p\), where the highest power term is \(x^{p-1}\).

Summed over \(i,i',j,j'\) because they each appear (exactly) twice.
\((\Sigma^{-1})_{ii'} \neq (\Sigma_{i,i'})^{-1}\)
Be sure you understand the indices on the leftmost term, remembering that the matrix expression has this term transposed.
We know \(\chi^2\) is a scalar because there are no free indices.

Checkpoint Question

Prove that the Maximum Likelihood Estimate (MLE) for \(\chi^2\) defined by

\[ \chi^2 = (\Yvec - \Amat\thetavec)^{\mathbf{\top}} \Sigmamat^{-1} (\Yvec - \Amat\thetavec) \]

is

\[ \thetavec_{\mathrm{MLE}} = (\AmatT \Sigmamat^{-1} \Amat)^{-1} (\AmatT \Sigmamat^{-1} \Yvec) \;. \]

Here \(\thetavec\) is a \(m\times 1\) matrix of parameters (i.e., there are \(m\) parameters), \(\Sigmamat\) is the \(m\times m\) covariance matrix, \(\Yvec\) is a \(N\times 1\) matrix of observations (data), and \(\Amat\) is an \(N\times m\) matrix

\[\begin{split} \Amat = \left( \begin{array}{cccc} 1 & x_1 & x_1^2 & \cdots \\ 1 & x_2 & x_2^2 & \cdots \\ \vdots & \vdots & \vdots &\cdots \\ 1 & x_N & x_N^2 & \cdots \end{array} \right) \end{split}\]

where \(N\) is the number of observations. The idea is to do this with explicit indices for vectors and matrices, using the Einstein summation convention.

Hint

A suggested approach:

Write \(\chi^2\) in indices: \(\chi^2 = (Y_i - A_{ij}\theta_j)\Sigma^{-1}_{ii'}(Y_{i'}- A_{i'j'}\theta_{j'})\), where summations over repeated indices are implied (be careful of transposes and using enough independent indices). How do we see that \(\chi^2\) is a scalar?
Find \(\partial\chi^2/\partial \theta_k = 0\) for all \(k\), using \(\partial\theta_j/\partial\theta_k = \delta_{jk}\). Isolate the terms with one component of \(\thetavec\) from those with none.
You should get the matrix equation \( (\AmatT \Sigmamat^{-1} \Yvec) = (\AmatT \Sigmamat^{-1} \Amat)\thetavec\). At this point you can directly solve for \(\thetavec\). Why can you do this now?

Answer

We find the MLE from \(\partial\chi^2/\partial\thetavec_k = 0\) for \(k = 1,\ldots p\).

\[\begin{aligned} \left.\frac{\partial\chi^2}{\partial\thetavec_k}\right|_ {\thetavec=\thetavechat} &= -A_{ij}\delta_{jk}(\Sigma^{-1})_{i,i'}(Y_{i'} - A_ {i'j'}\thetavechat_{j'}) + (Y_{i} - A_{ij}\thetavechat_{j})(\Sigma^{-1})_{i,i'}(-A_ {i'j'}\delta_{j'k}) = 0 \end{aligned}\]

Isolate the \(\thetavec\) terms on one side and show the doubled terms are equal:

\[ A_{ik}(\Sigmavec^{-1})_{i,i'}Y_{i'} + Y_i (\Sigmavec^{-1})_{i,i'} A_{i'k} = A_{ik}(\Sigmavec^{-1})_{i,i'}A_{i'j'}\thetavechat_{j'} + A_{ij}\thetavechat_{j}(\Sigmavec^{-1})_{i,i'}A_{i'k} \]

In the second term on the left, switch \(i\leftrightarrow i'\) and use \((\Sigmavec^{-1})_{i',i} = (\Sigmavec^{-1})_{i,i'}\) because it is symmetric. This is then the same as the first term.
In the first term on the right, we switch \(j\leftrightarrow j'\) and use the symmetry of \(\Sigmavec\) again to show the two terms are the same.

Writing \(A_{ik} = (A^\intercal)_{ki}\), we get

\[\begin{aligned} 2(A^\intercal)_{ki} (\Sigmavec^{-1})_{i,i'} Y_i = 2 (A^{\intercal})_{ki} (\Sigmavec^{-1})_{i,i'} A_ {i'j}\thetavechat_j \end{aligned}\]

or, removing the indices,

\[ (A^{\intercal}\Sigmavec^{-1} Y) = (A^{\intercal}\Sigmavec^{-1}A) \thetavechat \]

and then inverting (which is possible because the expression in parentheses on the right is a square, invertible matrix), we finally obtain:

\[ \thetavechat = [A^\intercal \Sigmavec^{-1} A]^{-1} [A^\intercal \Sigmavec^{-1} Y] . \]

Q.E.D.