5.2. Two-dimensional PDFs

Visualizing correlated Gaussian distributions

A multivariate Gaussian distribution for an \(N\)-dimensional vector \(\boldsymbol{x} = \{x_1, \ldots, x_N\}\) with mean \(\boldsymbol{\mu} = \{\mu_1, \ldots, \mu_N\}\) and positive-definite covariance matrix \(\Sigma\) is

\[ p(\boldsymbol{x}|\boldsymbol{\mu},\Sigma) = \frac{1}{\sqrt{\det(2\pi\Sigma)}} e^{-\frac{1}{2}(\boldsymbol{x}-\boldsymbol{\mu})^\intercal\Sigma^{-1}(\boldsymbol{x}-\boldsymbol{\mu})} \]

For the one-dimensional case, it reduces to the familiar

\[ p(x_1|\mu_1,\sigma_1) = \frac{1}{\sqrt{2\pi\sigma_1^2}} e^{-\frac{(x_1-\mu_1)^2}{2\sigma_1^2}} \]

with \(\Sigma = \sigma_1^2\).
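As a quick check of the one-dimensional reduction, here is a minimal NumPy sketch that evaluates the multivariate formula directly. The helper names `gaussian_pdf` and `gaussian_pdf_1d` are hypothetical, introduced only for illustration; in practice a library routine such as `scipy.stats.multivariate_normal` does the same job.

```python
import numpy as np

def gaussian_pdf(x, mu, Sigma):
    """Multivariate Gaussian density p(x | mu, Sigma), written out from the formula above."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    mu = np.atleast_1d(np.asarray(mu, dtype=float))
    Sigma = np.atleast_2d(np.asarray(Sigma, dtype=float))
    diff = x - mu
    # Solve Sigma z = (x - mu) rather than forming Sigma^{-1} explicitly
    z = np.linalg.solve(Sigma, diff)
    norm = np.sqrt(np.linalg.det(2.0 * np.pi * Sigma))
    return float(np.exp(-0.5 * diff @ z) / norm)

def gaussian_pdf_1d(x1, mu1, sigma1):
    """The familiar one-dimensional Gaussian density."""
    return float(np.exp(-(x1 - mu1) ** 2 / (2 * sigma1 ** 2)) / np.sqrt(2 * np.pi * sigma1 ** 2))

# With N = 1 and Sigma = sigma_1^2 the two expressions should agree
print(gaussian_pdf([0.3], [1.0], [[2.0 ** 2]]), gaussian_pdf_1d(0.3, 1.0, 2.0))
```

For a diagonal \(\Sigma\) the same function factorizes into a product of one-dimensional Gaussians, which is a convenient sanity check.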

For the bivariate (two-dimensional) case,

\[\begin{split} \boldsymbol{x} = \pmatrix{x_1\\ x_2} \quad\mbox{and}\quad \boldsymbol{\mu} = \pmatrix{\mu_1\\ \mu_2} \quad\mbox{and}\quad \Sigma = \pmatrix{\sigma_1^2 & \rho_{12} \sigma_1\sigma_2 \\ \rho_{12}\sigma_1\sigma_2 & \sigma_2^2} \quad\mbox{with}\ \sigma_1, \sigma_2 > 0 \ \mbox{and}\ -1 < \rho_{12} < 1 \end{split}\]

so that \(\Sigma\) is positive definite.
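The bivariate covariance matrix can be assembled from \(\sigma_1\), \(\sigma_2\), and \(\rho_{12}\) and tested for positive definiteness. For a symmetric \(2\times 2\) matrix, Sylvester's criterion reduces the test to \(\Sigma_{11} > 0\) and \(\det\Sigma = \sigma_1^2\sigma_2^2(1-\rho_{12}^2) > 0\). This is a sketch assuming NumPy; the helper names and the values \(\sigma_1 = 1\), \(\sigma_2 = 2\) are illustrative choices, not part of the widget.

```python
import numpy as np

def bivariate_cov(sigma1, sigma2, rho12):
    """Covariance matrix for the bivariate case, built from sigma_1, sigma_2, rho_12."""
    cov12 = rho12 * sigma1 * sigma2
    return np.array([[sigma1 ** 2, cov12],
                     [cov12, sigma2 ** 2]])

def is_positive_definite_2x2(Sigma):
    """Sylvester's criterion for a symmetric 2x2 matrix: Sigma_11 > 0 and det > 0."""
    det = Sigma[0, 0] * Sigma[1, 1] - Sigma[0, 1] * Sigma[1, 0]
    return bool(Sigma[0, 0] > 0 and det > 0)

for rho in (0.0, 0.7, -0.7, 1.0, 1.2):
    Sigma = bivariate_cov(1.0, 2.0, rho)
    eigs = np.linalg.eigvalsh(Sigma)   # positive definite <=> both eigenvalues > 0
    print(f"rho12 = {rho:+.1f}: eigenvalues = {eigs.round(3)}, "
          f"positive definite: {is_positive_definite_2x2(Sigma)}")
```

Running the loop and inspecting the eigenvalues for \(\rho_{12} = \pm 1\) and beyond is one way to approach questions 1 and 6 below.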

Widget user interface features:

  • Set the mean position \((\mu_1, \mu_2)\) and variances \((\Sigma_{11}, \Sigma_{22})\) with the sliders.

  • Set the correlation \(\rho_{12}\) with the slider. This controls the covariance \(\Sigma_{12} = \rho_{12} \sqrt{ \Sigma_{11} \Sigma_{22}}\).

  • Four presets are available.

  • The corner plot shows samples from the bivariate PDF and histograms for the two marginal distributions. Control the number of samples with the slider.

  • Dashed lines on the marginals mark the 16th, 50th, and 84th percentiles. For a one-dimensional Gaussian, the 50th percentile is the mean \(\mu\) and the 16th and 84th percentiles lie at approximately \(\mu \mp 1\sigma\).
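The corner-plot ingredients can be reproduced outside the widget. The following sketch assumes NumPy and arbitrary illustrative parameter values (not the widget's defaults or presets); it draws samples with `numpy.random.Generator.multivariate_normal` and computes the 16th, 50th, and 84th percentiles of each marginal. For a Gaussian marginal, half the distance between the 16th and 84th percentiles comes out close to (about 0.99 times) the corresponding \(\sigma\).

```python
import numpy as np

rng = np.random.default_rng(seed=42)          # fixed seed for reproducibility
mu = np.array([1.0, -0.5])                    # illustrative values only
sigma1, sigma2, rho12 = 1.0, 2.0, 0.7
Sigma = np.array([[sigma1 ** 2, rho12 * sigma1 * sigma2],
                  [rho12 * sigma1 * sigma2, sigma2 ** 2]])

samples = rng.multivariate_normal(mu, Sigma, size=100_000)   # shape (n_samples, 2)

# Percentiles of each marginal: the columns of `samples` are x_1 and x_2
p16, p50, p84 = np.percentile(samples, [16, 50, 84], axis=0)
for i, s in enumerate((sigma1, sigma2)):
    print(f"x{i + 1}: median = {p50[i]:.3f} (mu = {mu[i]}), "
          f"half-width = {(p84[i] - p16[i]) / 2:.3f} (sigma = {s})")
```

Note that the marginal widths depend only on \(\Sigma_{11}\) and \(\Sigma_{22}\), not on \(\rho_{12}\); the correlation shows up only in the joint panel.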

The solid and dashed ellipses in the joint panel are iso-probability contours. They correspond to fixed values of the squared Mahalanobis distance

\[ \Delta^2 = (\boldsymbol{x} - \boldsymbol{\mu})^\intercal \Sigma^{-1} (\boldsymbol{x} - \boldsymbol{\mu}). \]

In the widget they are drawn at \(\Delta = 1\) and \(\Delta = 2\). Their semi-axes are aligned with the eigenvectors of \(\Sigma\) and have lengths \(\sqrt{\lambda_i}\) (the “\(1\sigma\)” ellipse) and \(2\sqrt{\lambda_i}\) (the “\(2\sigma\)” ellipse), where \(\lambda_i\) are the eigenvalues of \(\Sigma\).
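The ellipse construction just described can be sketched directly: scale a unit circle by \(\Delta\sqrt{\lambda_i}\) along each eigenvector of \(\Sigma\), then shift it to \(\boldsymbol{\mu}\). The function name `mahalanobis_ellipse` and the example covariance (\(\sigma_1 = 1\), \(\sigma_2 = 2\), \(\rho_{12} = 0.7\)) are illustrative assumptions, not the widget's code. The final check confirms that every generated point has the intended value of \(\Delta^2\).

```python
import numpy as np

def mahalanobis_ellipse(mu, Sigma, delta=1.0, n_points=200):
    """Points (shape (2, n_points)) on the contour Delta = delta of a bivariate Gaussian."""
    lam, vecs = np.linalg.eigh(Sigma)          # eigenvalues (ascending), eigenvectors as columns
    t = np.linspace(0.0, 2.0 * np.pi, n_points)
    circle = np.stack([np.cos(t), np.sin(t)])  # unit circle, shape (2, n_points)
    # Stretch by delta * sqrt(lambda_i) along each principal axis, rotate, then shift to mu
    pts = vecs @ (delta * np.sqrt(lam)[:, None] * circle)
    return mu[:, None] + pts

mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 1.4], [1.4, 4.0]])     # sigma1 = 1, sigma2 = 2, rho12 = 0.7
Sigma_inv = np.linalg.inv(Sigma)
for delta in (1.0, 2.0):
    d = mahalanobis_ellipse(mu, Sigma, delta) - mu[:, None]
    delta_sq = np.einsum("in,ij,jn->n", d, Sigma_inv, d)   # Delta^2 at each point
    print(delta, delta_sq.min(), delta_sq.max())           # both approximately delta**2
```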

A subtlety worth highlighting

In one dimension, the \(\pm 1\sigma\) and \(\pm 2\sigma\) intervals contain 68.3% and 95.4% of the probability mass, respectively. In two dimensions, the corresponding ellipses contain considerably less, because \(\Delta^2\) follows a chi-squared distribution with two degrees of freedom, \(\Delta^2 \sim \chi^2_2\), so that

\[ \Pr(\Delta^2 \le k^2) = 1 - e^{-k^2/2}. \]
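A short derivation of this result, sketched for completeness: factor \(\Sigma = S S^\intercal\) (for example, by Cholesky decomposition; \(S\) is a symbol local to this derivation) and set \(\boldsymbol{z} = S^{-1}(\boldsymbol{x}-\boldsymbol{\mu})\). Then \(\boldsymbol{z}\) is a standard bivariate Gaussian with identity covariance, and in polar coordinates \((r, \phi)\) for \(\boldsymbol{z}\),

\[\begin{split} \Delta^2 &= (S\boldsymbol{z})^\intercal (S S^\intercal)^{-1} (S\boldsymbol{z}) = \boldsymbol{z}^\intercal\boldsymbol{z} \equiv r^2, \\ \Pr(\Delta^2 \le k^2) &= \int_0^{2\pi}\! d\phi \int_0^{k} \frac{1}{2\pi}\, e^{-r^2/2}\, r\, dr = \Bigl[-e^{-r^2/2}\Bigr]_0^{k} = 1 - e^{-k^2/2}. \end{split}\]

The result does not depend on \(\boldsymbol{\mu}\) or \(\Sigma\), which is why the same percentages apply to every ellipse in the widget.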

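\[ \begin{array}{lcc} \text{Ellipse} & \text{Probability mass (2D)} & \text{For comparison: 1D interval} \\ \hline \Delta = 1\ (\text{“}1\sigma\text{”}) & 39.3\% & 68.3\% \\ \Delta = 2\ (\text{“}2\sigma\text{”}) & 86.5\% & 95.4\% \\ \Delta = 3\ (\text{“}3\sigma\text{”}) & 98.9\% & 99.7\% \end{array} \]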
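The table entries can be reproduced, and the 2D column checked by Monte Carlo, with a few lines of Python. This is a sketch assuming NumPy; the covariance matrix is an arbitrary correlated example, chosen to illustrate that the fractions do not depend on \(\Sigma\). The one-dimensional column uses \(\Pr(|x-\mu| \le k\sigma) = \operatorname{erf}(k/\sqrt{2})\).

```python
import math
import numpy as np

rng = np.random.default_rng(0)
Sigma = np.array([[1.0, 1.4], [1.4, 4.0]])    # arbitrary correlated example (rho12 = 0.7)
x = rng.multivariate_normal([0.0, 0.0], Sigma, size=200_000)
# Squared Mahalanobis distance of every sample (mu = 0 here)
delta_sq = np.einsum("ni,ij,nj->n", x, np.linalg.inv(Sigma), x)

rows = {}
for k in (1, 2, 3):
    exact_2d = 1.0 - math.exp(-k ** 2 / 2)        # mass inside the Delta = k ellipse
    mc_2d = float(np.mean(delta_sq <= k ** 2))    # Monte Carlo estimate of the same mass
    exact_1d = math.erf(k / math.sqrt(2))         # mass inside mu +/- k sigma in 1D
    rows[k] = (exact_2d, mc_2d, exact_1d)
    print(f"Delta = {k}: 2D {100 * exact_2d:.1f}% (MC {100 * mc_2d:.1f}%), "
          f"1D {100 * exact_1d:.1f}%")
```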

Questions to consider:

  1. What does “positive definite” mean and why is this a requirement for the covariance matrix \(\Sigma\)?

  2. What is plotted in each part of the graph (called a “corner plot”)?

  3. What effect does changing \(\mu_1\) or \(\mu_2\) have?

  4. What effect does changing \(\Sigma_{11}\) or \(\Sigma_{22}\) have? What if the scales for \(x_1\) and \(x_2\) were the same?

  5. What happens as \(\rho_{12}\) is set to \(0\), then \(+0.7\), then \(-0.7\)?

  6. What would happen if you were allowed to set \(|\rho_{12}| \leq 1\)? Explain what goes wrong.

  7. So what characterizes independent (uncorrelated) variables, compared with positively correlated and negatively correlated ones?

2D PDF with a quadratic approximation

Consider a two-dimensional log posterior \(L(X,Y)\), which we will analyze in a quadratic approximation. First, find the mode \((X_0, Y_0)\) (the best estimate) by setting the first derivatives to zero:

\[\begin{split}\begin{align} L(X,Y) &= \log p(X,Y|\{\text{data}\}, I) \\ &\Longrightarrow\quad \left.\frac{\partial L}{\partial X}\right|_{X_0,Y_0} = 0, \quad \left.\frac{\partial L}{\partial Y}\right|_{X_0,Y_0} = 0 \end{align}\end{split}\]
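Numerically, the mode is usually located with an optimizer. The sketch below uses plain gradient ascent with central finite-difference derivatives on a hypothetical log posterior `log_post`; its functional form, the step size, and the tolerance are all illustrative assumptions. In practice a library routine such as `scipy.optimize.minimize` applied to \(-L\) would be the more common choice.

```python
import numpy as np

def log_post(p):
    """Hypothetical log posterior L(X, Y), up to an additive constant (illustration only)."""
    X, Y = p
    return (-(X - 1.0) ** 2 - 0.5 * (Y + 2.0) ** 2
            + 0.6 * (X - 1.0) * (Y + 2.0) - 0.1 * (X - 1.0) ** 4)

def gradient(f, p, h=1e-5):
    """Central finite-difference estimate of (dL/dX, dL/dY) at p."""
    p = np.asarray(p, dtype=float)
    g = np.zeros_like(p)
    for i in range(p.size):
        step = np.zeros_like(p)
        step[i] = h
        g[i] = (f(p + step) - f(p - step)) / (2 * h)
    return g

def find_mode(f, p0, rate=0.2, tol=1e-8, max_iter=10_000):
    """Gradient ascent: step uphill until both first derivatives are numerically zero."""
    p = np.asarray(p0, dtype=float)
    for _ in range(max_iter):
        g = gradient(f, p)
        if np.linalg.norm(g) < tol:
            break
        p = p + rate * g
    return p

mode = find_mode(log_post, [0.0, 0.0])
print("X0, Y0 =", mode)   # the exact mode of this toy L is (1, -2)
```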

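To check the reliability of this estimate, Taylor expand \(L\) about \((X_0, Y_0)\). The linear terms vanish because the first derivatives are zero at the mode, leaving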

\[\begin{split}\begin{align} L &= L(X_0,Y_0) + \frac{1}{2}\Bigl[ \left.\frac{\partial^2L}{\partial X^2}\right|_{X_0,Y_0}(X-X_0)^2 + \left.\frac{\partial^2L}{\partial Y^2}\right|_{X_0,Y_0}(Y-Y_0)^2 \\ & \qquad\qquad\qquad + 2 \left.\frac{\partial^2L}{\partial X\partial Y}\right|_{X_0,Y_0}(X-X_0)(Y-Y_0) \Bigr] + \ldots \\ &\equiv L(X_0, Y_0) + \frac{1}{2}Q + \ldots \end{align}\end{split}\]

It is convenient to write \(Q\) in (symmetric) matrix notation:

\[\begin{split} Q = \begin{pmatrix} X-X_0 & Y-Y_0 \end{pmatrix} \begin{pmatrix} A & C \\ C & B \end{pmatrix} \begin{pmatrix} X-X_0 \\ Y-Y_0 \end{pmatrix} \end{split}\]
\[ \Longrightarrow A = \left.\frac{\partial^2L}{\partial X^2}\right|_{X_0,Y_0}, \quad B = \left.\frac{\partial^2L}{\partial Y^2}\right|_{X_0,Y_0}, \quad C = \left.\frac{\partial^2L}{\partial X\partial Y}\right|_{X_0,Y_0} \]

[Figure: posterior ellipse]

So in a quadratic approximation, the contour \(Q=k\) for some constant \(k\) is an ellipse centered at \((X_0, Y_0)\) (as in the figure). Its orientation and eccentricity are determined by \(A\), \(B\), and \(C\). The principal axes are found from the eigenvectors of the Hessian matrix \(\begin{pmatrix} A & C \\ C & B \end{pmatrix}\):

\[\begin{split} \begin{pmatrix} A & C \\ C & B \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \lambda \begin{pmatrix} x \\ y \end{pmatrix}, \qquad \lambda_1,\lambda_2 < 0 \ \mbox{since $(X_0,Y_0)$ is a maximum} \end{split}\]
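The coefficients \(A\), \(B\), \(C\) and the principal axes can be estimated numerically as well. This sketch, assuming NumPy, applies central finite differences to a hypothetical log posterior with mode \((X_0, Y_0) = (1, -2)\). Its quadratic part gives \(A = -2\), \(B = -1\), \(C = 0.6\), while the quartic term shows that \(L\) need not be exactly quadratic away from the mode. `hessian_2d` is an illustrative helper, not a library function.

```python
import numpy as np

def log_post(p):
    """Hypothetical log posterior L(X, Y) with its mode at (1, -2) (illustration only)."""
    X, Y = p
    return (-(X - 1.0) ** 2 - 0.5 * (Y + 2.0) ** 2
            + 0.6 * (X - 1.0) * (Y + 2.0) - 0.1 * (X - 1.0) ** 4)

def hessian_2d(f, X0, Y0, h=1e-3):
    """Central finite differences for A = L_XX, B = L_YY, C = L_XY at (X0, Y0)."""
    f0 = f((X0, Y0))
    A = (f((X0 + h, Y0)) - 2 * f0 + f((X0 - h, Y0))) / h ** 2
    B = (f((X0, Y0 + h)) - 2 * f0 + f((X0, Y0 - h))) / h ** 2
    C = (f((X0 + h, Y0 + h)) - f((X0 + h, Y0 - h))
         - f((X0 - h, Y0 + h)) + f((X0 - h, Y0 - h))) / (4 * h ** 2)
    return np.array([[A, C], [C, B]])

H = hessian_2d(log_post, 1.0, -2.0)
lam, vecs = np.linalg.eigh(H)   # eigenvalues and principal-axis directions (columns of vecs)
print("Hessian:\n", H)
print("eigenvalues:", lam, "-> maximum" if np.all(lam < 0) else "-> not a maximum")
```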

If the major and minor axes of the ellipse are aligned with the \(X\)- and \(Y\)-axes (so \(C=0\)), the analysis is simple: the eigenvalues are \(A\) and \(B\), and the error bars on \(X_0\) and \(Y_0\) are \(\sigma_X = 1/\sqrt{-A}\) and \(\sigma_Y = 1/\sqrt{-B}\), i.e., inversely proportional to the square roots of the moduli of the eigenvalues. What if the ellipse is tilted (\(C \neq 0\))? See Sivia section 3.3 [SS06] for a thorough treatment.
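For the axis-aligned case, the quadratic approximation gives a posterior proportional to \(\exp\{[A(X-X_0)^2 + B(Y-Y_0)^2]/2\}\), a product of two one-dimensional Gaussians with variances \(-1/A\) and \(-1/B\). The sketch below checks the resulting error bars numerically by tabulating \(e^{Q/2}\) on a grid; the values of \(A\), \(B\), \(X_0\), \(Y_0\) are hypothetical, and NumPy is assumed.

```python
import numpy as np

# Hypothetical axis-aligned example (C = 0): Hessian entries at a hypothetical mode
A, B, C = -4.0, -0.25, 0.0
X0, Y0 = 1.0, -2.0

sigma_X = 1.0 / np.sqrt(-A)   # 0.5
sigma_Y = 1.0 / np.sqrt(-B)   # 2.0

# Numerical check: tabulate exp(Q/2) on a uniform grid and compute marginal standard deviations
x = np.linspace(X0 - 10.0, X0 + 10.0, 801)
y = np.linspace(Y0 - 20.0, Y0 + 20.0, 801)
XX, YY = np.meshgrid(x, y, indexing="ij")
dX, dY = XX - X0, YY - Y0
Q = A * dX ** 2 + B * dY ** 2 + 2 * C * dX * dY
post = np.exp(0.5 * Q)
post /= post.sum()                          # normalize on the uniform grid
std_X = np.sqrt(np.sum(post * dX ** 2))
std_Y = np.sqrt(np.sum(post * dY ** 2))
print(f"sigma_X = {sigma_X:.4f} (grid: {std_X:.4f}), sigma_Y = {sigma_Y:.4f} (grid: {std_Y:.4f})")
```

The grid spans many standard deviations in each direction so that truncating the tails has a negligible effect on the estimated widths.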