4.4. Expectation values and moments

We have put on the table the axioms of probability theory and some of their consequences, in particular Bayes’ theorem. Before looking further at concrete applications of Bayesian inference, we provide further insight into Bayes’ theorem in Review of Bayes’ theorem and introduce some additional ingredients for Bayesian inference in Data, models, and predictions. The latter include the idea of a statistical model, how to predict future data conditioned on (i.e., given) past data and background information (the posterior predictive distribution), and Bayesian parameter estimation.

Appendix A provides a summary and further details on Statistics concepts and notation. Particularly important are Expectation values and moments and Central moments: Variance and Covariance; we summarize the key discrete and continuous definitions here. Note: there are multiple notations out there for these quantities!

Brief summary of expectation values and moments

The expectation value of a function \(h\) of the random variable \(X\) with respect to its distribution \(p(x_i)\) (a PMF) or \(p(x)\) (a PDF) is

\[ \mathbb{E}_{p}[h] = \sum_{i}\! h(x_i)p(x_i) \quad\Longrightarrow\quad \mathbb{E}_p[h] = \int_{-\infty}^\infty \! h(x)p(x)\,dx . \]

The \(p\) subscript is usually omitted. Moments correspond to \(h(x) = x^n\); the \(n=0\) moment equals 1 (this is the normalization condition), and \(n=1\) gives the mean \(\mu\):

\[ \mathbb{E}[X] \equiv \mu = \sum_{i}\! x_ip(x_i) \quad\Longrightarrow\quad \mathbb{E}[X] \equiv \mu = \int_{-\infty}^\infty \! xp(x)\,dx \equiv \langle x \rangle \equiv \bar x , \]

where we have also indicated two other common notations for the mean.
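As a concrete numerical illustration of these definitions (our own sketch, not part of the appendix), the \(n=0\) and \(n=1\) moments can be evaluated by direct summation over a PMF or by quadrature over a PDF; the Poisson and standard-normal distributions below are arbitrary choices for the example.

```python
import numpy as np
from scipy import stats, integrate

# Discrete case: a Poisson PMF with rate 3.5 (arbitrary example)
xk = np.arange(0, 50)                    # support truncated where the PMF is negligible
pk = stats.poisson.pmf(xk, mu=3.5)
print(pk.sum())                          # n = 0 moment: normalization, ~1
print((xk * pk).sum())                   # n = 1 moment: the mean, ~3.5

# Continuous case: a standard normal PDF, moments by quadrature (arbitrary example)
pdf = stats.norm(loc=0.0, scale=1.0).pdf
norm_const, _ = integrate.quad(pdf, -np.inf, np.inf)              # ~1
mean, _ = integrate.quad(lambda x: x * pdf(x), -np.inf, np.inf)   # ~0
print(norm_const, mean)
```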

The variance and covariance are moments with respect to the mean for one and two random variables:

\[\begin{split}\begin{align} \text{Var}(X) &\equiv \sigma_{X}^2 \equiv \mathbb{E}\left[ \left( X - \mathbb{E}[X] \right)^2 \right] \\ \text{Cov}(X,Y) &\equiv \sigma_{XY}^2 \equiv \mathbb{E}\left[ \left( X - \mathbb{E}[X] \right) \left( Y - \mathbb{E}[Y] \right) \right]. \end{align}\end{split}\]

The standard deviation \(\sigma\) is simply the square root of the variance \(\sigma^2\). The correlation coefficient of \(X\) and \(Y\) (for non-zero variances) is

\[ \rho_{XY} \equiv \frac{\text{Cov}(X,Y)}{\sqrt{\text{Var}(X)\text{Var}(Y)}} . \]

The covariance matrix \(\Sigma_{XY}\) is

\[\begin{split} \Sigma_{XY} = \pmatrix{\sigma_X^2 & \sigma_{XY}^2 \\ \sigma_{XY}^2 & \sigma_Y^2} = \pmatrix{\sigma_X^2 & \rho_{XY} \sigma_X\sigma_Y \\ \rho_{XY}\sigma_X\sigma_Y & \sigma_Y^2} \quad\mbox{with}\ 0 < \rho_{XY}^2 < 1 . \end{split}\]
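As a sanity check on these definitions (a sketch of our own, using arbitrary simulated data), the covariance matrix and correlation coefficient can be estimated from samples with NumPy:

```python
import numpy as np

rng = np.random.default_rng(42)
# Simulate correlated data: y depends linearly on x plus noise (arbitrary choice)
x = rng.normal(0.0, 1.0, size=100_000)
y = 0.6 * x + rng.normal(0.0, 0.8, size=x.size)

Sigma = np.cov(x, y)           # [[Var(X), Cov(X,Y)], [Cov(X,Y), Var(Y)]]
rho = np.corrcoef(x, y)[0, 1]  # correlation coefficient rho_XY
print(Sigma)
print(rho, Sigma[0, 1] / np.sqrt(Sigma[0, 0] * Sigma[1, 1]))  # agree by definition
```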

Checkpoint question

Show that we can also write

\[ \sigma^2 = \mathbb{E}[X^2] - \mathbb{E}[X]^2 \]
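A quick numerical cross-check of this identity (not a substitute for the derivation asked for above; the distribution used is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.gamma(shape=2.0, scale=1.5, size=1_000_000)  # any distribution will do
print(np.var(x))                                     # sample variance
print(np.mean(x**2) - np.mean(x)**2)                 # E[X^2] - E[X]^2, same number
```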

Checkpoint question

Show that the mean and variance of the normalized Gaussian distribution

\[ p \longrightarrow \mathcal{N}(x | \mu,\sigma^2) = \frac{1}{\sigma\sqrt{2\pi}} \exp{\Bigl(-\frac{(x-\mu)^2}{2\sigma^2}\Bigr)}, \]

are \(\mu\) and \(\sigma^2\), respectively.
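A numerical cross-check by quadrature (again not a substitute for the derivation; the values of \(\mu\) and \(\sigma\) below are arbitrary):

```python
import numpy as np
from scipy import integrate, stats

mu, sigma = 1.3, 0.7                     # arbitrary illustrative values
pdf = stats.norm(loc=mu, scale=sigma).pdf

norm_const, _ = integrate.quad(pdf, -np.inf, np.inf)
mean, _ = integrate.quad(lambda x: x * pdf(x), -np.inf, np.inf)
var, _ = integrate.quad(lambda x: (x - mean)**2 * pdf(x), -np.inf, np.inf)
print(norm_const, mean, var)             # ~1, ~mu, ~sigma**2
```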

Covariance matrix for a bivariate (two-dimensional) normal distribution

With vector \(\boldsymbol{x} = \pmatrix{x_1\\ x_2}\), the distribution is

\[ p(\boldsymbol{x}|\boldsymbol{\mu},\Sigma) = \frac{1}{\sqrt{\det(2\pi\Sigma)}} e^{-\frac{1}{2}(\boldsymbol{x}-\boldsymbol{\mu})^\intercal\Sigma^{-1}(\boldsymbol{x}-\boldsymbol{\mu})} \]

with mean and covariance matrix:

\[\begin{split} \boldsymbol{\mu} = \pmatrix{\mu_1\\ \mu_2} \quad\mbox{and}\quad \Sigma = \pmatrix{\sigma_1^2 & \rho_{12} \sigma_1\sigma_2 \\ \rho_{12}\sigma_1\sigma_2 & \sigma_2^2} \quad\mbox{with}\ 0 < \rho_{12}^2 < 1 . \end{split}\]

Note that \(\Sigma\) is symmetric and positive definite. See the 📥 Visualizing correlated Gaussian distributions notebook for plots of what this looks like.
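A minimal numerical sketch (with illustrative parameter values of our own choosing; the linked notebook does this properly with plots) that builds \(\Sigma\) from \(\sigma_1\), \(\sigma_2\), \(\rho\), draws samples, and checks the sample covariance and positive definiteness:

```python
import numpy as np
from scipy import stats

mu_vec = np.array([0.0, 1.0])                     # illustrative mean vector
sigma1, sigma2, rho = 1.0, 2.0, 0.8               # illustrative parameters
Sigma = np.array([[sigma1**2, rho * sigma1 * sigma2],
                  [rho * sigma1 * sigma2, sigma2**2]])

rv = stats.multivariate_normal(mean=mu_vec, cov=Sigma)
samples = rv.rvs(size=200_000, random_state=42)
print(np.cov(samples.T))                          # close to Sigma
print(np.linalg.eigvalsh(Sigma))                  # both eigenvalues > 0: positive definite
```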

Checkpoint question

What can’t we have \(\rho > 1\) or \(\rho < -1\)?