14.6. Computing the Bayesian evidence
There are many possible challenges in calculating the evidence, including:
The likelihood may be sharply peaked within the prior range, yet still have long tails that contribute significantly to the required integrals;
The likelihood could be multimodal;
The posterior may only be significant on thin “sheets” in parameter space (cf. visualization of sampling).
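To make the first challenge concrete, here is a minimal sketch, not a recommended method. It assumes a hypothetical one-dimensional problem with a uniform prior on \([a, b]\) and a Gaussian-shaped likelihood of width \(\sigma\) in the parameter \(\theta\); the function names are illustrative. The evidence is the prior-weighted integral of the likelihood, so the simplest estimator averages the likelihood over draws from the prior. The sketch compares that estimator with the closed-form answer for a broad and for a narrow likelihood.

```python
import math
import numpy as np

def gaussian_likelihood(theta, mu, sigma):
    """Likelihood as a function of theta: a normalised Gaussian bump centred at mu."""
    return np.exp(-0.5 * ((theta - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def evidence_exact(mu, sigma, a, b):
    """Closed-form evidence for a uniform prior on [a, b] (toy model only)."""
    def cdf(x):
        return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))
    return (cdf(b) - cdf(a)) / (b - a)

def evidence_prior_mc(mu, sigma, a, b, n_samples, rng):
    """Naive Monte Carlo estimate: average the likelihood over prior draws."""
    theta = rng.uniform(a, b, size=n_samples)
    return gaussian_likelihood(theta, mu, sigma).mean()

rng = np.random.default_rng(0)
for sigma in (1.0, 0.01):  # broad likelihood, then sharply peaked likelihood
    exact = evidence_exact(0.0, sigma, -10.0, 10.0)
    estimates = [evidence_prior_mc(0.0, sigma, -10.0, 10.0, 1000, rng)
                 for _ in range(200)]
    print(f"sigma={sigma}: exact={exact:.4f}, "
          f"MC mean={np.mean(estimates):.4f}, MC scatter={np.std(estimates):.4f}")
```

Both cases have nearly the same exact evidence, but for the narrow likelihood only a handful of prior draws land on the peak, so the scatter between repeated estimates is much larger. In higher dimensions this problem worsens rapidly, which motivates the specialised methods summarised next.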
Trotta [Tro08] gives a summary of methods (which is somewhat out-of-date in places):
Thermodynamic integration \(\longrightarrow\) simulated annealing. The computational cost depends heavily on the dimensionality of the parameter space and on the details of the likelihood function. For example, cosmological applications require up to \(10^7\) likelihood evaluations (roughly 100 times the cost of MCMC-based parameter estimation). A solution is to use parallel tempering (more to follow!).
Nested sampling recasts the multidimensional evidence integral as a one-dimensional integral, which is easy to evaluate numerically. Generally this takes \(\sim 10^5\) likelihood evaluations.
multinest and newer versions are more efficient still.
Approximations to the Bayes factor:
If models are nested: ask whether a new parameter is supported by data.
The Laplace approximation may be good, but be careful with the priors.
Define the effective number of parameters (see BDA3 [GCS+13])
AIC, BIC, DIC, WAIC (summary to follow; see BDA3 for details)
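As an illustration of the first method in the list above, thermodynamic integration uses the identity \(\ln Z = \int_0^1 \langle \ln L \rangle_\beta \, d\beta\), where the average is taken over the "power posterior" \(p_\beta(\theta) \propto L(\theta)^\beta\, \pi(\theta)\). The sketch below assumes a hypothetical conjugate toy model (one datum \(y\) with Gaussian noise \(\sigma\) and a zero-mean Gaussian prior of width \(s_0\) on \(\theta\)), chosen so that each power posterior can be sampled directly and the result checked against the exact evidence. In a realistic problem each temperature would need its own MCMC chain, which is where the cost quoted above comes from. All function names, the temperature ladder, and the sample sizes are illustrative choices.

```python
import numpy as np

def log_likelihood(theta, y, sigma):
    """Gaussian log-likelihood of a single datum y given mean theta."""
    return -0.5 * np.log(2.0 * np.pi * sigma**2) - 0.5 * ((y - theta) / sigma) ** 2

def sample_power_posterior(beta, y, sigma, s0, n_samples, rng):
    """Draw from p_beta(theta), proportional to L(theta)^beta * N(theta | 0, s0^2).

    For this conjugate toy model the power posterior is itself Gaussian,
    so direct draws stand in for the MCMC a real problem would need.
    """
    precision = beta / sigma**2 + 1.0 / s0**2
    mean = (beta * y / sigma**2) / precision
    return rng.normal(mean, np.sqrt(1.0 / precision), size=n_samples)

def log_evidence_ti(y, sigma, s0, n_temps=41, n_samples=20000, seed=0):
    """Estimate ln Z by integrating <ln L>_beta over beta in [0, 1]."""
    rng = np.random.default_rng(seed)
    # Cluster inverse temperatures near beta = 0, where <ln L> changes fastest.
    betas = np.linspace(0.0, 1.0, n_temps) ** 3
    mean_logl = np.array([
        log_likelihood(
            sample_power_posterior(b, y, sigma, s0, n_samples, rng), y, sigma
        ).mean()
        for b in betas
    ])
    # Trapezoidal rule on the non-uniform beta grid.
    return float(np.sum(0.5 * (mean_logl[1:] + mean_logl[:-1]) * np.diff(betas)))

def log_evidence_exact(y, sigma, s0):
    """Exact ln Z: marginally, y is Gaussian with variance sigma^2 + s0^2."""
    var = sigma**2 + s0**2
    return -0.5 * np.log(2.0 * np.pi * var) - 0.5 * y**2 / var

print("thermodynamic integration:", log_evidence_ti(1.5, 1.0, 2.0))
print("exact:                    ", log_evidence_exact(1.5, 1.0, 2.0))
```

The two printed values should agree up to Monte Carlo and quadrature error. The spacing of the \(\beta\) ladder is a tuning choice: too few temperatures near \(\beta = 0\) biases the integral when the prior is much broader than the likelihood.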
The paper “Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC” by Vehtari, Gelman, and Gabry is a good (and reliable) source for theoretical and practical details on assessing and comparing the predictive accuracy of different models. Quote: “Cross-validation and information criteria are two approaches to estimating out-of-sample predictive accuracy using within-sample fits.” The computations use the log-likelihood evaluated at posterior simulations of the parameters.
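To show what "the log-likelihood evaluated at posterior simulations" means in practice, here is a minimal sketch of the WAIC computation as defined in that paper. The input is an array of pointwise log-likelihoods with one row per posterior draw and one column per data point. The toy data (Gaussian observations with unit variance and an unknown mean under a flat prior) and all variable names are assumptions made for illustration. For real analyses a tested implementation is preferable to this hand-rolled version.

```python
import numpy as np

def waic(log_lik):
    """WAIC from an (n_draws, n_data) array of pointwise log-likelihoods."""
    log_lik = np.asarray(log_lik, dtype=float)
    # Log pointwise predictive density: log of the posterior-mean likelihood
    # for each datum, computed with a shift for numerical stability.
    shift = log_lik.max(axis=0)
    lppd_i = shift + np.log(np.mean(np.exp(log_lik - shift), axis=0))
    # Effective number of parameters: posterior variance of the
    # log-likelihood for each datum, summed over data points.
    p_waic_i = log_lik.var(axis=0, ddof=1)
    elpd_waic = float(np.sum(lppd_i - p_waic_i))
    return {
        "lppd": float(lppd_i.sum()),
        "p_waic": float(p_waic_i.sum()),
        "elpd_waic": elpd_waic,
        "waic": -2.0 * elpd_waic,  # deviance scale
    }

# Hypothetical toy problem: y_i ~ N(mu, 1) with a flat prior on mu,
# so the posterior for mu is N(mean(y), 1/N) and can be sampled directly.
rng = np.random.default_rng(1)
y = rng.normal(0.3, 1.0, size=200)
mu_draws = rng.normal(y.mean(), 1.0 / np.sqrt(y.size), size=4000)
log_lik = -0.5 * np.log(2.0 * np.pi) - 0.5 * (y[None, :] - mu_draws[:, None]) ** 2

result = waic(log_lik)
print(result)
```

In this one-parameter model the effective number of parameters `p_waic` should come out close to one, which is a useful sanity check before comparing models by their `elpd_waic` values.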