33.2. Acquisition functions
We will consider two different acquisition functions:
Expected Improvement (EI)
Lower Confidence Bound (LCB)
Note that we abbreviate the notation below and write \(\mathcal{A}(\mathbf{\theta}) \equiv \mathcal{A}(\mathbf{\theta}| D)\).
Expected Improvement
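The expected improvement acquisition function is defined by the expectation value of the rectifier \({\rm max}(0,f_{\rm min} - f(\mathbf{\theta}))\), i.e. we reward any expected reduction of \(f\) in proportion to the reduction \(f_{\rm min} - f(\mathbf{\theta})\). This can be evaluated analytically
\[
\begin{aligned}
\mathcal{A}_{\rm EI}(\mathbf{\theta}) &= \int_{-\infty}^{\infty} {\rm max}(0,f_{\rm min} - f)\, \mathcal{N}(f(\mathbf{\theta})|\mu(\mathbf{\theta}),\sigma(\mathbf{\theta})^2)\, df(\mathbf{\theta}) \\
&= \int_{-\infty}^{f_{\rm min}} (f_{\rm min} - f)\, \mathcal{N}(f|\mu,\sigma^2)\, df \\
&= (f_{\rm min} - \mu)\,\Phi\left(\frac{f_{\rm min}-\mu}{\sigma}\right) + \sigma\,\phi\left(\frac{f_{\rm min}-\mu}{\sigma}\right) \\
&= \sigma\left[ z\,\Phi(z) + \phi(z) \right],
\end{aligned}
\]
where
\[
\mathcal{N}(f(\mathbf{\theta})|\mu(\mathbf{\theta}),\sigma(\mathbf{\theta})^2)
\]
indicates the density function of the normal distribution, whereas the density function and the cumulative distribution function of the standard normal distribution are denoted \(\phi\) and \(\Phi\), respectively, and we dropped the explicit dependence on \(\mathbf{\theta}\) after the first line.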
In the last step we write the result in the standard normal variable \(z=\frac{f_{\rm min}-\mu}{\sigma}\). BayesOpt will exploit regions of expected improvement when the term \(z \Phi(z)\) dominates, while new, unknown regions will be explored when the second term \(\phi(z)\) dominates. For the expected improvement acquisition function, the exploration-exploitation balance is entirely determined by the set of observed data \(\mathcal{D}_n\) and the \(\mathcal{GP}\) kernel.
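The closed form \(\sigma\left[z\Phi(z)+\phi(z)\right]\) is cheap to evaluate once the \(\mathcal{GP}\) predictive mean and standard deviation at a point are known. The following is a minimal sketch using only the Python standard library, not a reference implementation. The function names and the numerical values of `mu`, `sigma` and `f_min` are assumptions made for illustration. A simple trapezoidal quadrature of the defining integral is included as a sanity check on the analytic expression.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def expected_improvement(mu, sigma, f_min):
    """Closed-form EI (minimization) from the GP predictive mean and std at one point."""
    if sigma <= 0.0:
        # No predictive uncertainty: the improvement is deterministic.
        return max(0.0, f_min - mu)
    z = (f_min - mu) / sigma
    return sigma * (z * STD_NORMAL.cdf(z) + STD_NORMAL.pdf(z))


def expected_improvement_numeric(mu, sigma, f_min, n=20001, width=10.0):
    """Trapezoidal estimate of E[max(0, f_min - f)] for f ~ N(mu, sigma^2)."""
    dist = NormalDist(mu, sigma)
    lo, hi = mu - width * sigma, mu + width * sigma
    h = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        f = lo + i * h
        weight = 0.5 if i in (0, n - 1) else 1.0  # trapezoidal end-point weights
        total += weight * max(0.0, f_min - f) * dist.pdf(f)
    return total * h


# Hypothetical predictive values at two candidate points, with f_min = 0.
f_min = 0.0
ei_exploit = expected_improvement(mu=-0.1, sigma=0.05, f_min=f_min)  # low mean, well known
ei_explore = expected_improvement(mu=0.3, sigma=0.5, f_min=f_min)    # higher mean, uncertain
print(f"EI (exploit candidate): {ei_exploit:.4f}")
print(f"EI (explore candidate): {ei_explore:.4f}")
```

Both hypothetical candidates receive a positive score, but for different reasons: for the first one \(z=2\) and the \(z\Phi(z)\) term dominates, while for the second one \(z=-0.6\) and the score comes from the \(\phi(z)\) term.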
Note 1: Density function of the normal distribution: \(\mathcal{N}(\theta|\mu,\sigma^2) = \frac{1}{\sqrt{2\pi}\sigma}\exp\left( -\frac{1}{2\sigma^2}(\theta-\mu)^2\right)\)
Note 2: Density function of the standard normal distribution: \(\phi(z) \equiv \mathcal{N}(z|\mu=0,\sigma^2=1) = \frac{1}{\sqrt{2 \pi}}\exp\left( -\frac{1}{2}z^2\right)\)
Note 3: Cumulative distribution function of the standard normal: \(\Phi(z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z}\exp\left(-\frac{t^2}{2}\right)\, dt\)
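With these three definitions, the closed form of the expected improvement can be recovered in a few lines. The following is a sketch of the standard calculation, assuming a Gaussian predictive distribution with mean \(\mu\) and standard deviation \(\sigma>0\) at the point of interest, and using the substitution \(u = (f-\mu)/\sigma\), so that \(\mathcal{N}(f|\mu,\sigma^2)\,df = \phi(u)\,du\) and \(f_{\rm min}-f = \sigma(z-u)\) with \(z=\frac{f_{\rm min}-\mu}{\sigma}\):
\[
\begin{aligned}
\int_{-\infty}^{f_{\rm min}} (f_{\rm min}-f)\,\mathcal{N}(f|\mu,\sigma^2)\,df
&= \sigma\int_{-\infty}^{z}(z-u)\,\phi(u)\,du \\
&= \sigma\left[z\,\Phi(z) - \int_{-\infty}^{z}u\,\phi(u)\,du\right] \\
&= \sigma\left[z\,\Phi(z)+\phi(z)\right].
\end{aligned}
\]
The last line uses \(\frac{d\phi}{du} = -u\,\phi(u)\), which gives \(\int_{-\infty}^{z}u\,\phi(u)\,du = -\phi(z)\).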
Lower Confidence Bound
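The lower confidence-bound acquisition function introduces an additional parameter \(\beta\) that explicitly sets the level of exploration
\[
\mathcal{A}_{\rm LCB}(\mathbf{\theta}) = \beta \sigma(\mathbf{\theta}) - \mu(\mathbf{\theta}).
\]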
The maximum of this acquisition function will occur for the maximum of the \(\beta\)-enlarged confidence envelope of the \(\mathcal{GP}\). We use \(\beta=2\), which is a very common setting. Larger values of \(\beta\) lead to even more explorative BayesOpt algorithms.
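The effect of \(\beta\) can be illustrated with a small sketch. It assumes the sign convention \(\mathcal{A}_{\rm LCB} = \beta\sigma - \mu\), so that maximizing the acquisition function corresponds to minimizing the lower edge \(\mu - \beta\sigma\) of the confidence envelope. The candidate names and their predictive means and standard deviations are hypothetical values chosen for illustration.

```python
def lower_confidence_bound(mu, sigma, beta=2.0):
    """LCB acquisition value (to be maximized) under the assumed convention beta*sigma - mu."""
    return beta * sigma - mu


# Hypothetical GP predictions (mean, std) at two candidate points.
candidates = {
    "near_best": (0.20, 0.05),   # low predicted value, small uncertainty
    "unexplored": (0.50, 0.40),  # higher predicted value, large uncertainty
}


def pick(beta):
    """Return the name of the candidate that maximizes the LCB acquisition function."""
    return max(
        candidates,
        key=lambda name: lower_confidence_bound(*candidates[name], beta=beta),
    )


for beta in (0.5, 2.0):
    print(f"beta = {beta}: next point -> {pick(beta)}")
```

With these numbers, a small \(\beta\) favors the point with the lowest predicted mean, while \(\beta=2\) shifts the choice to the point with the larger predictive uncertainty.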