
32.4. Criticality analysis

Variance

../../../_images/ANNFT_variance_ReLU_notes.png

../../../_images/ANNFT_variance_Softplus_notes.png

../../../_images/ANNFT_variance_plots2.png

Fig. 32.8 The variance of the final-layer pre-training output as a function of neural network depth, at a fixed width of 240. Panels (a), (c), and (e) use a ReLU activation function tuned below the critical point (\(C_W = 1.0\), \(C_b = 0.0\)), at the critical point (\(C_W = 2.0\), \(C_b = 0.0\)), and above the critical point (\(C_W = 3.0\), \(C_b = 0.0\)), respectively. Panels (b), (d), and (f) use a Softplus activation function with the same initialization hyperparameters as the corresponding ReLU panels, giving the lowest, intermediate, and highest initialization widths for the weights. The measured variance is plotted in blue, a recursive calculation using empirical values of the variance is plotted as a dashed red line, and a recursive calculation using only theoretically calculated variances is plotted as a dashed pink line. For ReLU, the variance explodes with depth when the initialization width is above its critical value and vanishes with depth when it is below. When the ReLU network is tuned to critical initialization, the variance stays fixed with depth. The Softplus activation has no critical point for initialization: no choice of initialization hyperparameters gives a variance that is fixed with depth, so the Softplus variance either grows or asymptotes towards a constant value.
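The qualitative behavior described in this caption can be reproduced with a minimal sketch of the leading-order (infinite-width) variance recursion, \(K^{(\ell+1)} = C_b + C_W \langle \sigma(z)^2 \rangle_{z \sim \mathcal{N}(0, K^{(\ell)})}\). This is an assumption about the form of the recursion, not necessarily the exact calculation behind the dashed lines in the figure. The names `variance_vs_depth` and `k_first`, the starting variance of 1, and the use of Gauss-Hermite quadrature for the Gaussian average are illustrative choices that do not come from the text.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Gauss-Hermite nodes and weights for the weight function exp(-x^2 / 2),
# normalized so that sum(WEIGHTS * f(NODES)) approximates E[f(x)], x ~ N(0, 1).
NODES, WEIGHTS = hermegauss(80)
WEIGHTS = WEIGHTS / np.sqrt(2.0 * np.pi)


def relu(z):
    return np.maximum(z, 0.0)


def softplus(z):
    # Numerically stable log(1 + exp(z)).
    return np.logaddexp(0.0, z)


def variance_vs_depth(activation, c_w, c_b=0.0, k_first=1.0, depth=30):
    """Iterate K -> c_b + c_w * E[activation(z)^2] with z ~ N(0, K).

    k_first is an assumed first-layer variance (hypothetical choice).
    Returns an array holding the variance at layers 1..depth.
    """
    ks = [k_first]
    for _ in range(depth - 1):
        second_moment = np.sum(WEIGHTS * activation(np.sqrt(ks[-1]) * NODES) ** 2)
        ks.append(c_b + c_w * second_moment)
    return np.array(ks)


# Same three weight-initialization widths as in the figure, with C_b = 0.
relu_below = variance_vs_depth(relu, c_w=1.0)
relu_critical = variance_vs_depth(relu, c_w=2.0)
relu_above = variance_vs_depth(relu, c_w=3.0)

softplus_low = variance_vs_depth(softplus, c_w=1.0)
softplus_mid = variance_vs_depth(softplus, c_w=2.0)
softplus_high = variance_vs_depth(softplus, c_w=3.0)

for name, ks in [
    ("ReLU, C_W=1", relu_below),
    ("ReLU, C_W=2", relu_critical),
    ("ReLU, C_W=3", relu_above),
    ("Softplus, C_W=1", softplus_low),
    ("Softplus, C_W=2", softplus_mid),
    ("Softplus, C_W=3", softplus_high),
]:
    print(f"{name:16s} K(1) = {ks[0]:.3g}, K({len(ks)}) = {ks[-1]:.3g}")
```

For ReLU the Gaussian average is \(K/2\), so each layer multiplies the variance by \(C_W/2\). This is why \(C_W = 2\) keeps it fixed while \(C_W = 1\) and \(C_W = 3\) shrink or grow it geometrically. Softplus is nonzero at the origin, so in this sketch its variance cannot decay to zero: it settles at a nonzero value for the lowest \(C_W\) and keeps growing for the larger ones.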

Excess kurtosis

../../../_images/ANNFT_excess_kurtosis_I.png

../../../_images/ANNFT_excess_kurtosis_II.png

../../../_images/ANNFT_excess_kurtosis_ReLU_notes.png

../../../_images/ANNFT_excess_kurtosis_Softplus_notes.png

../../../_images/ANNFT_kurtosis_plots2.png

Fig. 32.9 The (unstandardized) excess kurtosis of the final-layer pre-training output, abbreviated here as EK and also known as the non-Gaussian 4-point correlations, as a function of neural network depth at a fixed width of 240. Panels (a), (c), and (e) use a ReLU activation function tuned below the critical point (\(C_W=1.0\), \(C_b=0.0\)), at the critical point (\(C_W=2.0\), \(C_b=0.0\)), and above the critical point (\(C_W=3.0\), \(C_b=0.0\)), respectively. Panels (b), (d), and (f) use a Softplus activation function with the same initialization hyperparameters as the corresponding ReLU panels, giving the lowest, intermediate, and highest initialization widths for the weights. The measured EK is plotted as a green line, a recursive calculation of EK using empirical values of the variance and initial EK is plotted as a dashed red line, and a recursive calculation using only theoretical values is plotted as a dashed pink line. An additional blue line is present in (e), representing an exact calculation of the critical EK without treating \(1/n\) terms as subleading. When critically tuned, the EK of a ReLU network grows linearly with depth, as predicted by ANNFT, demonstrating that the higher-order moments of the distribution are controlled by an expansion in \(r\). Away from criticality the EK grows non-linearly with depth, reflecting that criticality allows for a perturbative ANNFT in which the behavior of deeper networks can be analyzed.
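A measurement of this kind can be sketched with a small Monte Carlo experiment: draw many random ReLU networks at critical initialization, feed each the same input, and estimate the variance and EK of the preactivations layer by layer. Everything specific here is an assumption made for illustration rather than the setup behind the figure: the function names, the reduced width of 40 (chosen to keep the run cheap), the number of networks, the input normalization, zero-mean Gaussian weights with variance \(C_W/n\), and the definition \(\mathrm{EK} = \langle z^4 \rangle - 3\langle z^2 \rangle^2\) for zero-mean \(z\). The comparison curve `ek_toy` is a closed form worked out for this toy setup only; it is not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

WIDTH = 40          # assumed width, much smaller than the 240 used in the figure
DEPTH = 8           # number of layers to simulate
N_NETWORKS = 2000   # number of independent random initializations
C_W = 2.0           # critical weight-initialization width for ReLU (C_b = 0)


def simulate_preactivations(c_w, width, depth, n_networks):
    """Preactivations of random bias-free ReLU networks on one fixed input.

    Returns an array of shape (depth, n_networks, width).
    """
    # Fixed input scaled so that the first-layer variance is exactly 1.
    h = np.broadcast_to(np.full(width, 1.0 / np.sqrt(c_w)), (n_networks, width))
    layers = []
    for _ in range(depth):
        # Independent weight matrix for every network, entries ~ N(0, c_w / width).
        w = rng.normal(0.0, np.sqrt(c_w / width), size=(n_networks, width, width))
        z = np.einsum("bij,bj->bi", w, h)
        layers.append(z)
        h = np.maximum(z, 0.0)  # ReLU
    return np.array(layers)


def excess_kurtosis(z):
    """Unstandardized excess kurtosis <z^4> - 3 <z^2>^2, assuming zero mean."""
    return np.mean(z ** 4) - 3.0 * np.mean(z ** 2) ** 2


z_layers = simulate_preactivations(C_W, WIDTH, DEPTH, N_NETWORKS)
variance = np.array([np.mean(z ** 2) for z in z_layers])
ek_measured = np.array([excess_kurtosis(z) for z in z_layers])

# Closed form for this toy setup (derived for the sketch, not from the text):
# EK at layer l is 3 * ((1 + 5/n)^(l - 1) - 1), roughly 15 * (l - 1) / n
# when (l - 1) / n is small.
layer_index = np.arange(1, DEPTH + 1)
ek_toy = 3.0 * ((1.0 + 5.0 / WIDTH) ** (layer_index - 1) - 1.0)

for l, k, ek, ek_t in zip(layer_index, variance, ek_measured, ek_toy):
    print(f"layer {l}: variance = {k:.3f}, EK = {ek:.3f}, toy closed form = {ek_t:.3f}")
```

The first-layer preactivations are exactly Gaussian in this setup, so their EK vanishes up to sampling noise. The EK then grows with depth while the variance stays near its initial value. Estimates at the deepest layers are noisy because the fourth moment is dominated by rare networks with large preactivations, so more networks or a larger width would be needed for a quantitative comparison.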

The story so far

../../../_images/ANNFT_story_so_far.png


By Christian Forssén, Dick Furnstahl, and Daniel Phillips

© Copyright 2026.