
2.9. Follow-up: fluctuation trends with # of points and data errors

This notebook is a follow-up to Assignment: Follow-ups to Parameter Estimation notebooks; it focuses on the trends of the fluctuations and how to visualize them.

A. Parameter estimation example: fitting a straight line

  2. Do exercise 3: “Change the random number seed to get different results and comment on how the maximum likelihood results fluctuate. How is the size of the fluctuations related to the number of data points \(N\) and the data error standard deviation \(dy\)? (Try changing them!)”

The size of the fluctuations decreases as \(1/\sqrt{N}\) with the number of data points \(N\), and it grows linearly with the data error standard deviation \(dy\).
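One way to anticipate these scalings analytically: for a linear model with independent Gaussian errors of fixed standard deviation \(dy\), the covariance matrix of the maximum likelihood estimates is

$$
\mathrm{Cov}(\hat\theta) = dy^2\,\bigl(A^\mathsf{T} A\bigr)^{-1},
\qquad
A = \begin{pmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_N \end{pmatrix},
$$

and since every element of \(A^\mathsf{T} A\) is a sum over the \(N\) data points, the standard deviations of the fitted intercept and slope scale as \(dy/\sqrt{N}\).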

How do we obtain, visualize, and understand these results?

%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import seaborn; seaborn.set('talk') # for plot formatting
from scipy import optimize

def make_data(intercept, slope, N=20, dy=5, rseed=None):
    """Given a straight line defined by intercept and slope:
          y = slope * x + intercept
       generate N randomly spaced points from x=0 to x=100
       with Gaussian (i.e., normal) error with mean zero and standard
       deviation dy.
       
       Return the x and y arrays and an array of standard deviations.
    """
    rand = np.random.RandomState(rseed) 
    x = 100 * rand.rand(N)  # choose the x values randomly in [0,100]
    y = intercept + slope * x  # This is the y value without noise
    y += dy * rand.randn(N)    # Add in Gaussian noise
    return x, y, dy * np.ones_like(x)  # return coordinates and error bars

def log_likelihood(theta, x, y, dy):
    """Log likelihood for data y with Gaussian errors dy, given the straight-line
       model y_model = theta[0] + theta[1] * x, with theta = [intercept, slope].
    """
    y_model = theta[0] + theta[1] * x
    return -0.5 * np.sum(np.log(2 * np.pi * dy**2) + (y - y_model)**2 / dy**2)

def minfunc(theta, x, y, dy):
    """
    Function to be minimized: minus the logarithm of the likelihood.
    """
    return -log_likelihood(theta, x, y, dy)
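
As a quick check that these pieces work together (an illustrative addition; the seed 42 is arbitrary and chosen only for reproducibility), generate one dataset with the true parameters used below and verify that the minimizer recovers values close to them:

x, y, dy = make_data(25., 0.5, N=20, dy=5, rseed=42)
result = optimize.minimize(minfunc, x0=[0, 0], args=(x, y, dy))
print(result.x)   # should land near the true values [25, 0.5], up to fluctuations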

First make tables

intercept = 25.   # true intercept (called b elsewhere)
slope = 0.5       # true slope (called m elsewhere)
theta_true = [intercept, slope]  # put parameters in a true theta vector

iterations = 10
# Fix dy and vary Npts geometrically
dy_data = 5
for Npts in [20, 80, 320]:
    print(f'N = {Npts}, dy = {dy_data}')
    print('          intercept   slope')
    print(f'true:       {intercept:.3f}    {slope:.3f}')
    
    for i in np.arange(iterations):        
        x, y, dy = make_data(*theta_true, N=Npts, dy=dy_data, rseed=None)
        result = optimize.minimize(minfunc, x0=[0, 0], args=(x, y, dy))
        intercept_fit, slope_fit = result.x
    
        print(f'dataset {i}:  {intercept_fit:.3f}    {slope_fit:.3f}')
    print('------------------------------\n')
# Fix Npts and vary dy geometrically
Npts = 80
for dy_data in [1, 5, 25]:
    print(f'N = {Npts}, dy = {dy_data}')
    print('          intercept   slope')
    print(f'true:       {intercept:.3f}    {slope:.3f}')
    
    for i in np.arange(iterations):        
        x, y, dy = make_data(*theta_true, N=Npts, dy=dy_data, rseed=None)
        result = optimize.minimize(minfunc, x0=[0, 0], args=(x, y, dy))
        intercept_fit, slope_fit = result.x
    
        print(f'dataset {i}:  {intercept_fit:.3f}    {slope_fit:.3f}')
    print('------------------------------\n')

Now make a function for rerunning

def std_of_fit_data(Npts, dy_data, iterations, theta_true=theta_true):
    """Calculate the standard deviations of the slope and intercept 
       for a given number of iterations
    """ 
    intercept_fits = np.zeros(iterations)
    slope_fits = np.zeros(iterations)

    for j in np.arange(iterations):        
        x, y, dy = make_data(*theta_true, N=Npts, dy=dy_data, rseed=None)
        result = optimize.minimize(minfunc, x0=[0, 0], args=(x, y, dy))
        intercept_fits[j], slope_fits[j] = result.x
        
    return intercept_fits.std(), slope_fits.std()    
std_of_fit_data(20, 5, 20)
std_of_fit_data(80, 5, 20)
std_of_fit_data(320, 5, 20)
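
A compact way to read these numbers (an illustrative check, not part of the original exercise): quadrupling \(N\) should shrink the fluctuations by roughly a factor of two, so the ratio of standard deviations at \(N=20\) and \(N=80\) should be about 2 (only roughly, since 20 iterations give noisy estimates of the spread).

b20, m20 = std_of_fit_data(20, 5, 20)
b80, m80 = std_of_fit_data(80, 5, 20)
print(f'intercept std ratio (N=20 / N=80): {b20/b80:.2f}  (expect roughly 2)')
print(f'slope std ratio     (N=20 / N=80): {m20/m80:.2f}  (expect roughly 2)')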

Now make linear and log-log plots

Which is better? How do you read a power law from a log-log plot?
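A power law \(y = C x^p\) appears on a log-log plot as a straight line whose slope is the exponent \(p\), since \(\log y = p \log x + \log C\). A tiny demonstration with made-up numbers (purely illustrative):

x_demo = np.array([10., 100., 1000.])
y_demo = 3.0 * x_demo**(-0.5)    # exact power law with exponent -1/2
p_demo, logC_demo = np.polyfit(np.log(x_demo), np.log(y_demo), deg=1)
print(p_demo)    # recovers -0.5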

Npts_array = [20 * 2**i for i in range(10)]
Npts_array
# Fix dy and vary Npts geometrically
dy_data = 5

Npts_array = [20 * 2**j for j in range(10)]
intercept_std_array = np.zeros(len(Npts_array))
slope_std_array = np.zeros(len(Npts_array))

iterations = 20
for i, Npts in enumerate(Npts_array):
    intercept_std_array[i], slope_std_array[i] = std_of_fit_data(Npts, dy_data, iterations)   

fig, axes = plt.subplots(2, 2, figsize=(12, 10))

axes[0,0].set_title(r'intercept fluctuations vs $N$')
axes[0,0].set_xlabel(r'$N$')
axes[0,0].plot(Npts_array, intercept_std_array)

axes[0,1].set_title(r'intercept fluctuations vs $N$')
axes[0,1].set_xlabel(r'$N$')
axes[0,1].loglog(Npts_array, intercept_std_array)
axes[0,1].set_xlim(10,1e4)
axes[0,1].set_ylim(.01,10)
axes[0,1].set_aspect('equal')

axes[1,0].set_title('slope fluctuations vs $N$')
axes[1,0].set_xlabel(r'$N$')
axes[1,0].plot(Npts_array, slope_std_array)

axes[1,1].set_title('slope fluctuations vs $N$')
axes[1,1].set_xlabel(r'$N$')
axes[1,1].loglog(Npts_array, slope_std_array)
axes[1,1].set_xlim(10,1e4)
axes[1,1].set_ylim(1e-3,1e-1)
axes[1,1].set_aspect('equal')

fig.tight_layout()
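
To read off the power law quantitatively rather than by eye, fit a straight line to \(\log(\text{std})\) versus \(\log N\) using the arrays computed above (an illustrative addition); the fitted slope should come out near \(-1/2\):

exponent_N, _ = np.polyfit(np.log(Npts_array), np.log(slope_std_array), deg=1)
print(f'slope fluctuations scale roughly as N^{exponent_N:.2f}   (expect about -0.5)')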
# Fix Npts and vary dy geometrically
Npts = 20

dy_array = [1 * 2**j for j in range(10)]
intercept_std_array = np.zeros(len(dy_array))
slope_std_array = np.zeros(len(dy_array))

iterations = 20
for i, dy_data in enumerate(dy_array):
    intercept_std_array[i], slope_std_array[i] = std_of_fit_data(Npts, dy_data, iterations)   

fig, axes = plt.subplots(2, 2, figsize=(12, 10))

axes[0,0].set_title(r'intercept fluctuations vs $dy$')
axes[0,0].set_xlabel(r'$dy$')
axes[0,0].plot(dy_array, intercept_std_array)

axes[0,1].set_title(r'intercept fluctuations vs $dy$')
axes[0,1].set_xlabel(r'$dy$')
axes[0,1].loglog(dy_array, intercept_std_array)
axes[0,1].set_xlim(1, 1e3)
axes[0,1].set_ylim(.1,1000)
#axes[0,1].set_aspect('equal')

axes[1,0].set_title('slope fluctuations vs $dy$')
axes[1,0].set_xlabel(r'$dy$')
axes[1,0].plot(dy_array, slope_std_array)

axes[1,1].set_title('slope fluctuations vs $dy$')
axes[1,1].set_xlabel(r'$dy$')
axes[1,1].loglog(dy_array, slope_std_array)
axes[1,1].set_xlim(1, 1e3)
axes[1,1].set_ylim(1e-3,1e+1)
#axes[1,1].set_aspect('equal')

fig.tight_layout()
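
The same trick applied to the \(dy\) scan (again an illustrative addition) should give an exponent near \(+1\), i.e., fluctuations growing linearly with \(dy\):

exponent_dy, _ = np.polyfit(np.log(dy_array), np.log(slope_std_array), deg=1)
print(f'slope fluctuations scale roughly as dy^{exponent_dy:.2f}   (expect about +1)')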
  1. In both sets of joint posterior graphs, are the slope and intercept correlated? How do you know? Does it make sense?

  2. For the first set of data, answer the question: “What do you conclude about how the form of the prior affects the final posterior in this case?”

  3. For the second set of data, answer the question: “Why in this case does the form of the prior have a clear effect?” You should consider both the size of the error bars and the number of data points (try changing them to explore the impact).


By Dick Furnstahl and Daniel Phillips

© Copyright 2024.