1.4. Lecture 2#
Follow-ups to Lecture 1 notebooks#
Notes on filling in the table in Checking the sum and product rules, and their consequences using Python:

- f-string: `print(f'ratio = {ratio:.3f}')`, or use `np.around(number, digits)` to round.
- How do you use numpy arrays?
- experts: write the function to work with either ints or floats or numpy arrays (a sketch follows this list)
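One way to write such a function, as a minimal sketch (the helper name `ratio_table_entry` is hypothetical, not from the notebook):

```python
import numpy as np

def ratio_table_entry(numerator, denominator):
    """Return numerator/denominator rounded to 3 digits.

    Works for ints, floats, or numpy arrays alike, because numpy
    broadcasts the division and np.around accepts scalars or arrays.
    """
    return np.around(np.asarray(numerator) / denominator, 3)

# Scalar input, printed with an f-string:
ratio = ratio_table_entry(2, 3)
print(f'ratio = {ratio:.3f}')                      # ratio = 0.667

# Array input works with the same function:
print(ratio_table_entry(np.array([1, 2, 3]), 4))   # [0.25 0.5  0.75]
```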
Further takeaways from Exploring PDFs to discuss in class:

- Bayesian confidence intervals: how are they defined?
- various “point estimates” (mean, mode, median); which is “best” to use?
- characteristics of different pdfs (e.g., symmetry, heavy tails, …)
- what does “sampling” mean?

  Question
  How do you know if a distribution with a known functional form has been correctly sampled?

  Answer
  When the samples are histogrammed (and normalized), they approach the distribution function more closely as the number of samples increases. (A numerical version of this check is sketched after this list.)

- what are projected posteriors (relate these to marginalization over one of the variables)?
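Here is a minimal version of that check, using a standard normal as a stand-in for whichever distribution is being sampled:

```python
import numpy as np
from scipy import stats

# Histogram samples (normalized) and compare to the exact pdf:
# the deviation should shrink as the number of samples grows.
rng = np.random.default_rng(42)

for n_samples in [100, 10_000, 1_000_000]:
    samples = stats.norm.rvs(size=n_samples, random_state=rng)
    counts, edges = np.histogram(samples, bins=50, range=(-4, 4),
                                 density=True)
    centers = (edges[:-1] + edges[1:]) / 2
    deviation = np.mean(np.abs(counts - stats.norm.pdf(centers)))
    print(f'N = {n_samples:>9,d}: mean |histogram - pdf| = {deviation:.4f}')
```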
Bayesian updating via Bayes’ theorem#
Consider Bayes’ theorem for the case where we seek the pdf of parameters $\boldsymbol{\theta}$, given some data and prior information $I$:

$$
\underbrace{p(\boldsymbol{\theta} \mid \text{data}, I)}_{\text{posterior}}
  = \frac{\overbrace{p(\text{data} \mid \boldsymbol{\theta}, I)}^{\text{likelihood}}
          \times \overbrace{p(\boldsymbol{\theta} \mid I)}^{\text{prior}}}
         {\underbrace{p(\text{data} \mid I)}_{\text{evidence}}}
$$

- $\boldsymbol{\theta}$ is a general vector of parameters (this is common notation in statistics).
- The denominator is the data probability or “fully marginalized likelihood” or evidence or maybe some other name (these are all used in the literature). We’ll come back to it later. As will be clear later, it is a normalization factor.
- The prior pdf $p(\boldsymbol{\theta} \mid I)$ is what information we have (or believe) about $\boldsymbol{\theta}$ before we observe the data.
- The posterior pdf $p(\boldsymbol{\theta} \mid \text{data}, I)$ is our new pdf for $\boldsymbol{\theta}$, given that we have observed the data.
- The likelihood $p(\text{data} \mid \boldsymbol{\theta}, I)$ is the probability of getting the specified data given the parameters $\boldsymbol{\theta}$ under consideration on the left side. Note that the likelihood is to be considered as a function of $\boldsymbol{\theta}$ for fixed data.
Note
Sometimes particular notation is used for the prior and likelihood. The prior is written with the symbol $\pi$, i.e., $\pi(\boldsymbol{\theta}) \equiv p(\boldsymbol{\theta} \mid I)$, and the likelihood as $\mathcal{L}(\boldsymbol{\theta}) \equiv p(\text{data} \mid \boldsymbol{\theta}, I)$.
Coin tossing example to illustrate updating#
The notebook is Bayesian_updating_coinflip_interactive.ipynb.
Storyline:
We are observing successive flips of a coin (or any binary process). There is a definite true probability of getting heads, which we denote $p_h$.

- We characterize our information about $p_h$ as a pdf.
- Before any flips, we start with a preconceived notion of the probability; this is the prior pdf $p(p_h \mid I)$, where $I$ is any info we have.
- With each flip of the coin, we gain additional information, so we update our expectation of $p_h$ by finding the posterior $p(p_h \mid \text{data}, I)$.
- Note that the outcome of each flip is discrete (either heads or tails) but $p_h$ is continuous, with $0 \leq p_h \leq 1$.
Let’s first play a bit with the simulation and then come back and think of the details.
Note a few of the Python features:
- there is a class for data called Data (a stripped-down sketch follows this list). Python classes are easy compared to C++!
- a function is defined to make a type of plot that is used repeatedly
- an elaborate widget: use it as a guide for making your own! (Optional!) Read from the bottom up in the widget definition to understand its structure.
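The following is not the notebook’s actual code, just a minimal sketch of the first two features (the class and function bodies here are hypothetical); the beta-distribution update it relies on is derived below in Case II:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

class Data:
    """Minimal stand-in for the notebook's Data class: store the
    sequence of coin-flip outcomes (1 = heads, 0 = tails)."""
    def __init__(self, prob_heads=0.5, rng=None):
        self.prob_heads = prob_heads
        self.outcomes = []
        self.rng = rng if rng is not None else np.random.default_rng()

    def flip(self, n=1):
        """Flip the coin n more times and append the outcomes."""
        new = (self.rng.random(n) < self.prob_heads).astype(int)
        self.outcomes.extend(new.tolist())

    @property
    def heads(self):
        return sum(self.outcomes)

def plot_posterior(ax, data, alpha=1, beta=1):
    """The reusable plotting function: posterior for p_h from a
    Beta(alpha, beta) prior after the flips stored in data."""
    R, N = data.heads, len(data.outcomes)
    p_h = np.linspace(0, 1, 301)
    ax.plot(p_h, stats.beta.pdf(p_h, alpha + R, beta + N - R))
    ax.set_xlabel(r'$p_h$')
    ax.set_title(f'{N} tosses, {R} heads')

# Usage: flip 50 times and plot the updated posterior.
data = Data(prob_heads=0.7)
data.flip(50)
fig, ax = plt.subplots()
plot_posterior(ax, data)
plt.show()
```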
Widget user interface features:
- tabs to control parameters or look at documentation
- set the true $p_h$ by the slider
- press “Next” to flip “jump” # of times
- plot shows updating from three different initial prior pdfs
Class exercises
Tell your neighbor how to interpret each of the priors.

Possible answers

- uniform prior: any probability is equally likely. Is this uninformative? (More later!)
- centered prior (informative): we have reason to believe the coin is more-or-less fair.
- anti-prior: could be anything, but most likely a two-headed or two-tailed coin.
What is the minimal common information about $p_h$ shared by all three priors?

Answer
That $0 \leq p_h \leq 1$ and that each prior pdf is normalized: $\int_0^1 p(p_h \mid I)\, dp_h = 1$.
Things to try:
- First one flip at a time. How do you understand the changes intuitively?
- What happens with more and more tosses?
- Try different values of the true $p_h$.
Question
What happens when enough data is collected?
Answer
All posteriors, independent of the prior, converge to a narrow pdf that includes the true value of $p_h$.
Follow-ups:
- Which prior(s) get to the correct conclusion fastest for a given true $p_h$? Can you explain your observations?
- Does it matter if you update after every toss or all at once? (A numerical check appears below.)
Suppose we had a fair coin, so $p_h = 1/2$; then the probability of any particular sequence of $N$ tosses is $(1/2)^N$.

Is the sum rule obeyed? Summing over all $2^N$ possible outcomes, grouped by the number of heads $R$:

$$
\sum_{R=0}^{N} \binom{N}{R} \left(\frac{1}{2}\right)^{R} \left(\frac{1}{2}\right)^{N-R}
  = \left(\frac{1}{2} + \frac{1}{2}\right)^{N} = 1 .
$$

Proof of penultimate equality
This is the binomial theorem, $(x+y)^N = \sum_{R=0}^{N} \binom{N}{R} x^R y^{N-R}$, with $x = y = 1/2$.

More generally, for any $0 \leq p_h \leq 1$,

$$
\sum_{R=0}^{N} \binom{N}{R}\, p_h^{R} (1-p_h)^{N-R} = \bigl(p_h + (1-p_h)\bigr)^{N} = 1 .
$$

The likelihood for a more general sequence of $N$ tosses containing $R$ heads (in any definite order) is

$$
p(D \mid p_h, I) = p_h^{R} (1-p_h)^{N-R} .
$$

But we want to know about $p_h$, so we use Bayes’ theorem:

$$
p(p_h \mid D, I) = \frac{p(D \mid p_h, I)\, p(p_h \mid I)}{p(D \mid I)}
  \propto p_h^{R} (1-p_h)^{N-R}\, p(p_h \mid I) .
$$

Note that the denominator doesn’t depend on $p_h$ (it is just a normalization).
Claim: we can do the tossing sequentially or all at once and get the same result. When is this true?
Answer
When the tosses are independent.
What would happen if you tried to update using the same results over and over again?
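A quick numerical check of the claim, as a sketch (using the conjugate-prior update derived below, not the notebook’s code):

```python
import numpy as np

rng = np.random.default_rng(0)
tosses = (rng.random(100) < 0.6).astype(int)   # 1 = heads; true p_h = 0.6

alpha, beta = 2.0, 2.0     # hypothetical Beta(2, 2) prior

# Sequential updating: one toss at a time
a_seq, b_seq = alpha, beta
for t in tosses:
    a_seq += t             # a head increments alpha
    b_seq += 1 - t         # a tail increments beta

# All at once
R, N = tosses.sum(), len(tosses)
a_all, b_all = alpha + R, beta + N - R

print((a_seq, b_seq) == (a_all, b_all))   # True: identical posteriors
```

Note that if you updated with the same tosses again, $R$ and $N-R$ would be added a second time, narrowing the posterior as if independent new data had arrived; that is what goes wrong in the last question above.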
So how are we doing the calculation of the updated posterior?
In this case we can do analytic calculations.
Case I: uniform (flat) prior#
Start with (1.7), but with the normalization written out explicitly:

$$
p(p_h \mid D, I) = \frac{p(D \mid p_h, I)\, p(p_h \mid I)}
                        {\int_0^1 dp_h'\, p(D \mid p_h', I)\, p(p_h' \mid I)} ,
$$

where we will suppress the “$I$” from here on. A uniform prior is $p(p_h) = 1$ for $0 \leq p_h \leq 1$ (and zero outside), and so evaluating the posterior for $R$ heads in $N$ tosses:

$$
p(p_h \mid D) = \frac{p_h^{R} (1-p_h)^{N-R}}{\int_0^1 dp_h'\, {p_h'}^{R} (1-p_h')^{N-R}}
  = \frac{(N+1)!}{R!\,(N-R)!}\, p_h^{R} (1-p_h)^{N-R} ,
$$

using $\int_0^1 x^{R} (1-x)^{N-R}\, dx = R!\,(N-R)!/(N+1)!$.
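As a quick sanity check (a sketch, not from the notebook), the normalization factor can be verified numerically:

```python
import numpy as np
from scipy import integrate
from math import factorial

# The flat-prior posterior above should integrate to 1 over
# 0 <= p_h <= 1 with the coefficient (N+1)!/(R!(N-R)!).
R, N = 3, 10
norm = factorial(N + 1) / (factorial(R) * factorial(N - R))
posterior = lambda p: norm * p**R * (1 - p)**(N - R)
integral, _ = integrate.quad(posterior, 0, 1)
print(f'integral of posterior = {integral:.6f}')   # -> 1.000000
```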
Case II: conjugate prior#
Choosing a conjugate prior (if possible) means that the posterior will have the same functional form as the prior. Here, if we pick a beta distribution as the prior, it is conjugate to the coin-flipping likelihood. From the scipy.stats.beta documentation, the beta distribution (as a function of $x$, with shape parameters $a$ and $b$) is

$$
f(x; a, b) = \frac{\Gamma(a+b)\, x^{a-1} (1-x)^{b-1}}{\Gamma(a)\,\Gamma(b)} ,
$$

where $\Gamma$ is the gamma function.

If the prior is $p(p_h) = f(p_h; \alpha, \beta)$, then the posterior from $R$ heads in $N$ tosses is

$$
p(p_h \mid D) \propto p_h^{R} (1-p_h)^{N-R} \times p_h^{\alpha-1} (1-p_h)^{\beta-1}
  = p_h^{(\alpha+R)-1} (1-p_h)^{(\beta+N-R)-1} ,
$$

so we update the prior simply by changing the arguments of the beta distribution:

$$
f(p_h; \alpha, \beta) \ \longrightarrow\ f(p_h; \alpha+R,\, \beta+N-R) .
$$
Check this against the code!
Look in the code where the posterior is calculated and see how the beta distribution is used. Verify that (1.13) is evaluated. Try changing the values of $\alpha$ and $\beta$. (A self-contained version of this check is sketched below.)
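Here is a self-contained version of the check, as a sketch with arbitrary parameter values (not the notebook’s code): the conjugate shortcut is compared against brute-force normalization of likelihood × prior on a grid.

```python
import numpy as np
from scipy import stats
from scipy.integrate import simpson

alpha, beta = 2.0, 3.0    # arbitrary prior shape parameters
R, N = 7, 10              # say, 7 heads in 10 tosses

p_h = np.linspace(0, 1, 1001)
prior = stats.beta.pdf(p_h, alpha, beta)
likelihood = p_h**R * (1 - p_h)**(N - R)

# Brute force: normalize likelihood * prior numerically
numer = likelihood * prior
brute_force = numer / simpson(numer, x=p_h)

# Conjugate shortcut: just shift the beta parameters
conjugate = stats.beta.pdf(p_h, alpha + R, beta + N - R)

print(np.allclose(brute_force, conjugate))   # True
```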
The first updates explicitly:

If the first toss is a head, then $R = 1$ with $N = 1$, so

$$
p(p_h \mid D) \propto p_h \times p(p_h) ,
$$

so the prior just gets multiplied by a straight line from $0$ at $p_h = 0$ to $1$ at $p_h = 1$.

Now suppose the next toss is a tail, so $R = 1$ with $N = 2$:

$$
p(p_h \mid D) \propto p_h (1-p_h) \times p(p_h) ,
$$

so the prior gets multiplied by an inverted parabola peaked at $p_h = 1/2$.

Try sketching this! (Or let Python draw it; see the snippet below.)
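A minimal plotting sketch (assuming a flat prior, so each normalized posterior is just the stated multiplier):

```python
import numpy as np
import matplotlib.pyplot as plt

# Flat prior, then posteriors after H and after H, T
# (each normalized to integrate to 1 on [0, 1]).
p_h = np.linspace(0, 1, 201)

fig, ax = plt.subplots()
ax.plot(p_h, np.ones_like(p_h), label='prior (flat)')
ax.plot(p_h, 2 * p_h, label='after H: $\\propto p_h$')
ax.plot(p_h, 6 * p_h * (1 - p_h), label='after H, T: $\\propto p_h(1-p_h)$')
ax.set_xlabel('$p_h$')
ax.set_ylabel('pdf')
ax.legend()
plt.show()
```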
First look at the radioactive lighthouse problem#
This is from radioactive_lighthouse_exercise.ipynb.
A radioactive source emits gamma rays randomly in time but uniformly in angle. The source is at an unknown position $(x_0, y_0)$ in the $x$–$y$ plane.

Gamma rays are detected on the $x$-axis and the detection positions $\{x_1, x_2, \ldots, x_N\}$ are recorded.

Goal: Given the recorded positions, estimate the location of the source.

Naively, how would you estimate $x_0$ from the $\{x_k\}$? (A simulation to test your idea against is sketched at the end of this section.)
Follow the notebook leading questions for the Bayesian estimate!
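To experiment with a naive estimate before working through the notebook, here is a sketch of generating synthetic data (the source position and the angle parameterization here are illustrative assumptions, not the notebook’s values):

```python
import numpy as np

# Gamma rays emitted at angles uniform in (-pi/2, pi/2), measured
# from the vertical, are detected where they cross the x-axis.
rng = np.random.default_rng(1)
x0, y0 = 1.0, 1.5                       # hypothetical true source position
N = 1000
theta = rng.uniform(-np.pi / 2, np.pi / 2, size=N)
x_detected = x0 + y0 * np.tan(theta)    # Cauchy-distributed positions

# The naive estimate (sample mean) is unreliable because the Cauchy
# distribution has no mean; the median is much better behaved.
print(f'mean   = {np.mean(x_detected):8.3f}')
print(f'median = {np.median(x_detected):8.3f}   (true x0 = {x0})')
```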