3.3. Lecture 8#
Recap of Poisson MCMC example#
Notebook: Metropolis-Hastings MCMC sampling of a Poisson distribution
Recall the Metropolis algorithm for sampling a posterior $p(\theta \mid D)$:

- Start with an initial point $\theta_0$.
- (For each walker) begin repeat: given $\theta_i$, propose $\theta^*$ from $q(\theta^* \mid \theta_i)$.
- Calculate the ratio

  $$
    r = \frac{p(\theta^* \mid D)}{p(\theta_i \mid D)} \times \frac{q(\theta_i \mid \theta^*)}{q(\theta^* \mid \theta_i)} ,
  $$

  where the left factor compares posteriors and the right factor is a correction for an asymmetric proposal distribution.
- Decide whether to keep $\theta^*$:
  - if $r \geq 1$, set $\theta_{i+1} = \theta^*$ (accept);
  - else draw $u \sim \mathrm{uniform}(0,1)$: if $u \leq r$, set $\theta_{i+1} = \theta^*$ (accept); else set $\theta_{i+1} = \theta_i$ (add another copy of $\theta_i$ to the chain).
- End repeat when “converged”. (A minimal code sketch is given below.)
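For concreteness, here is a minimal sketch of this algorithm in Python for a Poisson target. This is not the notebook's actual code: the function names and the symmetric $\pm 1$ integer proposal are illustrative choices.

```python
import numpy as np
from scipy.special import gammaln

def log_poisson(k, mu=5.0):
    """Log of the Poisson pmf at integer k with mean mu (only ratios are needed)."""
    if k < 0:
        return -np.inf                      # zero probability outside the support
    return k * np.log(mu) - mu - gammaln(k + 1)

def metropolis_poisson(n_steps=10_000, mu=5.0, rng=None):
    """Metropolis chain for the Poisson pmf with a symmetric +/-1 proposal."""
    rng = np.random.default_rng() if rng is None else rng
    chain = np.empty(n_steps, dtype=int)
    current = 0                             # initial point
    for i in range(n_steps):
        proposal = current + rng.choice([-1, 1])          # symmetric proposal
        log_r = log_poisson(proposal, mu) - log_poisson(current, mu)
        if np.log(rng.uniform()) < log_r:                 # accept with prob min(1, r)
            current = proposal
        chain[i] = current                  # rejected -> repeat the current point
    return chain
```

A histogram of the returned chain (after discarding warm-up steps) should agree with the Poisson pmf.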
Key questions: When are you converged? How many “warm-up” or “burn-in” steps to skip?
Poisson take-aways:
It works! The sampled histogram agrees with the (scaled) exact Poisson pdf (that is what success looks like). But it is not normalized! Compare 1,000 to 100,000 samples.
Warm-up time is (apparently) seen from the trace. Moral: always check traces!
The trace also shows that the space is being explored (rather than being confined to one region).
What if the “else keep $\theta_i$” step is not implemented as it should be? (I.e., the chain is only incremented if the step is accepted.) This is not intuitively obvious, but see the figures below with 100,000 steps. The first one follows the Metropolis algorithm and adds the same point again if the candidate is rejected; the second one does not. Not keeping the repeated steps invalidates the Markov chain conditions and gives the wrong stationary distribution (not by a lot, but noticeably, and every time it is run).
The spread of the means decreases as $1/\sqrt{N}$ with the number of samples $N$, as expected (see the sketch below).
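A rough way to see this scaling, using the illustrative `metropolis_poisson` sketch above (the chain lengths and warm-up cut here are made up), is to run many independent chains and compare the spread of their means:

```python
import numpy as np

rng = np.random.default_rng(42)
for n_steps in (1_000, 4_000, 16_000):
    # 50 independent chains; skip a fixed warm-up before taking each mean
    means = [metropolis_poisson(n_steps, mu=5.0, rng=rng)[200:].mean()
             for _ in range(50)]
    print(f"N = {n_steps:6d}: spread of means = {np.std(means):.3f}")
# The printed spread should drop by roughly a factor of 2 each time N is
# quadrupled, i.e., scale like 1/sqrt(N) (up to autocorrelation effects).
```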
Summary points from arXiv:1710.06068#
Before considering another example, some summary points from “Data Analysis Recipes: Using Markov Chain Monte Carlo” by David Hogg and Daniel Foreman-Mackey.
Both are computational astrophysicists (or cosmologists or astronomers).
DFM wrote `emcee`.
Highly experienced in physics scenarios, highly opinionated, but not statisticians (although they interact with statisticians).
MCMC is good for sampling, but not optimizing. If you want to find the modes of distributions, use an optimizer instead.
For MCMC, you only have to calculate ratios of pdfs (as seen from the algorithm).
You don’t need analytic, normalized pdfs, which makes MCMC great for sampling posterior pdfs.
Getting the normalization $Z$ is really difficult because you need global information. (Cf. the partition function in statistical mechanics.)
MCMC is extremely easy to implement, without requiring derivatives or integrals of the function (but see later discussion of HMC).
Success means a histogram of the samples looks like the pdf.
Sampling for expectation values works even though we don’t know the normalization $Z$; we just need the set of samples $\{\theta_i\}$, since $\langle g(\theta) \rangle \approx \frac{1}{N}\sum_{i=1}^{N} g(\theta_i)$.
Nuisance parameters are very easy to marginalize over: just drop that column in every sample $\theta_i$ (see the short sketch after this list).
Autocorrelation is important to monitor and one can tune (e.g., the Metropolis step size) to minimize it. More on this later.
How do you know when to stop? Heuristics and diagnostics to come!
Practical advice for initialization and burn-in is given by Hogg and Foreman-Mackey.
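To illustrate the last few points concretely (the array `chain` and its two-parameter shape here are hypothetical stand-ins, not from the notebook), expectation values and marginalization are just array operations on the samples:

```python
import numpy as np

# Stand-in for an MCMC output: an (n_samples, 2) array of samples of (theta0, theta1).
chain = np.random.default_rng(0).normal(size=(5_000, 2))

# Expectation value of any function g(theta): average g over the samples.
mean_theta0_sq = np.mean(chain[:, 0] ** 2)     # e.g. <theta0^2>

# Marginalizing over a nuisance parameter (say theta1): just drop its column.
theta0_samples = chain[:, 0]                   # samples from the marginal for theta0

print(mean_theta0_sq, theta0_samples.mean())
```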
MCMC random walk and sampling example#
Look at the notebook Exercise: Random walk. Let’s do some of the first part together.
Part 1: Random walk in a bounded interval, with candidate steps drawn from a normal distribution of width `proposal_width`.
Algorithm: always accept the step unless it crosses the boundary (a code sketch is given at the end of this exercise).
Class: map this problem onto the Metropolis algorithm.
What is $q(x^* \mid x_i)$?
Hint
Use shift-tab-tab to check what arguments the normal function takes.
What is $p(x)$, the distribution being sampled?
It must be constant except at the borders.
Check the answer. Change `np.random.seed(10)` to `np.random.seed()`.
This will mean a different pseudo-random number sequence every time you re-run.
Note the fluctuations.
Try 200 then 2000 samples.
Try changing to a uniform proposal.
Try not adding the rejected position to the chain.
Hint
Move `samples.append(current_position)` under the `if` statement.
Result: it doesn’t fill the full space (try different runs – there is trouble near the edges).
Decrease the width to 0.2 and do 2000 steps: the samples are too correlated.
Look at the definition of the autocorrelation and the correlation time for widths 0.2 and 2.0.
Trend in the autocorrelation plot: it is 1 at lag 0; how fast does it fall to fluctuations around 0?
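Here is a minimal sketch of the bounded random walk described above (the interval endpoints and the `keep_rejected` switch are illustrative, not the notebook's exact code); setting `keep_rejected=False` reproduces the “don’t add the rejected position” experiment:

```python
import numpy as np

def random_walk(n_steps=2_000, proposal_width=2.0, x_min=-5.0, x_max=5.0,
                keep_rejected=True, rng=None):
    """Random walk confined to [x_min, x_max]: accept every step unless it
    crosses a boundary, so the stationary distribution should be uniform."""
    rng = np.random.default_rng() if rng is None else rng
    samples = []
    current = 0.0
    for _ in range(n_steps):
        proposal = rng.normal(current, proposal_width)    # symmetric proposal
        if x_min < proposal < x_max:                      # accept unless out of bounds
            current = proposal
            samples.append(current)
        elif keep_rejected:
            samples.append(current)    # Metropolis: repeat the current point
        # keep_rejected=False drops rejected steps (the "broken" variant)
    return np.array(samples)
```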
Autocorrelation#
Basic idea: Given a sequence of samples $\{\theta_0, \theta_1, \ldots, \theta_N\}$ from an MCMC chain, we define the autocorrelation at lag $h$ as

$$
  \rho(h) = \frac{\sum_{t}(\theta_t - \bar\theta)(\theta_{t+h} - \bar\theta)}{\sum_{t}(\theta_t - \bar\theta)^2} ,
$$

so that $\rho(0) = 1$ and $|\rho(h)| \leq 1$.
In MCMC chains there will be a typical time (the correlation time) until $\rho(h)$ has decayed to fluctuations around zero, i.e., until samples separated by that lag are effectively independent.
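A simple (not statistically optimized) estimate of $\rho(h)$ and a crude correlation time, following the definition above:

```python
import numpy as np

def autocorrelation(chain, max_lag=200):
    """Normalized autocorrelation rho(h) of a 1D chain for lags h = 0..max_lag."""
    x = np.asarray(chain, dtype=float)
    x = x - x.mean()
    var = np.mean(x**2)
    return np.array([np.mean(x[:len(x) - h] * x[h:]) / var
                     for h in range(max_lag + 1)])

def correlation_time(rho):
    """Crude estimate: first lag at which rho drops below 1/e."""
    below = np.where(rho < np.exp(-1))[0]
    return int(below[0]) if below.size else len(rho)
```

Comparing `correlation_time(autocorrelation(samples))` for proposal widths 0.2 and 2.0 makes the “too correlated” behavior quantitative.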
Intuition for detailed balance and the MH algorithm#

Imagine the schematic histogram here is the accumulated distribution of walkers after some time.
The red line is the posterior we are sampling, $p(\theta \mid D)$.
After equilibrium we look at the chain and note that there are $N(\theta_1)$ walkers in a small bin at $\theta_1$ and $N(\theta_2)$ walkers in a small bin at $\theta_2$.
What if the only moves accepted were those that went uphill (i.e., to higher probability density)? What would happen to the distribution of walkers?
If only uphill moves were accepted, then the walkers would eventually all pile up at the peak of the posterior rather than filling out its shape, so some downhill moves must also be accepted.
Ok, so suppose we have reached equilibrium, so that (on average) the histogram of walkers no longer changes from step to step.
In terms of the number of walkers and the acceptance probability (with a symmetric proposal, so that proposing $\theta_2$ from $\theta_1$ is as likely as proposing $\theta_1$ from $\theta_2$), the average number of moves from $\theta_1$ to $\theta_2$ in one step is proportional to $N(\theta_1)\, A(\theta_2 \mid \theta_1)$, and similarly for the reverse move.
How are $N(\theta_1)$ and $N(\theta_2)$ related? In equilibrium the walkers are distributed according to the posterior, so $N(\theta_1)/N(\theta_2) = p(\theta_1 \mid D)/p(\theta_2 \mid D)$.
Now let’s put it together.
What is the ratio of acceptance probabilities? For equilibrium, the flow of walkers from $\theta_1$ to $\theta_2$ must balance the flow from $\theta_2$ to $\theta_1$, which requires

$$
  \frac{A(\theta_2 \mid \theta_1)}{A(\theta_1 \mid \theta_2)}
  = \frac{p(\theta_2 \mid D)}{p(\theta_1 \mid D)} .
$$
Now how do we realize a rule that satisfies the condition we just derived? We actually have a lot of freedom in doing so and the Metropolis choice is just one possibility. Let’s verify that it works. The Metropolis algorithm says to accept $\theta_2$ with probability

$$
  A(\theta_2 \mid \theta_1) = \min\!\left(1, \frac{p(\theta_2 \mid D)}{p(\theta_1 \mid D)}\right) .
$$
Let’s check cases and see if it works. Here is a chart:

| | $A(\theta_2 \mid \theta_1)$ | $A(\theta_1 \mid \theta_2)$ |
| --- | --- | --- |
| $p(\theta_2 \mid D) \geq p(\theta_1 \mid D)$ | [fill in here] | [fill in here] |
| $p(\theta_2 \mid D) < p(\theta_1 \mid D)$ | [fill in here] | [fill in here] |
Fill in the chart based on the Metropolis algorithm we are using and verify that the ratio of $A(\theta_2 \mid \theta_1)$ to $A(\theta_1 \mid \theta_2)$ equals $p(\theta_2 \mid D)/p(\theta_1 \mid D)$ in both cases.
| | $A(\theta_2 \mid \theta_1)$ | $A(\theta_1 \mid \theta_2)$ |
| --- | --- | --- |
| $p(\theta_2 \mid D) \geq p(\theta_1 \mid D)$ | 1 | $p(\theta_1 \mid D)/p(\theta_2 \mid D)$ |
| $p(\theta_2 \mid D) < p(\theta_1 \mid D)$ | $p(\theta_2 \mid D)/p(\theta_1 \mid D)$ | 1 |

Dividing the 2nd column by the 3rd column for the 2nd and 3rd rows gives $p(\theta_2 \mid D)/p(\theta_1 \mid D)$ in each case, the correct result!
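One can also check detailed balance numerically on a toy discrete problem (the three states and their weights below are made up for illustration): with the Metropolis rule, the number of accepted moves from $i$ to $j$ should match the number from $j$ to $i$ over a long run.

```python
import numpy as np

p = np.array([1.0, 2.0, 4.0])          # made-up unnormalized state probabilities
rng = np.random.default_rng(1)

state = 0
accepted = np.zeros((3, 3))            # accepted[i, j] counts accepted i -> j moves
for _ in range(300_000):
    proposal = rng.choice([s for s in range(3) if s != state])   # symmetric proposal
    if rng.uniform() < min(1.0, p[proposal] / p[state]):         # Metropolis rule
        accepted[state, proposal] += 1
        state = proposal

# Detailed balance: accepted i->j and j->i counts should agree for every pair.
for i in range(3):
    for j in range(i + 1, 3):
        print(f"{i}->{j}: {accepted[i, j]:.0f}   {j}->{i}: {accepted[j, i]:.0f}")
```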
Ok, so it works. What if we have an asymmetric proposal distribution, so that $q(\theta_2 \mid \theta_1) \neq q(\theta_1 \mid \theta_2)$? This is where the correction factor $q(\theta_1 \mid \theta_2)/q(\theta_2 \mid \theta_1)$ in the Metropolis-Hastings ratio above comes in.
Why do you think that this property is called detailed balance? Can you make an analogy with thermodynamic equilibrium for e.g. a collection of hydrogen atoms?
You answer!