The Watchmaker's Guide to Population Genetics
The Workshop
The Watchmaker’s Philosophy
Why Build It Yourself?
The Watchmaker’s Way
The Gears of Understanding
On Mathematical Rigor
On Teaching Probability and Calculus
On Python Implementations
Your Journey
The Workbench (Prerequisites)
Likelihood-Based Probabilistic Inference
Why Likelihood?
The Likelihood Function
The Toolkit: Key Distributions
The Exponential Distribution: Coalescence Waiting Times
The Poisson Distribution: Mutations and the SFS
The Gamma Distribution: Ages and Rates
The Gaussian Distribution: Smoothness Priors
Maximum Likelihood Estimation (MLE)
Worked Example: Inferring Population Size from the SFS
Fisher Information and Confidence Intervals
Bayesian Inference
Conjugate Priors: When Bayesian Inference Has Closed-Form Solutions
Composite and Approximate Likelihoods
Worked Example: Composite Likelihood from Two Data Sources
The Other Paradigm: Neural Networks and Amortized Inference
The key idea
What amortized inference does well
What likelihood-based inference does well
Why this book focuses on the likelihood approach
Summary
Coalescent Theory
The Big Idea
The Wright-Fisher Model (Forward in Time)
Going Backwards: The Coalescent
The probability that two specific lineages coalesce in a given generation
Waiting time to coalescence
The Coalescent with \(n\) Samples
Expected Number of Lineages at Time \(t\)
Mutations on the Coalescent Tree
Summary
Ancestral Recombination Graphs
Why Trees Aren’t Enough
What Is Recombination?
What We’ve Established So Far
Recombination in the Coalescent
The Structure of an ARG: A Directed Acyclic Graph
Marginal Trees
The Tree Sequence Representation
Branch Lengths and the ARG
Why ARG Inference Is Hard
Summary
Hidden Markov Models
Why HMMs for ARG Inference?
A Warm-Up Example: Weather and Umbrellas
The Core Idea
Formal Definition
The Forward Algorithm
Scaling for Numerical Stability
Stochastic Traceback (Sampling)
The Li-Stephens Trick: Linear-Time Transitions
The Li-Stephens Transition Structure
The \(O(K)\) Forward Step
Summary
The Sequentially Markov Coalescent
The Problem with the Full Coalescent
What Does “Markov” Mean, and Why Does It Matter?
Intuitive Explanation
Formal Definition
Why Markov Matters for Computation
What Makes CwR Non-Markov?
The Mechanism
What Are Ghost Lineages? A Concrete Example
The SMC Approximation
Why Does This Restore the Markov Property?
How Good Is the Approximation?
The SMC Transition Probability
Deriving \(r_i\): The Recombination Probability
Deriving \(q_j\): The Re-joining Weights
PSMC: The Pairwise Case
The Cumulative Distribution Function
Why SMC Enables HMM Inference
Summary
The Diffusion Approximation
The Big Idea
From Wright-Fisher to Continuous Frequency
Mean and variance of \(\Delta x\)
The diffusion timescale
Code: WF trajectories converging to SDE paths
Stochastic Differential Equations
Euler-Maruyama simulation
From SDEs to PDEs: The Fokker-Planck Equation
The two terms: diffusion and advection
Boundary Conditions
Absorbing boundaries
Why \(x(1-x)\) vanishes at boundaries
The flux condition
Reflecting boundaries and mutation
Stationary Distributions
The neutral case
With mutation: the Beta distribution
With selection: exponential tilting
Numerical Solutions: Finite Differences for PDEs
Discretizing \(x\) on a grid
Finite-difference approximations
The method of lines
Crank-Nicolson time stepping
The curse of dimensionality
Code: 1D diffusion solver
Connection to the Site Frequency Spectrum
The binomial bridge
How dadi and moments differ
Summary
Ordinary Differential Equations
The Big Idea
What Is an ODE?
Euler’s Method
The Runge-Kutta Family
RK2: The Midpoint Method
RK4: The Classic Method
RK45: Adaptive Step Size (Dormand-Prince)
Systems of Coupled ODEs
Stiffness and Implicit Methods
The Matrix Exponential
Summary
Markov Chain Monte Carlo
The Big Idea: Why Sample?
Bayesian Inference in 60 Seconds
Markov Chains
Stationary Distribution
The Metropolis-Hastings Algorithm
Gibbs Sampling
Convergence Diagnostics
Practical Considerations
Proposal Tuning
Data-Informed Proposals
Parallel Tempering
When MCMC Is Not Enough
MCMC in Population Genetics: Three Applications
ARGweaver: Gibbs Sampling over ARGs
SINGER: MH with Data-Informed Proposals
PHLASH: Beyond MCMC
Summary
Timepieces
Verification Status
Timepiece I: PSMC
The Mechanism at a Glance
Why Just Two Sequences?
Chapters
Overview of PSMC
The Continuous-Time PSMC Model
Discretizing Time
The PSMC HMM and EM Algorithm
Decoding the Clock
Demo: Running PSMC on Simulated Data
Timepiece II: SMC++
The Mechanism at a Glance
Chapters
Overview of SMC++
The Distinguished Lineage
The ODE System
The Continuous HMM
Population Splits
Demo: Running SMC++ on Simulated Data
Timepiece III: The Li & Stephens HMM
The Mechanism at a Glance
Chapters
Overview of the Li & Stephens HMM
The Copying Model
Haploid LS HMM Algorithms
The Diploid Extension
Demo: Running the Li & Stephens HMM on Simulated Data
Timepiece IV: msprime
The Mechanism at a Glance
Chapters
Overview of msprime
The Coalescent Process
Segments & the Fenwick Tree
Hudson’s Algorithm
Demographics & Population
Mutations
Demo: Running msprime on Simulated Data
Timepiece V: ARGweaver
The Mechanism at a Glance
Chapters
Overview of ARGweaver
Time Discretization
Transition Probabilities
Emission Probabilities
MCMC Sampling
Demo: Running ARGweaver on Simulated Data
Timepiece VI: tsinfer
The Mechanism at a Glance
Chapters
Overview of tsinfer
Gear 1: Ancestor Generation
Gear 2: The Copying Model
Gear 3: Ancestor Matching
Gear 4: Sample Matching & Post-Processing
Demo: Running tsinfer on Simulated Data
Timepiece VII: SINGER
The Mechanism at a Glance
Chapters
Overview of SINGER
Branch Sampling
Time Sampling
ARG Rescaling
Sub-Graph Pruning and Re-grafting (SGPR)
Demo: Running SINGER on Simulated Data
Timepiece VIII: Threads
The Mechanism at a Glance
Chapters
Overview of Threads
Haplotype Matching with the PBWT
Memory-Efficient Viterbi Inference
Dating Path Segments
Demo: Running Threads on Simulated Data
Timepiece IX: tsdate
The Mechanism at a Glance
Where tsinfer Ends and tsdate Begins
Chapters
Overview of tsdate
The Coalescent Prior
The Mutation Likelihood
Inside-Outside Belief Propagation
Variational Gamma (Expectation Propagation)
Rescaling
Demo: Running tsdate on Simulated Data
Timepiece X: moments
The Mechanism at a Glance
Chapters
Overview of moments
The Site Frequency Spectrum
The Moment Equations
Demographic Inference
Linkage Disequilibrium
Demo: Running moments on Simulated Data
Timepiece XI: dadi
The Mechanism at a Glance
dadi vs. moments
Chapters
Overview of dadi
The Diffusion Equation
Numerical Integration
Demographic Inference
Demo: Running dadi on Simulated Data
Timepiece XII: momi2
The Mechanism at a Glance
Chapters
Overview of momi2
The Coalescent SFS
The Moran Model
Tensor Machinery
Automatic Differentiation & Inference
Demo: Running momi2 on Simulated Data
Timepiece XIII: Gamma-SMC
The Mechanism at a Glance
PSMC vs. Gamma-SMC
Chapters
Overview of Gamma-SMC
The Gamma Approximation
The Flow Field
The Forward-Backward CS-HMM
Segmentation and Caching
Demo: Running Gamma-SMC on Simulated Data
Timepiece XIV: PHLASH
The Mechanism at a Glance
Chapters
Overview of PHLASH
The Composite Likelihood
Random Time Discretization
The Score Function Algorithm
Stein Variational Gradient Descent (SVGD)
Demo: Running PHLASH on Simulated Data
Timepiece XV: CLUES
The Mechanism at a Glance
Why Detect Selection?
Chapters
Overview: Detecting Selection
The Wright-Fisher HMM
Emission Probabilities
Inference: From Gene Trees to Selection
Demo: Running CLUES on Simulated Data
Timepiece XVI: SLiM
The Mechanism at a Glance
Chapters
Overview of SLiM
The Wright-Fisher Generation Cycle
Recipes
Demo: Running SLiM on Simulated Data
Timepiece XVII: Relate
The Mechanism at a Glance
Where tsinfer and SINGER End and Relate Begins
Chapters
Overview of Relate
Gear 1: Asymmetric Painting
Gear 2: Tree Building
Gear 3: Branch Length Estimation (MCMC)
Gear 4: Population Size Estimation
Demo: Running Relate on Simulated Data
Timepiece XVIII: discoal
The Mechanism at a Glance
Chapters
Overview of discoal
The Allele Frequency Trajectory
The Structured Coalescent Under Selection
Hard, Soft, and Partial Sweeps
discoal and msprime: Two Takes on Sweeps
Demo: Running discoal on Simulated Data
Index