Master's Thesis · University of Pennsylvania · 2026

Scalable Node-Level Vaccine Allocation
on Contact Networks

In the first weeks of a pandemic, who should get the vaccine today? This demo walks through the thesis — the population model, the disease dynamics, the contact network — and lets you run a stochastic outbreak with your own seed configuration.

Junfei Zhan · Advisor: Prof. Saswati Sarkar · 10-state SEPAILHRVD model · Barabási–Albert scale-free contact network · Three-group risk stratification

Part 1

Three population groups

Every individual in the network is assigned to exactly one of three demographic groups: a hub set defined by a degree threshold, a high-risk elderly subpopulation drawn by Bernoulli sampling on the remainder, and the baseline majority. Each group has its own symptomatic, hospitalisation, and case-fatality rates.
(Demo profile — slightly more virulent than the paper's COVID-calibrated values to make outbreak dynamics visible at N=200–1500 scale.)
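The assignment rule described above can be sketched in a few lines. This is a minimal illustration, not the thesis code: the function name `assign_groups`, the threshold multiplier `alpha = 1.5`, and the Bernoulli probability `p_elderly = 0.175` are all assumed values chosen only to make the example run.

```python
import random
import statistics

def assign_groups(degrees, alpha=1.5, p_elderly=0.175, seed=0):
    """Label every node 'X', 'Y', or 'Z' from its degree.

    alpha and p_elderly are illustrative assumptions, not thesis values.
    """
    rng = random.Random(seed)
    mu = statistics.fmean(degrees)
    sigma = statistics.pstdev(degrees)
    threshold = mu + alpha * sigma          # hub cut-off: mean + alpha * std
    groups = []
    for k in degrees:
        if k >= threshold:
            groups.append("Z")              # hub set, decided by degree alone
        elif rng.random() < p_elderly:
            groups.append("Y")              # Bernoulli draw on the non-hub remainder
        else:
            groups.append("X")              # baseline majority
    return groups

# Toy degree sequence: mostly low-degree nodes plus four clear hubs.
degrees = [2] * 90 + [3] * 6 + [20] * 4
groups = assign_groups(degrees)
print({g: groups.count(g) for g in "XYZ"})
```

Because hubs are fixed first and the Bernoulli draw only touches the remainder, each node ends up in exactly one group.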

Group X — Baseline

The general population

The low-risk majority — working-age adults with typical contact patterns. Most of the population belongs here by default.

Symptomatic

s_X = 0.50

Hosp.

p_X = 0.12

Fatality

d_X = 0.10

Prev.

~80%

Compound case-fatality ≈ 0.6%

Group Y — High-risk

Elderly (65+)

Elevated disease severity across every branching probability. Moderate contact patterns but markedly worse clinical outcomes once infected.

Symptomatic

s_Y = 0.85

Hosp.

p_Y = 0.50

Fatality

d_Y = 0.65

Prev.

~17%

Compound case-fatality ≈ 27.6%, about 46× Group X

Group Z — Hub

High-contact individuals

Delivery workers, transit drivers, frontline retail — the hubs of the contact graph. Few in number but wired into a disproportionate share of all transmission pathways.

Symptomatic

s_Z = 0.60

Hosp.

p_Z = 0.22

Fatality

d_Z = 0.20

Degree

≥ µ+ασ

Structural role: drives outbreak velocity via rich-get-richer topology · compound CFR ≈ 2.6%
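The compound case-fatality figures on the three cards are consistent with multiplying the three branching probabilities. The sketch below reproduces that arithmetic; the chain "infected, then symptomatic, then hospitalised, then dead" is an assumption inferred from the quoted numbers, and `compound_cfr` is a hypothetical helper name.

```python
# Branching probabilities quoted on the three group cards.
GROUPS = {
    "X": {"s": 0.50, "p": 0.12, "d": 0.10},
    "Y": {"s": 0.85, "p": 0.50, "d": 0.65},
    "Z": {"s": 0.60, "p": 0.22, "d": 0.20},
}

def compound_cfr(s, p, d):
    # Assumed chain: symptomatic (s), then hospitalised (p), then fatal (d).
    return s * p * d

cfr = {g: compound_cfr(**v) for g, v in GROUPS.items()}
for g, value in cfr.items():
    print(f"Group {g}: compound CFR = {100 * value:.2f}%")

# Relative risk of the high-risk group against the baseline group.
ratio_y_to_x = cfr["Y"] / cfr["X"]
print(f"Y / X ratio = {ratio_y_to_x:.1f}")
```

Under this assumption the products are 0.006, 0.27625, and 0.0264, which round to the quoted 0.6%, 27.6%, and 2.6%.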

Part 2

The 10-compartment SEPAILHRVD model

A COVID-style disease model with distinct susceptible, latent (exposed), pre-symptomatic, asymptomatic, symptomatic, late-stage, hospitalised, recovered, vaccinated, and dead compartments. Three of them (P, A, and I) drive onward transmission. Click a state to inspect it.

Click any state

Explore the 10 compartments of the model. States in red are infectious. The three infectious weights w_P=0.8, w_A=0.5, w_I=1.0 modulate how much each one contributes to the per-node force of infection.

Latent 3d · Pre-symptomatic 2d · Asymptomatic / Late-stage 5d · Hospitalised 10d
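To make the role of the infectious weights concrete, here is a small sketch of a per-node force of infection. Only the weights w_P, w_A, and w_I come from the text. The exponential link function and the rate `beta = 0.05` are assumptions introduced for illustration, since the page does not state how the weighted pressure is turned into an infection probability.

```python
import math

# Infectious weights from the text; every other state contributes zero.
WEIGHTS = {"P": 0.8, "A": 0.5, "I": 1.0}

def infection_pressure(node, neighbors, state):
    """Weighted count of infectious neighbours of `node`."""
    return sum(WEIGHTS.get(state[u], 0.0) for u in neighbors[node])

def infection_probability(pressure, beta=0.05):
    # Assumed link: beta is a hypothetical per-contact daily transmission rate.
    return 1.0 - math.exp(-beta * pressure)

# Toy star graph: node 0 is susceptible and touches one node in each state.
neighbors = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
state = {0: "S", 1: "P", 2: "A", 3: "I", 4: "R"}

pressure = infection_pressure(0, neighbors, state)
prob = infection_probability(pressure)
print(f"pressure = {pressure:.1f}, daily infection probability = {prob:.3f}")
```

The recovered neighbour adds nothing, so the pressure on node 0 is the sum of the three weights.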

S · Susceptible

E · Exposed

P · Pre-symptomatic

A · Asymptomatic

I · Symptomatic

L · Late-stage

H · Hospitalised

R · Recovered

V · Vaccinated

D · Dead

Part 3

Building a Barabási–Albert contact network

Real contact graphs are scale-free: most people have a few connections, a few people have many. The BA model reproduces this with a single rule — each new node attaches with probability proportional to existing degree, P(v → u) = k_u / Σ k_w. Hit play and watch the hubs emerge.
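The attachment rule can be sketched with the standard library alone. The example below is one common way to implement preferential attachment, not necessarily the demo's own code: the star-shaped seed graph and the function name `barabasi_albert` are assumptions, and drawing distinct targets by rejection is a small simplification of the exact rule.

```python
import random

def barabasi_albert(n, m, seed=0):
    """Grow an undirected graph by preferential attachment; returns an edge list."""
    rng = random.Random(seed)
    # Assumed seed graph: a star on m + 1 nodes (the demo's start graph is not stated).
    edges = [(0, v) for v in range(1, m + 1)]
    # Each node appears in `stubs` once per incident edge, so a uniform draw
    # from `stubs` picks node u with probability k_u / sum_w k_w.
    stubs = [node for edge in edges for node in edge]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:             # redraw until m distinct targets
            targets.add(rng.choice(stubs))
        for u in targets:
            edges.append((new, u))
            stubs.extend((new, u))
    return edges

edges = barabasi_albert(n=500, m=2)

# Tally degrees to see the heavy tail.
degree = {}
for a, b in edges:
    degree[a] = degree.get(a, 0) + 1
    degree[b] = degree.get(b, 0) + 1
print("nodes:", len(degree), "edges:", len(edges), "max degree:", max(degree.values()))
```

Keeping a flat list of edge endpoints avoids recomputing the degree sum at every step, which is why this trick is widely used for BA growth.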

Demo controls (defaults): target N = 500 · edges per node m = 2 · speed 6×. The panel shows a live node counter (0 / 500 at start) and the degree distribution.

After growth completes, nodes are coloured by group: X, Y, or Z (hub).

Part 4

Run a stochastic outbreak

Pick your seed configuration — how many initial infections and which groups they land in — then press play. This is the no-vaccination baseline: what happens if nothing is done. The final death count is the upper bound that every allocation method must beat.

Network: N = 600 · m = 2

Initial seed: 30 total, shared across groups (auto-normalised to 100%) as X 50% · Y 25% · Z 25%

Simulation: horizon T = 90 d · speed 6×

Live counts: day (0 / 90 at start), active infections, cumulative deaths, and the trajectory plot.
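The seed controls take arbitrary group shares and normalise them before splitting the seed total. A minimal sketch of that step follows. The largest-remainder rounding rule and the name `allocate_seeds` are assumptions, since the page does not say how fractional seeds are resolved.

```python
def allocate_seeds(total, shares):
    """Split `total` seeds across groups in proportion to `shares`."""
    weight_sum = sum(shares.values())
    if weight_sum <= 0:
        raise ValueError("at least one share must be positive")
    # Auto-normalise so the shares behave like fractions summing to 1.
    exact = {g: total * w / weight_sum for g, w in shares.items()}
    counts = {g: int(x) for g, x in exact.items()}
    # Largest-remainder rounding (an assumed rule) hands out the leftover seeds.
    leftover = total - sum(counts.values())
    by_remainder = sorted(exact, key=lambda g: exact[g] - counts[g], reverse=True)
    for g in by_remainder[:leftover]:
        counts[g] += 1
    return counts

# Default demo configuration: 30 seeds split 50 / 25 / 25.
counts = allocate_seeds(30, {"X": 50, "Y": 25, "Z": 25})
print(counts)
```

Because only the ratios matter, passing shares of 2 : 1 : 1 gives the same split as 50 : 25 : 25.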

Part 5 · Final Results

Four allocation methods, head-to-head

All four methods evaluated on N_eval = 30 stochastic rollouts of the paper's network-level simulator. The default configuration is N = 5 000, horizon T = 60 days, daily budget K = 10, and 300 initial infections split 50:17:33 across X:Y:Z. RL methods train on 1 500 episodes of PPO; reported numbers are best-of-three training seeds.
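Each headline number below is a mean with a spread over the evaluation rollouts. The sketch shows one way to compute such a summary. It assumes the "±" value is a sample standard deviation, which the page does not state, and the death counts in `example` are placeholders rather than thesis results.

```python
import statistics

def summarise(deaths_per_rollout):
    """Return (mean, sample standard deviation) of per-rollout death counts."""
    mean = statistics.fmean(deaths_per_rollout)
    spread = statistics.stdev(deaths_per_rollout)   # assumed meaning of the +/- value
    return mean, spread

# Placeholder death counts, not results from the thesis.
example = [18, 22, 26, 20, 24]
mean, spread = summarise(example)
print(f"{mean:.1f} ± {spread:.1f}")
```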

Rank 1 · Optimal Control · Proposed

OC-high

22.0 ± 4.6

Mean deaths over 30 rollouts

Group-level OC solved on the ODE, then intra-group doses given to the highest-degree susceptibles. Wins the default config and most practical regimes.

9 s offline · 153 ms / ep
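The intra-group step of OC-high, and the uniform rule used by the OC-Random baseline, can be sketched as two selection functions. This assumes the group-level optimal control has already produced a dose count for the group. The function names and the toy degree table are hypothetical.

```python
import random

def pick_high_degree(candidates, degree, doses):
    """OC-high style rule: the `doses` highest-degree susceptibles in a group."""
    ranked = sorted(candidates, key=lambda v: degree[v], reverse=True)
    return ranked[:doses]

def pick_random(candidates, doses, rng):
    """OC-Random style rule: a uniform sample without replacement."""
    pool = list(candidates)
    return rng.sample(pool, min(doses, len(pool)))

# Toy group of five susceptible nodes with their degrees.
degree = {0: 9, 1: 2, 2: 5, 3: 1, 4: 7}
susceptible_in_group = [0, 1, 2, 3, 4]

chosen = pick_high_degree(susceptible_in_group, degree, doses=2)
baseline = pick_random(susceptible_in_group, 2, random.Random(0))
print("high-degree:", chosen, "random:", baseline)
```

Both rules consume the same group-level dose count, so any difference in outcomes comes from which nodes inside the group receive the doses.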

Rank 2 · Reinforcement Learning · Proposed

Node RL

22.4 ± 3.4

Mean deaths over 30 rollouts

Shared MLP scorer with a Gumbel-Top-K action head, giving O(K) policy-gradient variance independent of N. Beats OC-high at short horizons or hub-overloaded starts.

625 s offline · 75 ms / ep
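The Gumbel-Top-K trick draws K distinct nodes by adding independent Gumbel noise to each score and keeping the K largest perturbed values. The sketch below shows only that sampling step. The scores are random stand-ins for the output of the shared MLP scorer, and nothing here reproduces the thesis's PPO training loop.

```python
import math
import random

def gumbel_top_k(scores, k, rng):
    """Sample k distinct indices: perturb scores with Gumbel noise, keep the top k."""
    perturbed = []
    for i, s in enumerate(scores):
        u = max(rng.random(), 1e-12)        # guard against log(0)
        g = -math.log(-math.log(u))         # standard Gumbel(0, 1) draw
        perturbed.append((s + g, i))
    perturbed.sort(reverse=True)
    return [i for _, i in perturbed[:k]]

rng = random.Random(0)
# Stand-in scores for N = 5 000 nodes; a real policy would compute these.
scores = [rng.gauss(0.0, 1.0) for _ in range(5000)]
picked = gumbel_top_k(scores, k=10, rng=rng)
print(picked)
```

The action always contains exactly K nodes, matching the daily budget, which is the property the card contrasts with the independent Bernoulli head.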

Rank 3 · Optimal Control · Baseline

OC-Random

25.2 ± 4.1

Mean deaths over 30 rollouts

Same group-level OC rates as OC-high, but doses go to uniformly random nodes within each group. Isolates the value of the degree heuristic.

9 s offline

Rank 4 · Reinforcement Learning · Baseline

Naive RL

26.6 ± 4.2

Mean deaths over 30 rollouts

Same scorer, same PPO loop, but with an independent Bernoulli action head. Θ(N) gradient variance drowns out the K informative decisions per day.

792 s offline · 75 ms / ep

Δ_heur — intra-group heuristic

+1.3 deaths (median)

Positive in 11 / 12 regime points, ≈ 5–10% of the OC-high reference. Reporting "OC" on a network without specifying the intra-group selection rule conflates two separable components. Counter-productive only at init = 1 200 on N = 5 000 (overloaded start).

Δ_arch — Top-K action space

+1.9 deaths (median)

Positive in 11 / 12 regime points. Node RL and Naive RL share everything else — scorer, PPO loop, hyperparameters — so the gap is entirely attributable to the action-head reparametrisation. Empirically confirms the O(K) vs Θ(N) gradient-variance bound.
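For orientation, the two ablation gaps can be computed at the default configuration from the four card means. This sketch assumes the sign convention "baseline minus proposed", so a positive value means the proposed component averts deaths. The medians quoted above are taken over all 12 regime points, which are not listed on this page, so the default-point values need not match them.

```python
# Mean deaths at the default configuration, copied from the four result cards.
MEAN_DEATHS = {"OC-high": 22.0, "Node RL": 22.4, "OC-Random": 25.2, "Naive RL": 26.6}

# Assumed sign convention: baseline minus proposed method.
delta_heur = MEAN_DEATHS["OC-Random"] - MEAN_DEATHS["OC-high"]
delta_arch = MEAN_DEATHS["Naive RL"] - MEAN_DEATHS["Node RL"]
print(f"delta_heur = {delta_heur:+.1f}, delta_arch = {delta_arch:+.1f}")
```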

Regime analysis · three axes

Sweeping population size, horizon length, and initial infection ratio. The shaded band is the gap between Node RL and OC-high in each row: when it dips below zero, Node RL wins outright.

Population size

Ratios held fixed (K = 0.2% N, init = 6% N, T = 60)

Series plotted: OC-high, Node RL

OC-high wins at every N. The gap grows from +0.4 (N=5k) to +3.8 (N=20k) — as the susceptible pool grows, the ODE mean-field becomes more accurate, not less.
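Holding the ratios fixed means the daily budget and the initial infections scale linearly with N. A short sketch of that scaling follows, using the two ratios stated for this sweep. The helper name `scaled_config` and the choice of the three N values are illustrative.

```python
def scaled_config(n, budget_frac=0.002, init_frac=0.06, horizon=60):
    """Derive the daily budget K and the initial infections from N at fixed ratios."""
    return {
        "N": n,
        "K": round(n * budget_frac),     # K = 0.2% of N
        "init": round(n * init_frac),    # init = 6% of N
        "T": horizon,
    }

configs = [scaled_config(n) for n in (5_000, 10_000, 20_000)]
for cfg in configs:
    print(cfg)
```

At N = 5 000 this recovers the default configuration of K = 10 and 300 initial infections.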

Horizon length

Fixed N = 10 000, V_max = 20 (daily budget K), init = 600

Series plotted: OC-high, Node RL

Node RL wins at T = 30 (19.0 vs 20.1, gap −1.1). Short horizons are dominated by stochastic first-wave dynamics that the pre-computed OC plan can't react to.

Initial infections

Fixed N = 5 000, V_max = 10 (daily budget K), T = 60, 50:17:33 split

Series plotted: OC-high, Node RL

Node RL wins at init = 1 200 (27.6 vs 28.6, gap −1.0). A large fraction of hubs is already infected at t = 0, so high-degree targeting wastes doses on nodes behind an active wavefront.

Regime map · which method to use

The thesis's main practical deliverable: a decision rule mapping deployment conditions to the preferable method, with the measured gap and relative preparation cost.

Regime	Representative point	Gap (Node RL − OC-high, deaths)	Cost ratio (Node RL / OC-high prep)	Recommended
Default deployment	N = 5k, T = 60, init = 300	+0.4	~70×	OC-high (tied)
Large population, ratios held	N = 10–20k, fixed %	+0.8 → +3.8	~70–120×	OC-high
Long horizon	T = 70–90	+1.0 → +1.9	~70×	OC-high
Short horizon	T = 30	−1.1	~180×	Node RL
Overloaded initial state	init = 1 200 on N = 5k	−1.0	~70×	Node RL
Intra-group equity priority	Any regime where Δ_heur < 0	n/a	n/a	OC-Random or Node RL
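The regime map can be read as a small decision function. The version below is only a sketch: the thesis reports representative points rather than boundaries, so the cut-offs (T ≤ 30 and an initial-infection ratio of at least 24%) are assumptions taken directly from those points, and the `equity_priority` flag is a hypothetical stand-in for the last table row.

```python
def recommend(n, horizon, init, equity_priority=False):
    """Map a deployment condition to the method the regime map prefers.

    All cut-offs are assumptions read off the table's representative points.
    """
    if equity_priority:
        return "OC-Random or Node RL"
    if horizon <= 30:                 # short-horizon row
        return "Node RL"
    if init / n >= 0.24:              # overloaded start: 1 200 infections on N = 5 000
        return "Node RL"
    return "OC-high"                  # default, large-population, and long-horizon rows

print(recommend(5_000, 60, 300))      # default deployment
print(recommend(10_000, 30, 600))     # short horizon
print(recommend(5_000, 60, 1_200))    # overloaded initial state
```

Conditions between the measured points, for example T = 40, fall on the OC-high side here purely because of the assumed thresholds.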

Four practical rules

The takeaway that comes out of the regime map, ablations, and runtime analysis together.

OC-high is the default

On moderate horizons and realistic initial-infection ratios, OC-high matches or beats Node RL at roughly 1/70 of the preparation cost. If you had to pick one method without regime information, it's this.

Node RL earns its compute when the ODE breaks down

Short horizons (T ≲ 30) where stochastic first-wave dynamics dominate, or overloaded starts where too many hubs are already infected at t = 0 for the degree heuristic to bite.

The intra-group heuristic is not a free add-on

High-degree-first selection averts a median of 1.3 deaths, the same order of magnitude as the contribution of the group-level OC itself. Reports that say "OC on a network" without disclosing the post-hoc rule are conflating two separable contributions.

Action-space design decides trainability

Node RL's Top-K head beats Naive RL's Bernoulli head by a median of 1.9 deaths with every other component held identical. Without the O(K) reparametrisation, individual-level RL doesn't scale to this problem at all.
