A node-level reinforcement-learning approach to vaccine allocation on heterogeneous contact networks — bridging optimal control and deep RL under a hard daily dose budget.
In the first weeks of a pandemic, who should get the vaccine today? This demo walks through the thesis — the population model, the disease dynamics, the contact network — and lets you run a stochastic outbreak with your own seed configuration.
Every individual in the network is assigned to exactly one of three demographic groups: a hub set defined by a degree threshold, a high-risk elderly subpopulation drawn by Bernoulli sampling on the remainder, and the baseline majority. Each group has its own symptomatic, hospitalisation, and case-fatality rates.
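The assignment rule described above can be sketched in a few lines. This is a minimal illustration, not the thesis code: the function name, the group labels, the `>=` comparison, and the values of `hub_threshold` and `p_high_risk` are all assumptions chosen for the example.

```python
import random

def assign_groups(degrees, hub_threshold, p_high_risk, seed=0):
    """Assign every node to exactly one of three groups.

    degrees: dict mapping node id -> degree in the contact graph.
    hub_threshold, p_high_risk: illustrative parameters (hypothetical values).
    """
    rng = random.Random(seed)
    groups = {}
    for node, k in degrees.items():
        if k >= hub_threshold:
            groups[node] = "hub"          # degree-threshold rule
        elif rng.random() < p_high_risk:
            groups[node] = "high_risk"    # Bernoulli draw on the remainder
        else:
            groups[node] = "baseline"     # default majority
    return groups

# Toy degree sequence, made up for the example.
degrees = {0: 12, 1: 3, 2: 2, 3: 9, 4: 1, 5: 4}
groups = assign_groups(degrees, hub_threshold=8, p_high_risk=0.2)
```

Because hubs are fixed before the Bernoulli draw, a node above the threshold can never be relabelled as high-risk, which keeps the three groups disjoint.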
(Demo profile — slightly more virulent than the paper's COVID-calibrated values to make outbreak dynamics visible at N=200–1500 scale.)
The low-risk majority — working-age adults with typical contact patterns. Most of the population belongs here by default.
Elevated disease severity across every branching probability. Moderate contact patterns but markedly worse clinical outcomes once infected.
Delivery workers, transit drivers, frontline retail — the hubs of the contact graph. Few in number but wired into a disproportionate share of all transmission pathways.
A COVID-style disease model with distinct susceptible, latent, pre-symptomatic, asymptomatic, symptomatic, late-stage, hospitalised, recovered, vaccinated, and dead compartments. Three of them (P, A, I) drive onward transmission. Click a state to inspect it.
Explore the 10 compartments of the model. States in red are infectious. The three infectious weights w_P = 0.8, w_A = 0.5, w_I = 1.0 modulate how much each one contributes to the per-node force of infection.
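One way to read the weights is as multipliers on each infectious neighbour's contribution. The sketch below assumes a per-contact rate `beta`, an exponential conversion from force of infection to a daily infection probability, and the state labels `"S"` and `"R"`; none of these is stated on this page, so treat them as placeholders.

```python
import math

# Weights quoted in the demo; any state not listed contributes nothing.
INFECTIOUS_WEIGHTS = {"P": 0.8, "A": 0.5, "I": 1.0}

def force_of_infection(node, neighbours, state, beta):
    """Return beta times the summed weights of the node's infectious neighbours.

    beta is a hypothetical per-contact transmission rate.
    """
    pressure = sum(INFECTIOUS_WEIGHTS.get(state[j], 0.0) for j in neighbours[node])
    return beta * pressure

def infection_probability(lam):
    """Assumed conversion from a daily force of infection to a probability."""
    return 1.0 - math.exp(-lam)

# A susceptible node with one P, one A, one I, and one recovered neighbour.
neighbours = {0: [1, 2, 3, 4]}
state = {0: "S", 1: "P", 2: "A", 3: "I", 4: "R"}
lam = force_of_infection(0, neighbours, state, beta=0.1)
p_infect = infection_probability(lam)
```

With these toy inputs the summed weight is 0.8 + 0.5 + 1.0 = 2.3, so `lam` is 0.23 for `beta = 0.1`.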
Latent 3d · Pre-symptomatic 2d · Asymptomatic / Late-stage 5d · Hospitalised 10d
Many real contact graphs are approximately scale-free: most people have a few connections, a few people have many. The Barabási–Albert (BA) model reproduces this with a single rule: each new node v attaches to an existing node u with probability proportional to u's degree, P(v → u) = k_u / Σ_w k_w, where the sum runs over all existing nodes w. Hit play and watch the hubs emerge.
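The attachment rule can be reproduced with a short standard-library sketch. It assumes a star-shaped seed graph and `m` distinct attachments per new node; both choices, and the function name, are illustrative rather than the demo's actual settings.

```python
import random
from collections import Counter

def barabasi_albert(n, m, seed=0):
    """Grow a graph on n nodes by degree-proportional attachment.

    Starts from a star on m + 1 nodes (an assumed seed graph); each later
    node attaches to m distinct existing nodes.
    """
    rng = random.Random(seed)
    edges = []
    # Each node appears in the pool once per unit of degree, so a uniform
    # draw from the pool is a draw proportional to degree.
    pool = []
    for v in range(1, m + 1):
        edges.append((0, v))
        pool += [0, v]
    for new in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(pool))
        for u in chosen:
            edges.append((new, u))
            pool += [new, u]
    return edges

edges = barabasi_albert(n=200, m=2)
degree = Counter(x for e in edges for x in e)
max_degree = max(degree.values())
```

Comparing `max_degree` with the typical degree (close to `m` for late arrivals) shows the hub effect the animation illustrates.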
After growth completes, nodes are coloured by group:
Pick your seed configuration — how many initial infections and which groups they land in — then press play. This is the no-vaccination baseline: what happens if nothing is done. The final death count is the upper bound that every allocation method must beat.
All four methods are evaluated on N_eval = 30 stochastic rollouts of the paper's network-level simulator. The default configuration is N = 5 000, horizon T = 60 days, daily budget K = 10, and 300 initial infections split 50:17:33 across X:Y:Z. RL methods train on 1 500 episodes of PPO; reported numbers are best-of-three training seeds.
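For reference, the default configuration and the arithmetic it implies can be written down as follows. The class and field names are hypothetical, and the largest-remainder rounding is an assumption; for the default 50:17:33 split of 300 infections no rounding is needed.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EvalConfig:
    # Values are the defaults quoted above; field names are illustrative.
    n_nodes: int = 5000
    horizon_days: int = 60
    daily_budget: int = 10
    initial_infections: int = 300
    seed_ratio: tuple = (50, 17, 33)   # X : Y : Z
    n_eval_rollouts: int = 30
    ppo_episodes: int = 1500

def split_by_ratio(total, ratio):
    """Split total into integer parts proportional to ratio (largest remainder)."""
    s = sum(ratio)
    exact = [total * r / s for r in ratio]
    counts = [int(x) for x in exact]
    leftover = total - sum(counts)
    by_remainder = sorted(range(len(ratio)),
                          key=lambda i: exact[i] - counts[i], reverse=True)
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return counts

cfg = EvalConfig()
seeds = split_by_ratio(cfg.initial_infections, cfg.seed_ratio)  # [150, 51, 99]
max_doses = cfg.daily_budget * cfg.horizon_days                 # 600
```

At most 600 doses are available over the whole horizon, which is 12% of the 5 000 nodes.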
These sweeps vary population size, horizon length, and initial infection ratio. The shaded band is the gap between Node RL and OC-high in each row: when it dips below zero, Node RL wins outright.
The thesis's main practical deliverable: a decision rule mapping deployment conditions to the preferable method, with the measured gap and relative preparation cost.
| Regime | Representative point | Gap | Cost ratio | Recommended |
|---|---|---|---|---|
| Default deployment | N = 5k, T = 60, init = 300 | +0.4 | ~70× | OC-high (tied) |
| Large population, ratios held | N = 10–20k, fixed % | +0.8 → +3.8 | ~70–120× | OC-high |
| Long horizon | T = 70–90 | +1.0 → +1.9 | ~70× | OC-high |
| Short horizon | T = 30 | −1.1 | ~180× | Node RL |
| Overloaded initial state | init = 1 200 on N = 5k | −1.0 | ~70× | Node RL |
| Intra-group equity priority | Any regime where Δ_heur < 0 | n/a | n/a | OC-Random or Node RL |
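Read as a rule, the table picks Node RL wherever the gap is negative and OC-high otherwise. The sketch below encodes only that sign test. The `tie_tolerance` cut-off is a hypothetical value chosen so that the +0.4 row is labelled a tie, and the equity row is left out because it has no measured gap.

```python
def recommend(gap, tie_tolerance=0.5):
    """Map a measured gap to a method name.

    A negative gap favours Node RL, following the table's sign convention.
    tie_tolerance is an assumed threshold, not a value from the thesis.
    """
    if gap < 0:
        return "Node RL"
    if gap <= tie_tolerance:
        return "OC-high (tied)"
    return "OC-high"

# Single-valued representative gaps copied from the table rows.
table_gaps = {
    "Default deployment": 0.4,
    "Short horizon": -1.1,
    "Overloaded initial state": -1.0,
}
choices = {regime: recommend(gap) for regime, gap in table_gaps.items()}
```

The range-valued rows (+0.8 to +3.8 and +1.0 to +1.9) fall on the OC-high side at both endpoints under the same rule.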
The takeaway that comes out of the regime map, ablations, and runtime analysis together.
On moderate horizons and realistic initial-infection ratios, OC-high matches or beats Node RL at ~70× less preparation cost. If you had to pick one method without regime information, it's this.
Short horizons (T ≲ 30) where stochastic first-wave dynamics dominate, or overloaded starts where too many hubs are already infected at t = 0 for the degree heuristic to bite.
High-degree-first contributes a median of +1.3 deaths, the same order as the group-level OC itself. Reports that say "OC on a network" without disclosing the post-hoc rule are conflating two separable contributions.
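The two contributions are easier to separate once the post-hoc step is written out. In this sketch the group-level controller is assumed to output a per-group dose quota, and a second rule picks which nodes inside each group receive the doses. The function name, the data layout, and the reading of the random variant as uniform sampling within a group are assumptions for illustration.

```python
import random

def allocate_within_groups(quota, members, degree, rule="high", seed=0):
    """Turn per-group dose quotas into a list of individual nodes.

    quota:   dict group -> number of doses today (assumed controller output).
    members: dict group -> list of eligible node ids.
    rule:    "high" vaccinates highest-degree nodes first; anything else
             picks uniformly at random within the group.
    """
    rng = random.Random(seed)
    chosen = []
    for group, doses in quota.items():
        eligible = list(members[group])
        if rule == "high":
            eligible.sort(key=lambda v: degree[v], reverse=True)
        else:
            rng.shuffle(eligible)
        chosen.extend(eligible[:doses])
    return chosen

# Toy inputs, made up for the example.
degree = {0: 12, 1: 3, 2: 7, 3: 9, 4: 1, 5: 4}
members = {"hub": [0, 3], "baseline": [1, 2, 4, 5]}
quota = {"hub": 1, "baseline": 2}
picked = allocate_within_groups(quota, members, degree, rule="high")
```

Holding `quota` fixed and switching only `rule` isolates the effect of the degree heuristic from the effect of the group-level controller.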
Node RL's Top-K head beats Naive RL's Bernoulli head by +1.9 deaths with every other component held identical. Without the O(K) reparametrisation, individual-level RL doesn't scale to this problem at all.
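The practical difference between the two heads shows up in how they treat the daily budget. The sketch below contrasts an independent Bernoulli draw per node with a masked top-K selection over per-node scores. It is only an illustration of the selection step: the function names and toy logits are invented, and the thesis's O(K) reparametrisation is not reproduced here.

```python
import heapq
import math
import random

def bernoulli_head(logits, rng):
    """Sample each node independently; the number selected is not controlled."""
    return [i for i, z in enumerate(logits)
            if rng.random() < 1.0 / (1.0 + math.exp(-z))]

def top_k_head(logits, eligible, k):
    """Select the k highest-scoring eligible nodes (fewer if not enough are eligible)."""
    candidates = [i for i in range(len(logits)) if eligible[i]]
    return heapq.nlargest(k, candidates, key=lambda i: logits[i])

# Toy per-node scores; node 3 is ineligible (for example, already vaccinated).
logits = [0.2, 2.0, -1.0, 1.5, 0.9]
eligible = [True, True, True, False, True]

picked = top_k_head(logits, eligible, k=2)           # [1, 4]
sampled = bernoulli_head(logits, random.Random(0))   # size varies with the seed
```

The top-K selection always returns at most `k` nodes, so the hard budget holds by construction, whereas the Bernoulli draw would need an extra repair step whenever it selects more than `k`.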