Joe Root’s Test average is 50.2. Younis Khan’s was 52.1. On most cricket stats sites, that’s the end of the comparison: Khan edges it. But Root spent the majority of his career in the 2015–2024 window, arguably the hardest era for batting in the Cricsheet data. Khan’s peak years, 2006–2012, were substantially kinder. Adjust for that, and the picture reverses.
This is the era problem. Raw averages accumulate across very different playing conditions, opposition quality, and pitch preparation standards. A 50 in 2022 is not the same as a 50 in 2007. The difference isn’t huge (we’re not talking about the Bradman-era gap), but across the batters we care most about, it’s enough to scramble the rankings.
What the model adjusts for
Our Bayesian model estimates each batter’s latent ability — the underlying scoring rate we’d expect if every innings were played in identical conditions. To do this, we estimate three correction factors from the data:
Era correction. We treat each calendar year as a random effect in the model, estimating how easy or hard it was to score runs across all batters in that year. Harder years depress averages; easier years inflate them. A batter who played entirely in hard years gets their average scaled up.
Opposition strength. Facing the top-ranked attack at full strength is different from facing a depleted fifth-ranked team. We weight innings by a rolling opposition rating, derived from all-innings data, to avoid rewarding batters who feasted on weak attacks.
Venue adjustment. The effect of home pitch preparation is real and measurable. Our venue terms account for the fact that Kohli’s home average is partly a reflection of Indian pitch preparation rather than purely of Kohli’s skill.
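As a rough illustration of how the three corrections could combine, the sketch below treats them as additive effects on the log scale and rescales each innings to neutral conditions. This is a crude, non-Bayesian stand-in, not the fitted model: the effect sizes in ERA, OPPOSITION and VENUE are made-up placeholders, and the function names are hypothetical.

```python
import math

# Placeholder log-scale effects (assumed values, not model output).
# Negative means harder than baseline, positive means easier.
ERA = {2007: 0.03, 2022: -0.04}
OPPOSITION = {"strong": -0.03, "weak": 0.03}
VENUE = {"home": 0.02, "away": -0.02}


def context_multiplier(year, opposition, venue):
    """How much the context inflates (>1) or deflates (<1) expected runs."""
    return math.exp(ERA[year] + OPPOSITION[opposition] + VENUE[venue])


def adjusted_average(innings):
    """Average after rescaling every innings to neutral conditions.

    innings: list of (runs, dismissed, year, opposition, venue) tuples.
    Requires at least one dismissal, as a conventional average does.
    """
    neutral_runs = sum(
        runs / context_multiplier(year, opp, venue)
        for runs, _dismissed, year, opp, venue in innings
    )
    dismissals = sum(1 for _runs, dismissed, *_rest in innings if dismissed)
    return neutral_runs / dismissals


# Two toy batters with the same raw average of 50 in opposite conditions.
hard = [(50, True, 2022, "strong", "away")]
easy = [(50, True, 2007, "weak", "home")]
print(adjusted_average(hard), adjusted_average(easy))
```

The batter who scored in harder conditions ends up above 50 and the other below it, which is the direction of adjustment described above. The real model estimates the effects jointly with ability rather than plugging in fixed values.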
The chart above shows a clear downward trend in scoring: Test cricket got materially harder for batters between 2008 and 2021. Plausible drivers include faster pitches, more reverse swing, more extreme seam movement in English conditions, and more aggressive pace attacks in Australia and South Africa. Whether this is temporary or structural is one of our open questions, but the trend is consistent across the data.
What changes — and what doesn’t
The adjustments are moderate, not revolutionary. We’re not claiming Root is actually averaging 65. What we are saying is that relative rankings are distorted by era in ways that are systematic and correctable.
The biggest movers are batters who played deep into the harder post-2018 era: Root (+4.5 pts), Williamson (+3.8 pts), and Labuschagne (+5.1 pts on a smaller sample). Batters whose peak came in the gentler 2006–2012 window are adjusted down slightly: Ponting (−1.2 pts), Kallis (−0.9 pts).
Babar Azam is the most interesting case in the current era. His raw average of 45.2 already places him in good company. But our model adjusts downward: he has played a disproportionate share of home innings against weaker opposition, and the venue correction bites. We’d need another 40–50 away innings to tighten his credible interval (CI) enough to be confident of his rank among the game’s elite.
What we’re honest about
Bayesian models produce calibrated uncertainty, not truth. Our posteriors are conditional on the model structure being roughly right — and there are real structural choices we’ve made that reasonable people could argue with: which year to treat as baseline, how to model pitch conditions without explicit pitch ratings, whether to pool across formats.
We flag three limitations explicitly. First, the data starts in 2001, so pre-Cricsheet careers are excluded or truncated. Second, the opposition strength estimates are themselves uncertain, and we’re propagating that uncertainty imperfectly. Third, for batters with fewer than 80 innings, the credible interval is wide enough that rankings are essentially noise.
Methodology — how the model works
Model structure: We fit a hierarchical Bayesian model in which the runs a batter scores in each innings follow a geometric distribution (the natural model for batting averages under constant dismissal probability). The latent dismissal probability for each batter–year–venue–opposition combination is the quantity of interest.
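To see why the geometric distribution is a natural fit, consider a single batter with one constant dismissal probability p per run. The sketch below assumes runs are counted from zero and that not-out innings enter as censored observations (the batter survived at least that many runs); both are assumptions for illustration, and the function names are hypothetical. Under those assumptions, the maximum-likelihood estimate of p reproduces the conventional batting average, runs divided by dismissals.

```python
import math


def log_likelihood(p, innings):
    """Geometric log-likelihood with censoring for not-out innings.

    p: assumed constant per-run dismissal probability, 0 < p < 1.
    innings: list of (runs, dismissed) pairs.
    """
    total = 0.0
    for runs, dismissed in innings:
        total += runs * math.log(1.0 - p)  # survived `runs` runs
        if dismissed:
            total += math.log(p)  # and was then dismissed
    return total


def mle_dismissal_prob(innings):
    """Closed-form maximiser: dismissals / (dismissals + runs).

    Needs at least one dismissal, otherwise the estimate is zero.
    """
    runs = sum(r for r, _ in innings)
    outs = sum(1 for _, d in innings if d)
    return outs / (outs + runs)


def implied_average(p):
    """Mean of a geometric distribution counted from zero."""
    return (1.0 - p) / p


# Toy innings: 150 runs for 3 dismissals, one not-out.
toy = [(40, True), (0, True), (75, False), (35, True)]
p_hat = mle_dismissal_prob(toy)
print(p_hat, implied_average(p_hat))
```

The hierarchical model replaces the single p with one that varies by batter, year, venue and opposition, but the per-innings likelihood term keeps this shape.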
Priors: Weakly informative priors centred on the overall league average. Era terms use a random-walk prior that encodes the expectation that adjacent years are similar. Venue terms use a pooled prior across grounds in the same country.
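The random-walk prior on era terms can be read as a penalty on year-to-year jumps. A minimal sketch, assuming Gaussian steps and placeholder scales (step_sd and first_sd are illustrative values, not the model’s), shows that a smooth sequence of era effects receives a higher prior density than a jagged one covering the same range.

```python
import math


def normal_logpdf(x, mean, sd):
    """Log density of a normal distribution."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def random_walk_logprior(era_effects, step_sd=0.05, first_sd=1.0):
    """Log prior of a sequence of yearly effects under a Gaussian random walk.

    The first year gets a wide prior around zero; every later year is
    centred on the year before it.
    """
    logp = normal_logpdf(era_effects[0], 0.0, first_sd)
    for prev, cur in zip(era_effects, era_effects[1:]):
        logp += normal_logpdf(cur, prev, step_sd)
    return logp


smooth = [0.00, -0.01, -0.02, -0.03]  # gradual drift
jagged = [0.00, -0.03, 0.00, -0.03]   # same range, abrupt swings
print(random_walk_logprior(smooth), random_walk_logprior(jagged))
```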
Inference: MCMC via Stan, 4 chains × 2000 iterations. R-hat < 1.01 for all reported parameters. Credible intervals are HDI (highest density interval), not equal-tailed.
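The difference between an HDI and an equal-tailed interval matters most for skewed posteriors. The sketch below computes both from a list of posterior draws using only the standard library. It is a simple illustration with hypothetical function names, not the code used to produce our intervals, and the equal-tailed version uses a deliberately crude index rule.

```python
import math


def hdi(samples, mass=0.9):
    """Narrowest interval containing `mass` of the sorted draws."""
    s = sorted(samples)
    n = len(s)
    # Number of draws the interval must cover; the small tolerance
    # guards against floating-point round-up in mass * n.
    k = max(1, math.ceil(mass * n - 1e-9))
    start = min(range(n - k + 1), key=lambda i: s[i + k - 1] - s[i])
    return s[start], s[start + k - 1]


def equal_tailed(samples, mass=0.9):
    """Interval that cuts the same probability from each tail."""
    s = sorted(samples)
    n = len(s)
    lo_idx = math.floor((1 - mass) / 2 * (n - 1))
    hi_idx = math.ceil((1 + mass) / 2 * (n - 1))
    return s[lo_idx], s[hi_idx]


# Toy right-skewed draws with one extreme value.
draws = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(hdi(draws), equal_tailed(draws))
```

For these toy draws the 90% HDI excludes the outlier at 100, while the equal-tailed interval stretches to include it.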
Data: Cricsheet ball-by-ball data, 2001–present. Test matches only. Did-not-bat (DNB) innings are excluded from the denominator. Retired-hurt innings are treated as not-out. We update the model monthly.
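The innings-handling rules above can be sketched as a small preprocessing step. The record layout and the status labels below are hypothetical and are not Cricsheet’s actual schema; they only illustrate the two rules.

```python
def to_model_rows(records):
    """Convert raw innings records into (runs, dismissed) pairs.

    records: list of dicts with hypothetical keys "runs" and "status",
    where status is one of "out", "not out", "retired hurt", "did not bat".
    """
    rows = []
    for rec in records:
        if rec["status"] == "did not bat":
            continue  # DNB innings are dropped entirely
        # Retired hurt counts as not-out, so only "out" is a dismissal.
        dismissed = rec["status"] == "out"
        rows.append((rec["runs"], dismissed))
    return rows


records = [
    {"runs": 40, "status": "out"},
    {"runs": 12, "status": "retired hurt"},
    {"runs": 0, "status": "did not bat"},
    {"runs": 75, "status": "not out"},
]
print(to_model_rows(records))
```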