A mechanistic evolutionary model explains the time-dependent pattern of substitution rates in viruses.
Ghafari M., Simmonds P., Pybus OG., Katzourakis A.
Estimating viral timescales is fundamental to understanding the evolutionary biology of viruses. Molecular clocks are widely used to reveal the recent evolutionary histories of viruses but may severely underestimate their longer-term origins because inferred rates of evolution are inversely correlated with the timescale over which they are measured. Here, we provide a predictive mechanistic model that readily explains this rate decay phenomenon over a wide range of timescales and recapitulates the ubiquitous power-law rate decay with a slope of -0.65. We show that standard substitution models fail to correctly estimate divergence times once the most rapidly evolving sites saturate, typically after hundreds of years in RNA viruses and thousands of years in DNA viruses. Our model successfully recreates the observed pattern of decay and explains the evolutionary processes behind the time-dependent rate phenomenon. We then apply our model to re-estimate the date of diversification of hepatitis C virus genotypes at 423,000 (95% highest posterior density [HPD]: 394,000-454,000) years before present, a time preceding the dispersal of modern humans out of Africa, and show that the most recent common ancestor of sarbecoviruses dates back to 21,000 (95% HPD: 19,000-22,000) years ago, nearly thirty times older than previous estimates. This offers a new perspective on the origins of these viruses and suggests that the evolutionary timescales of other viruses could be substantially revised in the same way.
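To make the consequence of the power-law rate decay concrete, the following is a minimal illustrative sketch in Python. It is not the mechanistic model described in the abstract. It assumes only that the apparent rate measured over a timescale t follows a pure power law with the reported slope of -0.65, and it shows how a clock calibrated on a short-term rate would then underestimate older divergence times. The anchor values `r_ref` and `t_ref`, and all function names, are hypothetical and chosen for illustration only.

```python
import math

SLOPE = -0.65  # power-law exponent reported in the abstract


def apparent_rate(t_years, r_ref=1e-3, t_ref=1.0):
    """Apparent substitution rate (subs/site/year) measured over t_years.

    Assumes a pure power law anchored at the hypothetical point
    (t_ref, r_ref); these anchor values are illustrative, not from the paper.
    """
    if t_years <= 0:
        raise ValueError("t_years must be positive")
    return r_ref * (t_years / t_ref) ** SLOPE


def naive_clock_time(t_years, r_ref=1e-3, t_ref=1.0):
    """Divergence time a constant-rate clock would infer.

    The observed distance is (apparent rate) * (true time); dividing it by
    the short-term rate r_ref gives the naive, constant-rate estimate.
    """
    distance = apparent_rate(t_years, r_ref, t_ref) * t_years
    return distance / r_ref


def true_time_from_naive(t_naive, t_ref=1.0):
    """Invert naive_clock_time under the same power-law assumption."""
    return t_ref * (t_naive / t_ref) ** (1.0 / (1.0 + SLOPE))


if __name__ == "__main__":
    for t in (10.0, 1_000.0, 100_000.0):
        print(f"true time {t:>9.0f} y -> naive clock estimate "
              f"{naive_clock_time(t):.1f} y")
```

Under this assumption the naive estimate grows only as the true time raised to the power 0.35, so the gap between the two widens steadily with age. The abstract's own estimates come from the authors' mechanistic model with saturation of fast-evolving sites, which this sketch does not attempt to reproduce.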