SARS-CoV-2 genomic diversity and within-host evolution in individuals with persistent infection in the UK: an observational, longitudinal, population-based surveillance study.
Ghafari M., Kemp SA., Hall M., Clarke J., Ferretti L., Thomson L., Studley R., COVID-19 Infection Survey Group, COVID-19 Genomics UK (COG-UK) Consortium, Walker AS., Golubchik T., Lythgoe K.
BACKGROUND: Persistent SARS-CoV-2 infections in hospitalised immunocompromised individuals are known to facilitate accelerated within-host viral evolution, potentially contributing to the emergence of highly divergent variants. However, little is known about the evolutionary dynamics and transmission risks of persistent infections in the general population. We aimed to characterise the within-host evolution of SARS-CoV-2 during persistent infections identified through a large community surveillance study.

METHODS: We used data from the Office for National Statistics COVID-19 Infection Survey (ONS-CIS), a large-scale, longitudinal, population-based surveillance study conducted in the UK from April, 2020, to March, 2023. For this analysis, we focused on infections with high viral load (cycle threshold ≤30) and available genome sequences, from seven major SARS-CoV-2 lineages (alpha, delta, BA.1, BA.2, BA.4, BA.5, and XBB). ONS-CIS participants were randomly selected from the general population and tested regularly by RT-PCR, regardless of symptoms. We defined persistent infections as those with sustained or rebounding high viral RNA titres for 26 days or longer. We examined associated host characteristics and used raw sequence data to identify de novo mutations and estimate within-host synonymous and non-synonymous evolutionary rates across the SARS-CoV-2 genome.

FINDINGS: Between Nov 2, 2020, and March 21, 2023, we identified 576 persistent infections with at least two sequences, including 11 alpha, 106 delta, 102 BA.1, 204 BA.2, 16 BA.4, 133 BA.5, and 4 XBB. Persistent infections were more common in males than females (p<0·0001) and individuals older than 60 years (p=0·0027). The median within-host genome-wide evolutionary rate was 7·9 × 10⁻⁴ substitutions per site per year (IQR 7·0–9·0 × 10⁻⁴), with high inter-individual variability driven largely by non-synonymous mutations, particularly in the N-terminal and receptor-binding domains of the spike protein. Longer infection duration was associated with higher evolutionary rates, while no associations were found with age, sex, vaccination status, previous infection, or virus lineage. We found no clear evidence of transmission beyond the first month of infection in any of the 84 persistent infections lasting 56 days or longer. In total, we identified 379 recurrent mutations, including many with known or predicted negative fitness effects and low prevalence at the population level, as well as de novo reversions to the Wuhan-Hu-1 reference sequence, which were likely under positive selection within those individuals.

INTERPRETATION: This study highlights the heterogeneous nature of within-host SARS-CoV-2 evolution in individuals with persistent infection in the community. Notably, a small subset of persistent infections with high viral loads underwent accelerated viral evolution or recurrently acquired hallmark mutations found in novel variants. In addition, onward transmission from a persistent infection during the later stages of infection is likely to be rare. These insights have important implications for prioritising genomic surveillance and managing patients with persistent infections.

FUNDING: Department of Health and Social Care.
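To make the two quantitative definitions above concrete, here is a minimal Python sketch under stated assumptions. It flags a participant's test series as persistent when high-viral-load results (Ct ≤30) span 26 days or longer, and converts a per-site annual rate into an expected number of genome-wide substitutions. The helper names, the (date, Ct) input format, the 29,903-nt genome length (taken from the Wuhan-Hu-1 reference) and the simple count-over-time rate are illustrative assumptions, not the authors' pipeline. The study also used sequence data to confirm that samples came from the same infection, which this sketch does not attempt.

```python
from datetime import date
from typing import Optional

# Thresholds taken from the definitions in the abstract.
CT_THRESHOLD = 30.0        # "high viral load": cycle threshold <= 30
MIN_DURATION_DAYS = 26     # persistent: sustained or rebounding high titres for >= 26 days
# Assumption: genome length of the Wuhan-Hu-1 reference (GenBank NC_045512.2).
GENOME_LENGTH = 29_903
DAYS_PER_YEAR = 365.25


def high_load_span_days(tests: list[tuple[date, Optional[float]]]) -> int:
    """Days between the first and last high-viral-load result.

    Hypothetical input: (test_date, ct) pairs, with ct=None for a negative test.
    Measuring first-to-last high-load result covers both "sustained" and
    "rebounding" courses, but (unlike the study) it does not use sequences
    to rule out a reinfection.
    """
    high = sorted(d for d, ct in tests if ct is not None and ct <= CT_THRESHOLD)
    if len(high) < 2:
        return 0
    return (high[-1] - high[0]).days


def is_persistent(tests: list[tuple[date, Optional[float]]]) -> bool:
    """Simplified persistence check against the 26-day threshold."""
    return high_load_span_days(tests) >= MIN_DURATION_DAYS


def expected_substitutions(rate_per_site_per_year: float, days: float,
                           genome_length: int = GENOME_LENGTH) -> float:
    """Substitutions expected genome-wide over `days` at a constant rate."""
    return rate_per_site_per_year * genome_length * days / DAYS_PER_YEAR


def rate_from_count(n_substitutions: float, days: float,
                    genome_length: int = GENOME_LENGTH) -> float:
    """Naive rate estimate: observed de novo substitutions / (sites x years)."""
    return n_substitutions / (genome_length * days / DAYS_PER_YEAR)


if __name__ == "__main__":
    series = [
        (date(2022, 1, 1), 22.0),
        (date(2022, 1, 10), 25.0),
        (date(2022, 1, 20), None),   # negative test, then a high-load rebound
        (date(2022, 1, 30), 28.0),
    ]
    print(is_persistent(series))                          # True (29-day span)
    print(round(expected_substitutions(7.9e-4, 56), 1))   # 3.6
```

At the reported median rate of 7·9 × 10⁻⁴ substitutions per site per year, a 56-day infection would be expected to accumulate only about 3.6 substitutions genome-wide, so a simple count-based estimate for a single individual shifts noticeably with each additional mutation observed.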