GPS Pipeline: portable, scalable genomic pipeline for Streptococcus pneumoniae surveillance from Global Pneumococcal Sequencing Project.

Hung HCH., Kumar N., Dyster V., Yeats C., Metcalf B., Li Y., Hawkins PA., McGee L., Bentley SD., Lo SW.

Streptococcus pneumoniae (pneumococcus) is a major pathogen globally, responsible for an estimated one million deaths annually and contributing significantly to the global burden of antimicrobial resistance. Ongoing surveillance of its vaccine antigen (i.e. serotypes), antimicrobial resistance, and pneumococcal lineages is crucial for assessing the impact of vaccination programs and guiding future vaccine design. However, current bioinformatics tools have several limitations that prevent them from enabling comprehensive analysis that allows simultaneous, large-scale, and independent generation of these crucial data. Here, we present the GPS Pipeline that enables reliable extraction of public health information from pneumococcal genomes using in silico methods. It can accurately identify 102 of 107 known serotypes, recognise 1053 pneumococcal lineages, and predict susceptibilities to 19 common antibiotics. Built on Nextflow and utilising containerisation technology, the GPS Pipeline minimises software setup requirements and bioinformatics expertise while facilitating large-scale analysis of genomic data. The GPS Pipeline was applied and validated on 20,924 pneumococcal genomes worldwide, demonstrating its effectiveness in enhancing responsiveness in pneumococcal genomic surveillance.

DOI

10.1038/s41467-025-64018-5

Type

Journal article

Publication Date

2025-09-01

Volume

16

Addresses

Parasites and Microbes, Wellcome Sanger Institute, Hinxton, UK. ch31@sanger.ac.uk.

Keywords

Humans, Streptococcus pneumoniae, Pneumococcal Infections, Pneumococcal Vaccines, Anti-Bacterial Agents, Computational Biology, Genomics, Genome, Bacterial, Software, Serogroup
