The BMC Infectious Diseases study is the first to investigate the integration of machine learning algorithms with primary care data to support testing for blood-borne diseases.

Machine learning algorithms (MLAs) could support early diagnosis of HIV, Hepatitis B (HBV) and Hepatitis C (HCV) in primary care settings, a study in BMC Infectious Diseases found.

In collaboration with colleagues across several UK institutions, PSI and University of Oxford researchers explored the feasibility of integrating machine learning with routine primary care data to improve testing for HIV, HCV and HBV within the general population.

“Our findings suggest that combining digital technologies via tailored machine-learning algorithms (MLAs) for routinely available primary care data has the potential to improve the efficiency, targeting, and timeliness of blood-borne viruses testing,” said Pandemic Sciences Institute’s Associate Professor Jasmina Panovska-Griffiths, lead co-author on the study.

In high-income countries, testing for blood-borne viruses (BBVs) such as HIV, HCV and HBV is generally performed in specialist settings, such as drug misuse treatment centres and sexual health and antenatal clinics, an approach that has led to considerable progress. However, people with undiagnosed infection exist outside of these settings: for instance, 50% of chronic HBV cases in England remain undiagnosed.

“Testing in primary care settings is largely opportunistic, highly variable and poorly targeted”, said Dr Werner Leber, Clinical Lecturer in Primary Care at Queen Mary University of London and study co-lead.

“Our recent systematic review of BBVs risk prediction methods showed the lack of MLAs relevant to primary care and across all three HIV, HCV and HBV. To fill this gap, in this study we developed a suite of machine-learning methods to estimate the likelihood that people with risk factors are likely to be living with one of the three BBVs.”

Improving testing strategies within the general population would ultimately support better patient outcomes and mitigate the public health burden associated with these infections.

The current study, funded by the NIHR School of Primary Care Research and co-produced with people living with BBVs and their representatives, is the first to explore the use of machine learning to support BBV testing in primary care.

Discussion and findings

The study’s authors used a large-scale, anonymised dataset of 1.9 million GP patients in North London to train and test three MLAs, evaluating their performance in predicting positivity to HIV, HCV and HBV or a combination thereof (‘any BBV’), and identifying risk factors associated with infection.

“We used the Balanced Random Forest Classifier as it is robust to overfitting, we fitted an AdaBoost model which addresses data imbalance and also explored Logistic Regression with balanced class weights as a simpler predictive approach”, said lead co-author Harrison Manley, UK Health Security Agency.
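As a rough illustration of what "balanced class weights" means, the sketch below applies one common heuristic, in which each class is weighted by n_samples / (n_classes * class_count), so that rare positive cases count for more during model fitting. This is the rule behind the class_weight="balanced" option in scikit-learn; the article does not say which implementation the authors used, and the toy labels and the function name balanced_class_weights are assumptions made for this example only.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes receive large weights and common classes small ones,
    so each class contributes equally to the total weighted sample count.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical toy labels (not study data): 1 = tested positive, 0 = tested negative.
labels = [0] * 95 + [1] * 5
weights = balanced_class_weights(labels)

for cls in sorted(weights):
    print(f"class {cls}: weight {weights[cls]:.3f}")
```

With these made-up labels, each of the five positive cases is weighted far more heavily than each of the 95 negative cases, which is how a simple model can be kept from ignoring the rare outcome.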

The researchers identified age as the most important risk factor for all infections considered. Other key factors associated with a heightened risk of testing positive for BBVs were sex, GP-recorded ethnicity, drug and alcohol misuse, imprisonment, sexual behaviour, tattoo and transfusion records, associated co-morbidities, homelessness and migration status.

Several predictors were shared between two infections, such as Black African ethnicity (HIV and HBV), liver disease (HBV and HCV), and opiate or cocaine use (HBV and HCV).

Among all individual infections, HCV was the most accurately predicted across all models. While no shared risk factors were identified across all three infections, the authors suggest that key predictors are largely a combination of established risk factors for individual BBV infection.

Although none of the models emerged as a clear winner in predicting individual HIV, HCV, or HBV positivity, the Logistic Regression model performed robustly while also offering practical advantages, as it could be readily implemented in other settings.

“Evaluating different MLAs and applying a broad set of accuracy criteria when utilising digital technology is however necessary for improved accuracy in real-life application of precision medicine”, added Professor Panovska-Griffiths. 

Future implications

While this study does not intend to redefine diagnostic criteria for BBVs, it aims to improve testing recommendations, identifying risk combinations that are not included in current guidelines and do not currently trigger a BBV test.

The authors see this work as the first step towards identifying additional and more specific cohorts for BBV testing in general settings, strengthening the role of primary care in identifying individuals at risk and linking them to a wider network of specialist and community-based services.

According to the authors, participating primary care practices would be able to identify patients with at least one recorded BBV risk factor in their Electronic Health Record (EHR), and use the predictive algorithms developed in this study to generate a personalised risk score. 

GP practices, the authors envisage, would then be able to prioritise the most at-risk patients for BBV blood testing. Patients with a positive test would then be referred to specialist services – including sexual health, hepatology or substance use services – for further assessment, appropriate care and support.
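The workflow the authors envisage (select patients with at least one recorded risk factor, score them, then prioritise the highest scores for testing) could be sketched roughly as follows. This is a toy illustration only: the logistic-style scoring, the intercept, the coefficients, the risk-factor names and the patient records are all hypothetical and are not taken from the study or its models.

```python
import math

# Hypothetical intercept and coefficients, for illustration only.
# They are NOT the values estimated in the study.
INTERCEPT = -6.0
COEFFICIENTS = {
    "injecting_drug_use": 2.0,
    "liver_disease": 1.5,
    "history_of_imprisonment": 1.0,
}


def risk_score(risk_factors):
    """Return a logistic-style score between 0 and 1 from recorded risk factors."""
    z = INTERCEPT + sum(COEFFICIENTS.get(factor, 0.0) for factor in risk_factors)
    return 1.0 / (1.0 + math.exp(-z))


def prioritise(patients, top_n):
    """Keep patients with at least one recorded risk factor, ranked by score."""
    eligible = [p for p in patients if p["risk_factors"]]
    ranked = sorted(eligible, key=lambda p: risk_score(p["risk_factors"]), reverse=True)
    return ranked[:top_n]


# Made-up example records standing in for Electronic Health Record entries.
patients = [
    {"id": "A", "risk_factors": []},
    {"id": "B", "risk_factors": ["liver_disease"]},
    {"id": "C", "risk_factors": ["injecting_drug_use", "liver_disease"]},
    {"id": "D", "risk_factors": ["history_of_imprisonment"]},
]

shortlist = prioritise(patients, top_n=2)
for patient in shortlist:
    print(patient["id"], round(risk_score(patient["risk_factors"]), 4))
```

In this sketch, patient A is excluded because no risk factor is recorded, and the remaining patients are ranked so that those with the highest scores would be offered a test first.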


Read the full study in BMC Infectious Diseases.