BACKGROUND: Reliable classification of ischaemic stroke (IS) aetiological subtypes is required in research and clinical practice, but the predictive properties of these subtypes in population studies with incomplete investigations are poorly understood.

AIMS: To compare the prognosis of aetiologically classified IS subtypes and to use machine learning (ML) to classify incompletely investigated IS cases.

METHODS: In a 9-year follow-up of a prospective study of 512,726 Chinese adults, 22,216 incident IS cases, confirmed by clinical adjudication of medical records, were assigned subtypes using a modified Causative Classification System for Ischemic Stroke (CCS) (LAA: large artery atherosclerosis; SAO: small artery occlusion; CE: cardioaortic embolism; or undetermined aetiology) and classified by CCS as "evident", "probable", or "possible" IS cases. For incompletely investigated IS cases where CCS yielded an undetermined aetiology, an ML model was developed to predict IS subtypes from baseline risk factors and screening for cardioaortic sources of embolism. The 5-year risks of subsequent stroke and all-cause mortality (measured using cumulative incidence functions and 1 minus Kaplan-Meier estimates, respectively) for the ML-predicted IS subtypes were compared with those for the aetiologically classified IS subtypes.

RESULTS: Among 7,443 IS cases with evident or probable aetiology, 66% had SAO, 32% had LAA and 2% had CE, but the relative proportions of SAO and LAA cases varied by region in China. CE had the highest rates of subsequent stroke and mortality (43.5% and 40.7%, respectively), followed by LAA (43.2% and 17.4%) and SAO (38.1% and 11.1%). ML provided classifications for cases with undetermined aetiology and incomplete clinical data (24% of all IS cases; n=5,276), with areas under the curve (AUC) of 0.99 (0.99-1.00) for CE, 0.67 (0.64-0.70) for LAA, and 0.70 (0.67-0.73) for SAO in unseen cases. ML-predicted IS subtypes yielded subsequent stroke and all-cause mortality rates comparable to those of the aetiologically classified IS subtypes.

CONCLUSIONS: This study highlighted substantial heterogeneity in the prognosis of IS subtypes and the utility of ML approaches for classifying IS cases with incomplete clinical investigations.
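The abstract reports 5-year all-cause mortality as 1 minus the Kaplan-Meier estimate. As a rough illustration of what that quantity is, the sketch below computes it in plain Python. The function name `km_cumulative_risk` and the toy data are hypothetical and are not taken from the study; the sketch also does not cover the cumulative incidence functions used for subsequent stroke, which additionally account for competing risks.

```python
from collections import Counter


def km_cumulative_risk(times, events, horizon):
    """Return 1 - S(horizon), where S is the Kaplan-Meier survival estimate.

    times   : follow-up time for each person (e.g. years since the index stroke)
    events  : 1 if the person died at that time, 0 if they were censored
    horizon : time point at which to report the risk (e.g. 5 years)
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    exits = Counter(times)  # each person leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    for t in sorted(exits):
        if t > horizon:
            break
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: survive past t with probability 1 - d / n
            survival *= 1.0 - d / at_risk
        # Deaths and censorings at t both leave the risk set after t
        at_risk -= exits[t]
    return 1.0 - survival


# Toy data, invented purely for illustration (not from the study).
toy_times = [1, 2, 2, 3, 4, 5, 6, 6]
toy_events = [1, 0, 1, 1, 0, 1, 0, 0]
risk_5y = km_cumulative_risk(toy_times, toy_events, horizon=5)
print(f"5-year risk: {risk_5y:.3f}")
```

On the toy data the estimated 5-year risk is approximately 0.6. In practice a maintained survival-analysis library would normally be used instead, since it also provides confidence intervals and handles edge cases.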

Journal article


Int J Stroke

Keywords: Aetiology, China, Classification, Ischaemic stroke, Machine learning, Prevention