Automated extraction of post-stroke functional outcomes from unstructured electronic health records

Posted on 22/01/2025
by Marta Fernandes

Eur Stroke J. 2025 Jan 22:23969873251314340. doi: 10.1177/23969873251314340. Online ahead of print.

ABSTRACT

PURPOSE: Population-level tracking of post-stroke functional outcomes is critical to guide interventions that reduce the burden of stroke-related disability. However, functional outcomes are often missing or documented only in unstructured notes. We developed a natural language processing (NLP) model that reads electronic health record (EHR) notes to automatically determine the modified Rankin Scale (mRS) score.

METHOD: We included consecutive patients (⩾18 years) with acute stroke admitted to our center (2015-2024). mRS scores were obtained from the Get With the Guidelines registry and from clinical notes (if documented), and were used as the gold standard against which NLP-generated scores were compared. We used text-based features from notes, along with age, sex, discharge status, and outpatient follow-up, to train a logistic regression model to predict good (0-2) versus poor (3-6) mRS, and a linear regression model to predict the full range of mRS scores. The models were trained to predict mRS at hospital discharge and post-discharge, and were externally validated in a dataset of patients with brain injuries from a different healthcare center.
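The binary task described above can be illustrated with a minimal sketch. The abstract does not specify the text features, so everything here is an assumption made for illustration: the four-word vocabulary `VOCAB`, the toy notes in `TOY`, the age scaling, and the plain stochastic gradient descent in `train_logistic` are all hypothetical and are not the authors' pipeline. Only the good (0-2) versus poor (3-6) split comes from the abstract.

```python
import math
import re

# Hypothetical vocabulary; the abstract does not list the actual text features.
VOCAB = ["independent", "ambulates", "wheelchair", "assistance"]

def dichotomize_mrs(score):
    """Map mRS 0-2 to 0 (good outcome) and mRS 3-6 to 1 (poor outcome)."""
    if not 0 <= score <= 6:
        raise ValueError("mRS must be between 0 and 6")
    return 0 if score <= 2 else 1

def featurize(note, age, female):
    """Word counts over VOCAB plus two structured covariates (assumed encoding)."""
    tokens = re.findall(r"[a-z]+", note.lower())
    counts = [float(tokens.count(word)) for word in VOCAB]
    return counts + [age / 100.0, 1.0 if female else 0.0]

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(weights, bias, x):
    """Estimated probability of a poor outcome (mRS 3-6)."""
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)

def train_logistic(X, y, lr=0.5, epochs=500):
    """Fit logistic regression with plain stochastic gradient descent."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    for _ in range(epochs):
        for x, label in zip(X, y):
            error = predict_proba(weights, bias, x) - label
            weights = [w - lr * error * v for w, v in zip(weights, x)]
            bias -= lr * error
    return weights, bias

# Invented toy records: (note text, age, female, mRS). Not real patient data.
TOY = [
    ("Patient independent, ambulates without aid.", 62, True, 1),
    ("Fully independent; ambulates in the community.", 55, False, 0),
    ("Requires wheelchair and assistance with transfers.", 78, False, 4),
    ("Needs assistance for all care, wheelchair bound.", 81, True, 5),
]
X = [featurize(note, age, female) for note, age, female, _ in TOY]
y = [dichotomize_mrs(mrs) for _, _, _, mrs in TOY]
weights, bias = train_logistic(X, y)

p_poor = predict_proba(weights, bias, featurize("Uses a wheelchair, needs assistance.", 70, True))
p_good = predict_proba(weights, bias, featurize("Independent and ambulates well.", 70, True))
```

In this sketch the same feature vectors could also feed a linear regression on the raw 0-6 score, which is the second model the abstract describes; discharge status and outpatient follow-up would be added as further columns.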

FINDINGS: We included 5307 patients, 5006 in the training and test sets and 301 in the external validation set; average age was 69 (SD 15) and 65 (SD 17) years, respectively; 47% female. The logistic regression achieved an area under the receiver operating characteristic curve (AUROC) of 0.94 [CI 0.93-0.95] (test) and 0.94 [0.91-0.96] (validation), and the linear model achieved a root mean squared error (RMSE) of 0.91 [0.87-0.94] (test) and 1.17 [1.06-1.28] (validation).
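For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how AUROC and RMSE can be computed from predictions. The function names `auroc` and `rmse` and the small input lists are illustrative assumptions, not the authors' evaluation code. The abstract does not state how its confidence intervals were derived, so interval estimation is not shown.

```python
import math

def auroc(y_true, scores):
    """AUROC as the fraction of (poor, good) pairs ranked correctly.

    y_true holds 1 for a poor outcome and 0 for a good outcome;
    ties between scores count as half a correct ranking.
    """
    pos = [s for s, label in zip(scores, y_true) if label == 1]
    neg = [s for s, label in zip(scores, y_true) if label == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted mRS scores."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

# Made-up values for illustration only.
example_auroc = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
example_rmse = rmse([0, 3, 6], [1, 3, 4])
```

The pairwise definition is quadratic in the number of patients, which is acceptable for a sketch; a rank-based formulation gives the same value more efficiently on cohorts of this size.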

DISCUSSION AND CONCLUSION: The NLP-based model is suitable for use in large-scale phenotyping of stroke functional outcomes and population health research.

PMID:39838914 | DOI:10.1177/23969873251314340