Transl Lung Cancer Res. 2024 Dec 31;13(12):3642-3656. doi: 10.21037/tlcr-24-982. Epub 2024 Dec 27.
ABSTRACT
BACKGROUND: Prognosis prediction is crucial for non-small cell lung cancer (NSCLC) treatment planning. While tumor hypoxia significantly impacts patient outcomes, identifying hypoxic genomic markers remains challenging. This study sought to identify hypoxic computed tomography (CT) radiomic features and to build an artificial intelligence (AI) model for NSCLC prognosis prediction by integrating multi-modal data.
METHODS: In total, 452 NSCLC patients were enrolled in this study, including patients from The Second Affiliated Hospital of Soochow University (SC, n=112), The Cancer Genome Atlas (TCGA)-NSCLC dataset (n=74), the radiogenomics dataset (n=130), and the Gene Expression Omnibus (GEO) datasets (GSE19188: n=82, and GSE87340: n=54). Hypoxia status was classified using optimized cut-off values of hypoxia enrichment scores, which were calculated through single-sample gene set enrichment analysis (ssGSEA) of hypoxic genes. Radiomic features were extracted using three-dimensional (3D)-Slicer software. The least absolute shrinkage and selection operator (LASSO) algorithm was used to identify hypoxic CT radiomic features. A model named ssuBERT (semantic structured unit embedded in Bidirectional Encoder Representations from Transformers) was developed to analyze electronic health records (EHRs). An AI model for overall survival prediction was constructed by integrating CT radiomic features, ssuBERT features, and clinical data, and evaluated using five-fold cross-validation.
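To illustrate how LASSO can single out a small subset of features, the following is a minimal sketch, not the authors' implementation. It assumes a linear LASSO fitted by cyclic coordinate descent on a toy, hand-made design matrix; the feature names, the toy hypoxia scores, and the penalty `alpha` are all hypothetical and chosen only so the behaviour is easy to follow.

```python
def soft_threshold(value, alpha):
    """Shrink value toward zero by alpha; return exactly 0.0 inside [-alpha, alpha]."""
    if value > alpha:
        return value - alpha
    if value < -alpha:
        return value + alpha
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=100):
    """Minimize (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1 by cyclic coordinate descent.

    Assumes every column of X has at least one non-zero entry.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the residual that ignores feature j.
            rho = 0.0
            for i in range(n):
                partial = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial)
            rho /= n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, alpha) / z
    return w


# Hypothetical toy data: three standardized "radiomic features" for four patients.
feature_names = ["feature_a", "feature_b", "feature_c"]
X = [
    [1, 1, 1],
    [1, -1, -1],
    [-1, 1, -1],
    [-1, -1, 1],
]
# Toy hypoxia scores driven mostly by feature_a, weakly by feature_c.
y = [2.1, 1.9, -2.1, -1.9]

weights = lasso_coordinate_descent(X, y, alpha=0.5)
selected = [name for name, w in zip(feature_names, weights) if w != 0.0]
print(selected, weights)
```

In this toy case the L1 penalty drives the weak and irrelevant coefficients to exactly zero, which is the property that makes LASSO usable as a feature selector. A real analysis would typically rely on an established library and choose the penalty by cross-validation.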
RESULTS: Higher hypoxia levels were correlated with worse survival outcomes. Twenty-eight radiomic features showed significant discriminatory power in detecting hypoxia status, with an area under the curve (AUC) of 0.8295. The ssuBERT model achieved a weighted accuracy of 0.945 in recognizing semantic structured units in EHRs. Among the single-modal models, the EHR model had the best predictive performance, with an AUC of 0.7662. The multi-modal AI model outperformed it, with the highest average AUC of 0.8449 and an F1 score of 0.7557.
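For readers less familiar with the reported metrics, the sketch below shows one way AUC and F1 can be computed from predicted scores. It is an illustration only: the labels, scores, and the 0.5 decision threshold are hypothetical and do not come from the study, and the AUC is computed with the pairwise (Mann-Whitney) formulation rather than any particular library routine.

```python
def auc_pairwise(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


def f1_score(labels, scores, threshold=0.5):
    """F1 = 2*TP / (2*TP + FP + FN) after thresholding the scores."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for l, p in zip(labels, preds) if l == 1 and p == 1)
    fp = sum(1 for l, p in zip(labels, preds) if l == 0 and p == 1)
    fn = sum(1 for l, p in zip(labels, preds) if l == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn)


# Hypothetical outcome labels (1 = event) and model scores for six patients.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]

auc = auc_pairwise(labels, scores)
f1 = f1_score(labels, scores)
print(round(auc, 4), round(f1, 4))
```

AUC summarizes ranking quality across all thresholds, whereas F1 depends on the chosen threshold, which is why the two values can differ for the same model. Under cross-validation, such metrics would be computed per fold and then averaged.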
CONCLUSIONS: The AI model demonstrated potential in predicting NSCLC patient prognosis through multi-modal data integration, warranting further validation.
PMID:39830777 | PMC:PMC11736583 | DOI:10.21037/tlcr-24-982