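Predicting prolonged length of intensive care unit stay via machine learning
Cite this article: WU Jing-yi, LIN Yu, LIN Ke, HU Yong-hua, KONG Gui-lan. Predicting prolonged length of intensive care unit stay via machine learning[J]. Journal of Peking University (Health Sciences), 2021, 53(6): 1163-1170.
Authors: WU Jing-yi  LIN Yu  LIN Ke  HU Yong-hua  KONG Gui-lan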
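Affiliations: 1. Department of Epidemiology and Biostatistics, Peking University School of Public Health, Beijing 100191
2. Advanced Institute of Information Technology, Peking University, Hangzhou, Zhejiang 311200
3. Peking University Medical Informatics Center, Beijing 100191
4. National Institute of Health Data Science, Peking University, Beijing 100191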
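Funding: National Natural Science Foundation of China (81771938); National Natural Science Foundation of China (91846101); Beijing Natural Science Foundation (7212201); Science and Technology Innovation 2030 Project (2018AAA0102100); Peking University Health Science Center-University of Michigan Medical School Joint Institute for Translational and Clinical Research (BMU2020JI011)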
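Abstract: Objective: To build classification models that predict length of intensive care unit (ICU) stay (LOS-ICU) for ICU patients using three machine learning algorithms, support vector machine (SVM), classification and regression tree (CART), and random forest (RF), and to compare them with the conventional customized simplified acute physiology score Ⅱ (SAPS-Ⅱ) model. Methods: Using the Medical Information Mart for Intensive Care Ⅲ (MIMIC-Ⅲ), a large US critical care database, and taking the occurrence of prolonged LOS-ICU (pLOS-ICU) as the outcome, customized SAPS-Ⅱ, SVM, CART, and RF models were built. Recursive feature elimination was used for feature selection, and the best prediction model was identified by 5-fold cross-validation. Performance measures were the Brier score, the area under the receiver operating characteristic (ROC) curve (AUROC), and the estimated calibration index (ECI); performance measures were compared between models using two-sided t tests. The five highest-ranked predictors were taken from the variable importance ranking produced by the best-performing model. Results: A total of 40 200 ICU patients were included, of whom 23.7% had pLOS-ICU; 57.6% of patients were male, and the mean age was (61.9±16.5) years. In 5-fold cross-validation, all three machine learning models clearly outperformed the customized SAPS-Ⅱ model on every measure, and all differences were statistically significant (P<0.01). The RF model performed best in overall prediction performance, discrimination, and calibration, with a Brier score, AUROC, and ECI of 0.145, 0.770, and 7.259, respectively. The calibration curve showed that the RF model tended to slightly overestimate risk in ICU patients at high risk of pLOS-ICU and to slightly underestimate risk in those at low risk. The five most important predictors of pLOS-ICU identified by the best-performing RF model were, in order, age, heart rate, systolic blood pressure, body temperature, and the ratio of arterial oxygen partial pressure to the fraction of inspired oxygen. Conclusion: Compared with the conventional customized SAPS-Ⅱ model, the pLOS-ICU prediction models built with machine learning methods all showed clearly improved prediction performance. The RF-based pLOS-ICU prediction model performed best and has considerable potential for clinical application.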
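Two of the performance measures named in the abstract, the Brier score and AUROC, have standard definitions that can be illustrated with a few lines of code. The sketch below is only an illustration under stated assumptions: the function names `brier_score` and `auroc` and the four-patient toy data are invented here and do not come from the study, and the paper's own implementation is not described on this page. ECI is omitted because its computation depends on a fitted calibration curve.

```python
from typing import Sequence


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    if len(y_true) != len(y_prob) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auroc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Share of (positive, negative) pairs in which the positive case gets
    the higher predicted probability; tied pairs count as one half."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if pp > pn else 0.5 if pp == pn else 0.0
        for pp in pos
        for pn in neg
    )
    return wins / (len(pos) * len(neg))


# Toy, made-up data: 1 = pLOS-ICU occurred, 0 = it did not.
y_toy = [0, 0, 1, 1]
p_toy = [0.1, 0.4, 0.35, 0.8]
print(brier_score(y_toy, p_toy))  # lower is better
print(auroc(y_toy, p_toy))        # 0.5 = chance, 1.0 = perfect ranking
```

A lower Brier score indicates better overall accuracy of the predicted probabilities, while a higher AUROC indicates better discrimination between patients with and without the outcome.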

Keywords: Intensive care units  Length of stay  Machine learning  Random forest  Simplified acute physiology score
Received: 2019-11-26

Predicting prolonged length of intensive care unit stay via machine learning
WU Jing-yi, LIN Yu, LIN Ke, HU Yong-hua, KONG Gui-lan. Predicting prolonged length of intensive care unit stay via machine learning[J]. Journal of Peking University: Health Sciences, 2021, 53(6): 1163-1170.
Authors: WU Jing-yi  LIN Yu  LIN Ke  HU Yong-hua  KONG Gui-lan
Affiliation: 1. Department of Epidemiology and Biostatistics, Peking University School of Public Health, Beijing 100191, China
2. Advanced Institute of Information Technology, Peking University, Hangzhou 311200, China
3. Peking University Medical Informatics Center, Beijing 100191, China
4. National Institute of Health Data Science, Peking University, Beijing 100191, China
Abstract: Objective: To construct prediction models for length of intensive care unit (ICU) stay (LOS-ICU) in ICU patients using three machine learning algorithms, support vector machine (SVM), classification and regression tree (CART), and random forest (RF), and to compare their prediction performance with that of the customized simplified acute physiology score Ⅱ (SAPS-Ⅱ) model. Methods: We used the Medical Information Mart for Intensive Care (MIMIC)-Ⅲ database for model development and validation. The primary outcome was prolonged LOS-ICU (pLOS-ICU), defined as an LOS-ICU longer than the third quartile of patients' LOS-ICU in the study dataset. Recursive feature elimination was used for feature selection in the three machine learning models. Model prediction performance was evaluated with 5-fold cross-validation, using the Brier score, the area under the receiver operating characteristic curve (AUROC), and the estimated calibration index (ECI) as performance measures. The four models were compared, and differences in performance between models were assessed with two-sided t tests. The model with the best prediction performance was used to rank variable importance, and the top five predictors are reported. Results: The final cohort consisted of 40 200 eligible ICU patients, of whom 23.7% had pLOS-ICU. Male patients made up 57.6% of the cohort, and the mean age was (61.9±16.5) years. The three machine learning models outperformed the customized SAPS-Ⅱ model on all performance measures, and the differences were statistically significant (P<0.01). Among the three machine learning models, the RF model achieved the best overall performance (Brier score, 0.145), discrimination (AUROC, 0.770), and calibration (ECI, 7.259). The calibration curve showed that the RF model slightly overestimated the risk of pLOS-ICU in high-risk ICU patients and slightly underestimated it in low-risk ICU patients. The top five predictors of pLOS-ICU identified by the RF model were age, heart rate, systolic blood pressure, body temperature, and the ratio of arterial oxygen tension to the fraction of inspired oxygen (PaO2/FiO2). Conclusion: The RF-based pLOS-ICU prediction model had the best prediction performance in this study. This lays a foundation for future application of the RF-based pLOS-ICU prediction model in ICU clinical practice.
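The outcome definition (LOS-ICU longer than the third quartile of the dataset) and the 5-fold split can be sketched as follows. This is a minimal illustration, not the authors' code: the helper names `label_prolonged` and `k_fold_indices`, the shuffling seed, the `"inclusive"` quantile method, and the ten made-up stay lengths are all assumptions introduced here, since the abstract does not state how the quartile was computed or how the folds were drawn.

```python
import random
import statistics
from typing import Iterator, List, Sequence, Tuple


def label_prolonged(los_days: Sequence[float]) -> Tuple[float, List[int]]:
    """Return (third quartile, 0/1 labels) with 1 = stay strictly longer than Q3."""
    # statistics.quantiles(n=4) returns [Q1, median, Q3]; the interpolation
    # method is an assumption, the abstract does not specify one.
    q3 = statistics.quantiles(los_days, n=4, method="inclusive")[2]
    return q3, [1 if d > q3 else 0 for d in los_days]


def k_fold_indices(n: int, k: int = 5, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for k shuffled, non-overlapping folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]


# Made-up ICU stay lengths in days, for illustration only.
los_toy = [1.0, 1.5, 2.0, 2.5, 3.0, 4.0, 6.0, 9.0, 12.0, 20.0]
q3_toy, labels_toy = label_prolonged(los_toy)
print(q3_toy, labels_toy)

for train_idx, test_idx in k_fold_indices(len(los_toy)):
    # A model would be fitted on train_idx and scored on test_idx here.
    print(len(train_idx), len(test_idx))
```

Because the threshold is the third quartile, roughly a quarter of patients are labeled as pLOS-ICU by construction, which is in line with the 23.7% reported in the results. Note that this sketch does not stratify folds by outcome; whether the study did so is not stated in the abstract.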
Keywords: Intensive care units  Length of stay  Machine learning  Random forest  Simplified acute physiology score