Optimizing Medical Treatment for Sepsis in Intensive Care: from Reinforcement Learning to Pre-Trial Evaluation

Paper Title

Optimizing Medical Treatment for Sepsis in Intensive Care: from Reinforcement Learning to Pre-Trial Evaluation

Authors

Li, Luchen; Albert-Smet, Ignacio; Faisal, Aldo A.

Abstract

Our aim is to establish a framework in which reinforcement learning (RL) for retrospectively optimizing interventions provides a regulatory-compliant pathway to prospective clinical testing of the learned policies in a clinical deployment. We focus on infections in intensive care units, which are one of the major causes of death and are difficult to treat because of complex and opaque patient dynamics and because the intervention policies required by each individual patient are clinically debated and highly divergent; yet intensive care units are naturally data rich. In our work, we build on RL approaches in healthcare ("AI Clinicians") and learn off-policy continuous dosing policies of pharmaceuticals for sepsis treatment from historical intensive care data under partially observable MDPs (POMDPs). POMDPs capture uncertainty in patient state better by taking in all historical information, yielding an efficient representation, which we investigate through ablations. We compensate for the lack of exploration in our retrospective data by evaluating each encountered state with a best-first tree search. We mitigate state distributional shift by optimizing our policy in the vicinity of the clinicians' compound policy. Crucially, we evaluate our model recommendations using not only conventional policy evaluations but also a novel framework that incorporates human experts: a model-agnostic pre-clinical evaluation method to estimate the accuracy and uncertainty of clinicians' decisions versus our system's recommendations when both are confronted with the same individual patient history ("shadow mode").
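To make the "shadow mode" comparison concrete, the following is a minimal sketch only. The abstract does not state which statistic the authors use, so the mean absolute dose gap, the bootstrap interval, and the function name `shadow_mode_gap` are all assumptions made for illustration and are not the paper's method.

```python
import random
import statistics


def shadow_mode_gap(clinician_doses, model_doses, n_boot=1000, seed=0):
    """Compare paired doses chosen for the same patient histories.

    Hypothetical illustration: returns the mean absolute gap between the
    clinician's dose and the model's recommended dose, plus a bootstrap
    95% percentile interval as a rough uncertainty estimate.
    """
    if len(clinician_doses) != len(model_doses) or not clinician_doses:
        raise ValueError("need two non-empty, equal-length dose lists")

    # One absolute gap per patient history (pairs are aligned by index).
    gaps = [abs(c - m) for c, m in zip(clinician_doses, model_doses)]

    # Resample the gaps with replacement and record each resample's mean.
    rng = random.Random(seed)
    boot_means = sorted(
        statistics.fmean(rng.choices(gaps, k=len(gaps))) for _ in range(n_boot)
    )

    # Percentile interval from the sorted bootstrap means.
    lo = boot_means[max(0, n_boot * 25 // 1000)]
    hi = boot_means[max(0, n_boot * 975 // 1000 - 1)]
    return statistics.fmean(gaps), lo, hi
```

Calling `shadow_mode_gap(clinician_doses, model_doses)` with two aligned lists of doses returns the point estimate followed by the lower and upper interval bounds. A real evaluation would also need to handle dose units, repeated decisions per patient, and clinician-to-clinician variability, none of which this sketch addresses.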
