Using Explainable Boosting Machine to Compare Idiographic and Nomothetic Approaches for Ecological Momentary Assessment Data

Paper Title

Using Explainable Boosting Machine to Compare Idiographic and Nomothetic Approaches for Ecological Momentary Assessment Data

Authors

Mandani Ntekouli, Gerasimos Spanakis, Lourens Waldorp, Anne Roefs

Abstract

Previous research on ecological momentary assessment (EMA) data of mental disorders has mainly focused on multivariate regression-based approaches that model each individual separately. This paper goes a step further by exploring the use of non-linear interpretable machine learning (ML) models in classification problems. ML models can enhance the ability to accurately predict the occurrence of different behaviors by recognizing complicated patterns between variables in the data. To evaluate this, the performance of various ensembles of trees is compared to that of linear models using imbalanced synthetic and real-world datasets. After examining the distributions of AUC scores in all cases, non-linear models appear to be superior to the baseline linear models. Moreover, apart from personalized approaches, group-level prediction models are also likely to offer enhanced performance. Accordingly, two different nomothetic approaches to integrating data from more than one individual are examined: one that uses all data directly during training and one based on knowledge distillation. Interestingly, in one of the two real-world datasets, the knowledge distillation method achieves improved AUC scores (a mean relative change of +17% compared to the personalized models), showing how it can benefit classification performance on EMA data.
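
The abstract reports results as AUC scores and as a mean relative change between approaches. The sketch below shows one way such numbers could be computed, using only the Python standard library. It is an illustration under assumptions, not the authors' evaluation code: the function names, the toy data, and the choice to average the per-individual relative change are all hypothetical, since the abstract does not define the metric in detail.

```python
from statistics import mean


def auc_score(labels, scores):
    """AUC as the probability that a randomly chosen positive example
    receives a higher score than a randomly chosen negative one
    (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_relative_change(baseline_aucs, candidate_aucs):
    """Assumed definition: average over individuals of
    (candidate - baseline) / baseline."""
    return mean((c - b) / b for b, c in zip(baseline_aucs, candidate_aucs))


# Hypothetical, imbalanced toy data for one individual (not from the paper).
toy_labels = [0, 0, 0, 0, 1, 1]
toy_scores = [0.10, 0.40, 0.35, 0.80, 0.70, 0.90]
toy_auc = auc_score(toy_labels, toy_scores)

# Hypothetical per-individual AUCs for two modeling approaches.
personalized_aucs = [0.50, 0.80]
distilled_aucs = [0.60, 0.80]
toy_change = mean_relative_change(personalized_aucs, distilled_aucs)
```

AUC is a common choice for imbalanced data such as these because it depends only on how positive and negative examples are ranked relative to each other, not on a fixed decision threshold.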
