Paper Title

Explaining Anomalies using Denoising Autoencoders for Financial Tabular Data

Authors

Timur Sattarov, Dayananda Herurkar, Jörn Hees

Abstract

Recent advances in Explainable AI (XAI) have increased the demand for deploying safe and interpretable AI models across industry sectors. Despite the recent success of deep neural networks in a variety of domains, understanding the decision-making process of such complex models remains a challenging task for domain experts. Especially in the financial domain, merely pointing to an anomaly, which is often composed of hundreds of mixed-type columns, has limited value for experts. Hence, in this paper, we propose a framework for explaining anomalies using denoising autoencoders designed for mixed-type tabular data. We specifically focus our technique on anomalies that are erroneous observations. This is achieved by localizing the individual sample columns (cells) that potentially contain errors and assigning them corresponding confidence scores. In addition, the model provides expected cell value estimates to fix the errors. We evaluate our approach on three standard public tabular datasets (Credit Default, Adult, IEEE Fraud) and one proprietary dataset (Holdings). We find that denoising autoencoders applied to this task already outperform other approaches in both cell error detection rates and expected value rates. Additionally, we analyze how a specialized loss designed for cell error detection can further improve these metrics. Our framework is designed to help domain experts understand the abnormal characteristics of an anomaly and to improve in-house data quality management processes.
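To make the idea of cell-level explanations concrete, here is a minimal sketch of the post-processing step the abstract describes: flag suspicious cells, attach a confidence score, and propose an expected value. It assumes a trained reconstruction model that outputs one probability distribution per categorical column (for example, a softmax head per column of a denoising autoencoder). The function `explain_row`, the `CellExplanation` record, the score `1 - P(observed value)`, the 0.5 threshold and the example probabilities are all hypothetical illustrations, not the authors' implementation or data. Numeric columns are left out, since scoring them would require further assumptions about the reconstruction error.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CellExplanation:
    column: str
    observed: str
    expected: str        # most probable value under the (assumed) model
    error_score: float   # 1 - P(observed value); higher means more suspicious


def explain_row(
    row: Dict[str, str],
    predicted: Dict[str, Dict[str, float]],
    threshold: float = 0.5,
) -> List[CellExplanation]:
    """Flag cells whose observed value the reconstruction model finds unlikely.

    `predicted[col]` is assumed to be a probability distribution over the
    categories of `col` (hypothetical interface, e.g. a per-column softmax).
    """
    flagged = []
    for column, observed in row.items():
        probs = predicted[column]
        # Probability the model assigns to the value actually observed in the cell
        p_observed = probs.get(observed, 0.0)
        error_score = 1.0 - p_observed
        if error_score >= threshold:
            # Suggested fix: the category the model considers most likely
            expected = max(probs, key=probs.get)
            flagged.append(CellExplanation(column, observed, expected, error_score))
    # Present the most suspicious cells first
    flagged.sort(key=lambda cell: cell.error_score, reverse=True)
    return flagged


# Illustrative input only: column names and probabilities are made up.
row = {
    "education": "Bachelors",
    "marital_status": "Never-married",
    "relationship": "Husband",
}
predicted = {
    "education": {"Bachelors": 0.8, "Masters": 0.2},
    "marital_status": {"Never-married": 0.1, "Married-civ-spouse": 0.9},
    "relationship": {"Husband": 0.7, "Own-child": 0.3},
}
result = explain_row(row, predicted)
for cell in result:
    print(cell.column, cell.observed, "->", cell.expected, round(cell.error_score, 2))
```

In this toy row only `marital_status` is flagged: the model assigns "Never-married" a probability of 0.1 while the row says "Husband", so the sketch proposes "Married-civ-spouse" with a score of 0.9. Ranking by score mirrors the abstract's goal of pointing an expert to a few cells instead of hundreds of columns.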
