Paper title
Learning to Detect Interesting Anomalies
Paper authors
Paper abstract
Anomaly detection algorithms are typically applied to static, unchanging data features hand-crafted by the user. But how does a user systematically craft good features for anomalies that have never been seen? Here we couple deep learning with active learning, in which an Oracle iteratively labels small amounts of algorithmically selected data over a series of rounds, to automatically and dynamically improve the data features for efficient outlier detection. This approach, AHUNT, shows excellent performance on MNIST, CIFAR10, and Galaxy-DESI data, significantly outperforming both standard anomaly detection and active learning algorithms with static feature spaces. Beyond improved performance, AHUNT also allows the number of anomaly classes to grow organically in response to the Oracle's evaluations. Extensive ablation studies explore the impact of the Oracle question selection strategy and the loss function on performance. We illustrate how the dynamic anomaly class taxonomy represents another step towards fully personalized rankings of different anomaly classes that reflect a user's interests, allowing the algorithm to learn to ignore statistically significant but uninteresting outliers (e.g., noise). This should prove useful in the era of massive astronomical datasets serving diverse sets of users who can only review a tiny subset of the incoming data.
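The oracle-in-the-loop rounds described in the abstract can be illustrated with a small toy sketch. This is not the AHUNT implementation: it swaps the deep network for a fixed nearest-centroid model on synthetic 2-D points, so the features are never retrained. All names here (`make_data`, `class_centroids`, `novelty`, `run_rounds`, the class labels, and every parameter value) are hypothetical and chosen only for illustration. What it does show is how a few oracle queries per round can let the set of known anomaly classes grow as new labels arrive.

```python
import math
import random


def make_data(seed=0):
    """Build toy 2-D data: many normal points plus two rare anomaly classes."""
    rng = random.Random(seed)
    # (class name, cluster center, number of points) -- all values are assumptions.
    spec = [("normal", (0.0, 0.0), 300),
            ("anomaly_a", (6.0, 6.0), 10),
            ("anomaly_b", (-6.0, 6.0), 10)]
    points, truth = [], []
    for name, (cx, cy), count in spec:
        for _ in range(count):
            points.append((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)))
            truth.append(name)
    return points, truth


def class_centroids(points, labeled):
    """Mean position of the labeled points in each class known so far."""
    groups = {}
    for idx, name in labeled.items():
        groups.setdefault(name, []).append(points[idx])
    return {name: (sum(p[0] for p in pts) / len(pts),
                   sum(p[1] for p in pts) / len(pts))
            for name, pts in groups.items()}


def novelty(point, centroids):
    """Distance to the nearest known class centroid (larger means more novel)."""
    return min(math.dist(point, c) for c in centroids.values())


def run_rounds(points, oracle, seed_size=20, rounds=5, queries_per_round=4):
    """Query the oracle on the most novel unlabeled points, round by round."""
    # Seed set: the first few points, labeled by the oracle up front.
    labeled = {i: oracle(i) for i in range(seed_size)}
    for _ in range(rounds):
        # Stand-in for retraining: recompute centroids from all labels so far.
        centroids = class_centroids(points, labeled)
        unlabeled = [i for i in range(len(points)) if i not in labeled]
        unlabeled.sort(key=lambda i: novelty(points[i], centroids), reverse=True)
        # A label the model has not seen before simply becomes a new class.
        for i in unlabeled[:queries_per_round]:
            labeled[i] = oracle(i)
    return labeled


points, truth = make_data()
# The oracle is simulated by looking up the ground-truth label.
labeled = run_rounds(points, oracle=lambda i: truth[i])
print(sorted(set(labeled.values())), len(labeled))
```

In the setting the abstract describes, the retraining step would also update the learned features each round; here the feature space stays fixed, so the sketch only illustrates the query loop and the growing class set, not the reported performance gains.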