Title

Neurochaos Feature Transformation and Classification for Imbalanced Learning

Authors

Sethi, Deeksha; Nagaraj, Nithin; B, Harikrishnan N

Abstract

Learning from limited and imbalanced data is a challenging problem in the Artificial Intelligence community. Real-time scenarios demand decision-making from rare events, where the data are typically imbalanced. Such situations commonly arise in medical applications, cybersecurity, catastrophe prediction, and similar domains. This motivates the development of learning algorithms capable of learning from imbalanced data. The human brain learns effortlessly from imbalanced data. Inspired by the chaotic firing of neurons in the human brain, a novel learning algorithm, Neurochaos Learning (NL), was recently proposed. NL consists of three blocks: Feature Transformation, Neurochaos Feature Extraction (CFX), and Classification. In this work, we study the efficacy of neurochaos feature transformation and extraction for classification in imbalanced learning. We propose a unique combination of neurochaos-based feature transformation and extraction with traditional ML algorithms. The datasets explored in this study cover medical diagnosis, banknote fraud detection, environmental applications, and spoken-digit classification. Experiments are performed in both the high and the low training sample regimes. In the high training sample regime, five out of nine datasets show a performance boost in macro F1-score after using CFX features; the highest boost, 25.97%, is obtained for the Statlog (Heart) dataset using CFX+Decision Tree. In the low training sample regime (from just one to nine training samples per class), the highest performance boost, 144.38%, is obtained for the Haberman's Survival dataset using CFX+Random Forest. NL offers enormous flexibility in combining CFX with any ML classifier to boost its performance, especially for learning tasks with limited and imbalanced data.
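The abstract does not spell out how CFX features are computed or how the performance boost is measured, so the following is only a minimal sketch under stated assumptions. It assumes the setup described in earlier Neurochaos Learning work: each input feature is first min-max scaled to [0, 1] (assumed here as the transformation step). It then drives a skew-tent-map neuron that starts from an initial activity `q` and iterates until its trajectory enters an `eps`-neighbourhood of the stimulus. The resulting trajectory is summarised as firing time, firing rate, energy, and entropy. All function names and parameter values (`q`, `b`, `eps`, `max_iter`) are hypothetical illustrations, not the authors' tuned settings. The sketch also computes macro F1 and assumes the reported boost is the relative change in macro F1 over the baseline classifier, which is why a boost above 100% is possible.

```python
import math
from typing import List, Sequence, Tuple


def min_max_scale(column: Sequence[float]) -> List[float]:
    """Assumed transformation step: rescale one feature column to [0, 1]."""
    lo, hi = min(column), max(column)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in column]


def skew_tent(x: float, b: float) -> float:
    """One iteration of the skew-tent map on [0, 1] with skew parameter b in (0, 1)."""
    return x / b if x < b else (1.0 - x) / (1.0 - b)


def chaos_neuron_features(stimulus: float, q: float = 0.34, b: float = 0.499,
                          eps: float = 0.01, max_iter: int = 10_000
                          ) -> Tuple[float, float, float, float]:
    """Hypothetical CFX sketch for one scaled feature value.

    Iterates the neuron from initial activity q until it enters the
    eps-neighbourhood of the stimulus (or max_iter iterations are used), then
    returns (firing time, firing rate, energy, entropy) of the trajectory.
    """
    x, traj = q, [q]
    while abs(x - stimulus) >= eps and len(traj) <= max_iter:
        x = skew_tent(x, b)
        traj.append(x)
    firing_time = len(traj) - 1                    # number of iterations taken
    symbols = [1 if v >= b else 0 for v in traj]    # symbolic sequence w.r.t. b
    firing_rate = sum(symbols) / len(symbols)      # fraction of time at or above b
    energy = sum(v * v for v in traj)              # sum of squared activity
    p1 = firing_rate
    entropy = -sum(p * math.log2(p) for p in (p1, 1.0 - p1) if p > 0)
    return float(firing_time), firing_rate, energy, entropy


def cfx_transform(X: Sequence[Sequence[float]], **params) -> List[List[float]]:
    """Map each sample to 4 chaos features per input feature (concatenated)."""
    columns = [min_max_scale(col) for col in zip(*X)]
    out: List[List[float]] = []
    for row in zip(*columns):
        feats: List[float] = []
        for value in row:
            feats.extend(chaos_neuron_features(value, **params))
        out.append(feats)
    return out


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of per-class F1 scores (undefined classes score 0)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def relative_boost(f1_baseline: float, f1_cfx: float) -> float:
    """Assumed definition: percentage change in macro F1 over the baseline."""
    return 100.0 * (f1_cfx - f1_baseline) / f1_baseline


if __name__ == "__main__":
    X = [[5.1, 3.5], [4.9, 3.0], [6.2, 2.9], [5.9, 3.2]]
    Z = cfx_transform(X)
    print(len(Z), len(Z[0]))                                  # 4 8
    print(round(macro_f1([0, 0, 1, 1], [0, 1, 1, 1]), 4))     # 0.7333
    print(round(relative_boost(0.40, 0.50), 2))               # 25.0
```

The transformed matrix `Z` would then be passed to any standard classifier, for example a decision tree or random forest, the two models named in the abstract. In practice the min-max bounds should be fitted on the training split only and reused on the test split; the sketch fits them on whatever it is given, for brevity.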