Paper Title
A Survey on Extreme Multi-label Learning
Paper Authors
Paper Abstract
Multi-label learning has attracted significant attention from both academia and industry in recent decades. Although existing multi-label learning algorithms have achieved good performance in various tasks, they implicitly assume that the target label space is not large, which can be restrictive in real-world scenarios. Moreover, it is infeasible to adapt them directly to extremely large label spaces because of the computational and memory overhead. Therefore, eXtreme Multi-label Learning (XML) has become an important task, and many effective approaches have been proposed. To fully understand XML, we conduct a survey study in this paper. We first give a formal definition of XML from the perspective of supervised learning. Then, based on the different model architectures and challenges of the problem, we provide a thorough discussion of the advantages and disadvantages of each category of methods. To facilitate empirical studies, we collect abundant resources regarding XML, including code implementations and useful tools. Lastly, we propose possible research directions in XML, such as new evaluation metrics, the tail label problem, and weakly supervised XML.