Paper title
Ontological Learning from Weak Labels
Paper authors
Paper abstract
An ontology is a formal representation of knowledge: it defines the concepts or properties of a domain and the relationships between those concepts. In this work, we investigate whether ontological information improves learning from weakly labeled data, which is easier to collect because only the presence or absence of an event needs to be known. We use the AudioSet ontology and dataset, which contains audio clips weakly labeled with ontology concepts, together with the "Is A" relations that the ontology provides between those concepts. We first re-implemented the model proposed by soundevent_ontology, modified to fit the multi-label scenario, and then expanded on that idea by using a Graph Convolutional Network (GCN) to model the ontology information when learning the concepts. We find that the baseline Siamese model does not perform better when it incorporates ontology information in the weak, multi-label scenario, but that the GCN captures ontology knowledge better for weak, multi-labeled data. In our experiments, we also investigate how well different modules tolerate the noise introduced by weak labels and how well they incorporate ontology information. Our best Siamese-GCN model achieves mAP=0.45 and AUC=0.87 for lower-level concepts and mAP=0.72 and AUC=0.86 for higher-level concepts, which improves on the baseline Siamese model but is about the same as our models that do not use ontology information.
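To make the idea of modeling "Is A" relations with a GCN concrete, the following is a minimal NumPy sketch, not the architecture used in this work. It assumes a hypothetical three-concept ontology, one-hot concept features, a single graph-convolution layer with the commonly used symmetric normalization, and a dot-product scorer with independent sigmoids for the multi-label setting. All function names, dimensions, and random values are illustrative assumptions.

```python
import numpy as np


def normalized_adjacency(num_concepts, is_a_edges):
    """Return D^{-1/2} (A + I) D^{-1/2} for (child, parent) "Is A" pairs.

    Assumption: edges are treated as undirected and self-loops are added.
    """
    adj = np.eye(num_concepts)
    for child, parent in is_a_edges:
        adj[child, parent] = 1.0
        adj[parent, child] = 1.0
    inv_sqrt_deg = 1.0 / np.sqrt(adj.sum(axis=1))
    return adj * inv_sqrt_deg[:, None] * inv_sqrt_deg[None, :]


def gcn_layer(a_hat, features, weight):
    """One graph-convolution layer: ReLU(A_hat @ X @ W)."""
    return np.maximum(a_hat @ features @ weight, 0.0)


def clip_scores(audio_embedding, concept_embeddings):
    """Multi-label scores: one independent sigmoid per concept."""
    logits = concept_embeddings @ audio_embedding
    return 1.0 / (1.0 + np.exp(-logits))


# Hypothetical toy ontology: 0 = "Animal", 1 = "Dog", 2 = "Cat".
is_a_edges = [(1, 0), (2, 0)]  # Dog is an Animal, Cat is an Animal
a_hat = normalized_adjacency(3, is_a_edges)

rng = np.random.default_rng(0)
features = np.eye(3)               # one-hot concept features (assumption)
weight = rng.normal(size=(3, 4))   # untrained weights, for illustration only
concept_embeddings = gcn_layer(a_hat, features, weight)

audio_embedding = rng.normal(size=4)  # stand-in for an audio encoder output
scores = clip_scores(audio_embedding, concept_embeddings)
```

In this sketch each concept embedding mixes information from its ontology neighbors, so "Dog" and "Cat" both share information with "Animal" but not directly with each other. A weakly labeled clip is then scored against every concept at once.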