Paper Title

TCGM: An Information-Theoretic Framework for Semi-Supervised Multi-Modality Learning

Paper Authors

Sun, Xinwei; Xu, Yilun; Cao, Peng; Kong, Yuqing; Hu, Lingjing; Zhang, Shanghang; Wang, Yizhou

Paper Abstract

Fusing data from multiple modalities provides more information for training machine learning systems. However, it is prohibitively expensive and time-consuming to label a large amount of data in every modality, which leads to the crucial problem of semi-supervised multi-modal learning. Existing methods either fuse information across modalities ineffectively or lack theoretical guarantees under proper assumptions. In this paper, we propose a novel information-theoretic approach, namely \textbf{T}otal \textbf{C}orrelation \textbf{G}ain \textbf{M}aximization (TCGM), for semi-supervised multi-modal learning, which has two promising properties: (i) it can effectively utilize the information across different modalities of unlabeled data points to facilitate training a classifier for each modality, and (ii) it has a theoretical guarantee of identifying the Bayesian classifiers, i.e., the ground-truth posteriors of all modalities. Specifically, by maximizing a TC-induced loss (namely the TC gain) over the classifiers of all modalities, these classifiers cooperatively discover the equivalence class of the ground-truth classifiers, and then identify the unique ones by leveraging a limited amount of labeled data. We apply our method to various tasks, including news classification, emotion recognition, and disease prediction, and achieve state-of-the-art results.
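
To give a concrete feel for the kind of objective the abstract describes, below is a minimal PyTorch sketch of a semi-supervised multi-modal training step. It combines a supervised cross-entropy term on the small labeled subset with a simplified surrogate of the TC gain on unlabeled data (rewarding the per-modality classifiers for agreeing on a label beyond what a class prior alone would explain). This is an illustrative assumption, not the paper's exact loss or implementation; all names here (tc_gain_surrogate, semi_supervised_step, models, class_prior, tc_weight) are hypothetical.

```python
import torch
import torch.nn.functional as F

def tc_gain_surrogate(logits_per_modality, class_prior):
    """Hypothetical simplified surrogate of a total-correlation-gain objective
    on unlabeled multi-modal data (not the paper's exact TC gain).

    logits_per_modality: list of (batch, num_classes) tensors, one per modality.
    class_prior: (num_classes,) tensor with an assumed label prior.
    Returns a scalar to be maximized: large when the modalities agree on a
    label more than the prior alone would explain.
    """
    log_probs = [F.log_softmax(l, dim=-1) for l in logits_per_modality]
    m = len(log_probs)
    # log [ prod_m p_m(y | x_m) / prior(y)^(m-1) ] for each candidate label y
    joint = torch.stack(log_probs, dim=0).sum(dim=0) \
            - (m - 1) * torch.log(class_prior).unsqueeze(0)
    # Aggregate over labels, then average over the unlabeled batch
    return torch.logsumexp(joint, dim=-1).mean()

def semi_supervised_step(models, optimizer, labeled_batch, unlabeled_batch,
                         class_prior, tc_weight=1.0):
    """One training step: supervised cross-entropy on labeled data pins down
    the unique classifiers, while the TC-gain surrogate on unlabeled data
    couples the per-modality classifiers."""
    optimizer.zero_grad()
    xs_l, y = labeled_batch   # xs_l: list of per-modality inputs, y: labels
    xs_u = unlabeled_batch    # list of per-modality inputs, no labels
    sup = sum(F.cross_entropy(f(x), y) for f, x in zip(models, xs_l))
    tc = tc_gain_surrogate([f(x) for f, x in zip(models, xs_u)], class_prior)
    loss = sup - tc_weight * tc  # minimize CE, maximize the TC-gain surrogate
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this sketch, models is a list of per-modality classifiers (one torch.nn.Module each), and tc_weight trades off the unsupervised coupling term against the supervised term.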
