Paper Title

Mathematical Foundations of Graph-Based Bayesian Semi-Supervised Learning

Authors

Nicolas García Trillos, Daniel Sanz-Alonso, Ruiyi Yang

Abstract

In recent decades, science and engineering have been revolutionized by a momentous growth in the amount of available data. However, despite the unprecedented ease with which data are now collected and stored, labeling data by supplementing each feature with an informative tag remains challenging. Illustrative tasks where the labeling process requires expert knowledge or is tedious and time-consuming include labeling X-rays with a diagnosis, protein sequences with a protein type, texts by their topic, tweets by their sentiment, or videos by their genre. In these and numerous other examples, only a few features may be manually labeled due to cost and time constraints. How can we best propagate label information from a small number of expensive labeled features to a vast number of unlabeled ones? This is the question addressed by semi-supervised learning (SSL). This article overviews recent foundational developments in graph-based Bayesian SSL, a probabilistic framework for label propagation using similarities between features. SSL is an active research area, and a thorough review of the extant literature is beyond the scope of this article. Our focus will be on topics drawn from our own research that illustrate the wide range of mathematical tools and ideas that underlie the rigorous study of the statistical accuracy and computational efficiency of graph-based Bayesian SSL.
