Discriminative, Generative and Self-Supervised Approaches for Target-Agnostic Learning

Paper Title

Discriminative, Generative and Self-Supervised Approaches for Target-Agnostic Learning

Authors

Yuan Jin, Wray Buntine, Francois Petitjean, Geoffrey I. Webb

Abstract

Supervised learning, characterized by both discriminative and generative learning, seeks to predict the values of single (or sometimes multiple) predefined target attributes based on a predefined set of predictor attributes. For applications where the information available and predictions to be made may vary from instance to instance, we propose the task of target-agnostic learning where arbitrary disjoint sets of attributes can be used for each of predictors and targets for each to-be-predicted instance. For this task, we survey a wide range of techniques available for handling missing values, self-supervised training and pseudo-likelihood training, and adapt them to a suite of algorithms that are suitable for the task. We conduct extensive experiments on this suite of algorithms on a large collection of categorical, continuous and discretized datasets, and report their performance in terms of both classification and regression errors. We also report the training and prediction time of these algorithms when handling large-scale datasets. Both generative and self-supervised learning models are shown to perform well at the task, although their characteristics towards the different types of data are quite different. Nevertheless, our derived theorem for the pseudo-likelihood theory also shows that they are related for inferring a joint distribution model based on the pseudo-likelihood training.
