openXDATA: A Tool for Multi-Target Data Generation and Missing Label Completion

Paper title

openXDATA: A Tool for Multi-Target Data Generation and Missing Label Completion

Authors

Felix Weninger, Yue Zhang, Rosalind W. Picard

Abstract

A common problem in machine learning is to deal with datasets with disjoint label spaces and missing labels. In this work, we introduce the openXDATA tool that completes the missing labels in partially labelled or unlabelled datasets in order to generate multi-target data with labels in the joint label space of the datasets. To this end, we designed and implemented the cross-data label completion (CDLC) algorithm that uses a multi-task shared-hidden-layer DNN to iteratively complete the sparse label matrix of the instances from the different datasets. We apply the new tool to estimate labels across four emotion datasets: one labeled with discrete emotion categories (e.g., happy, sad, angry), one labeled with continuous values along arousal and valence dimensions, one with both kinds of labels, and one unlabeled. Testing with drop-out of true labels, we show the ability to estimate both categories and continuous labels for all of the datasets, at rates that approached the ground truth values. openXDATA is available under the GNU General Public License from https://github.com/fweninger/openXDATA.
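
The iterative completion idea in the abstract can be illustrated with a minimal sketch. This is not the openXDATA implementation or the exact CDLC procedure, whose details are not given here. It assumes a label matrix with NaN for missing entries, treats every target as a regression target, and uses a small NumPy network with one shared hidden layer. The function name complete_labels and all of its parameters are hypothetical.

```python
import numpy as np

def complete_labels(X, Y, n_hidden=16, n_rounds=3, n_epochs=300, lr=0.05, seed=0):
    """Fill the NaN entries of Y (instances x targets) with network predictions.

    Hypothetical helper for illustration only; not part of openXDATA.
    """
    rng = np.random.default_rng(seed)
    n_features, n_targets = X.shape[1], Y.shape[1]
    observed = ~np.isnan(Y)                    # True where a ground-truth label exists
    Y_filled = np.where(observed, Y, 0.0)      # placeholders are masked out in round 1
    mask = observed.astype(float)

    # One hidden layer shared by all targets, one linear output unit per target.
    W1 = rng.normal(0.0, 0.1, (n_features, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, n_targets))
    b2 = np.zeros(n_targets)

    for _ in range(n_rounds):
        for _ in range(n_epochs):
            H = np.tanh(X @ W1 + b1)
            P = H @ W2 + b2
            # Gradient of half the masked mean squared error w.r.t. the predictions.
            G = mask * (P - Y_filled) / max(mask.sum(), 1.0)
            GH = (G @ W2.T) * (1.0 - H ** 2)   # backpropagate through tanh
            W2 -= lr * (H.T @ G)
            b2 -= lr * G.sum(axis=0)
            W1 -= lr * (X.T @ GH)
            b1 -= lr * GH.sum(axis=0)
        P = np.tanh(X @ W1 + b1) @ W2 + b2
        Y_filled = np.where(observed, Y, P)    # keep true labels, fill the rest
        mask = np.ones_like(mask)              # later rounds also train on filled labels
    return Y_filled
```

Stacking instances from several datasets into X, with each dataset's unlabeled targets set to NaN in Y, gives the sparse label matrix that the sketch fills in. Observed labels are never overwritten, so the output differs from Y only in the originally missing entries.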
