Learn to Explore: on Bootstrapping Interactive Data Exploration with Meta-learning

Paper Title

Learn to Explore: on Bootstrapping Interactive Data Exploration with Meta-learning

Authors

Cao, Yukun; Xie, Xike; Huang, Kexin

Abstract

Interactive data exploration (IDE) is an effective way of comprehending big data, whose volume and complexity are beyond human abilities. The main goal of IDE is to discover user interest regions in a database through multiple rounds of user labelling. Existing IDE systems adopt an active-learning framework, in which users iteratively label the interestingness of selected tuples. The process of data exploration can be viewed as the process of training a classifier that determines whether a database tuple is interesting to a user. An efficient exploration thus takes very few iterations of user labelling to reach the data region of interest. In this work, we treat data exploration as a few-shot learning process, in which the classifier is learned from only a few training examples, or exploration iterations. To this end, we propose a learning-to-explore framework, based on meta-learning, which learns how to learn a classifier from automatically generated meta-tasks, so that the exploration process can be substantially shortened. Extensive experiments on real datasets show that our proposal outperforms existing explore-by-example solutions in terms of accuracy and efficiency.
