Towards NLP-supported Semantic Data Management

Paper Title

Towards NLP-supported Semantic Data Management

Authors

Andreas Burgdorf, André Pomp, Tobias Meisen

Abstract

The heterogeneity of data poses a great challenge when data from different sources must be merged for a single application. Ontology-based data management (OBDM) is one approach that offers solutions to this problem. A challenge of OBDM is the automatic creation of semantic models from datasets. This process is typically either data-driven or label-driven and always involves manual human intervention. We identify textual descriptions of data, a form of metadata that humans can produce and consume quickly, as a third possible basis for automatic semantic modelling. In this paper, we present how we plan to use textual descriptions to enhance semantic data management. We will use state-of-the-art NLP technologies to identify concepts within textual descriptions and build semantic models from them in combination with an evolving ontology. We will combine the automatically identified models with input from the human data provider to extend the ontology automatically, so that it learns new verified concepts over time. Finally, we will use the created ontology and the automatically identified semantic models either to rate descriptions of new data sources or to automatically generate descriptive texts that are easier for human users to understand than formal models. We present the procedure we plan for the ongoing research, as well as the expected outcomes.
