Paper title
Solving Price Per Unit Problem Around the World: Formulating Fact Extraction as Question Answering
Paper authors
Paper abstract
Price Per Unit (PPU) is essential information for consumers comparing products on e-commerce websites. Computing PPU requires the total quantity in a product, which sellers do not always provide. To predict total quantity, all relevant quantities given in a product's attributes, such as title, description and image, need to be inferred correctly. We formulate this fact-extraction problem as a question-answering (QA) task rather than a named entity recognition (NER) task. In our QA approach, we first predict the unit of measure (UoM) type (e.g., volume, weight or count), which determines the question to ask (e.g., "What is the total volume?"), and then use this question to find all the relevant answers. Our model architecture consists of two subnetworks for the two subtasks: a classifier that predicts the UoM type (or the question) and an extractor that extracts the relevant quantities. We use a deep character-level CNN architecture for both subtasks, which enables (1) easy expansion to new stores with similar alphabets, (2) multi-span answering due to its span-image architecture and (3) easy deployment by keeping model-inference latency low. Our QA approach outperforms rule-based methods by 34.4% in precision and also outperforms a BERT-based fact extraction approach in all stores globally, with the largest precision lift of 10.6% in the US store.
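To make the downstream arithmetic concrete, the following is a minimal sketch of how a predicted UoM type and a set of extracted quantity spans might be combined into a total quantity and a PPU. It does not reproduce the paper's classifier or extractor, which are character-level CNNs. The names `Span`, `TO_BASE`, `total_quantity` and `price_per_unit`, the conversion table, and the rule that relevant spans combine multiplicatively (pack count times per-item quantity) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical conversion factors to a base unit for each UoM type.
# These tables are illustrative assumptions, not taken from the paper.
TO_BASE = {
    "volume": {"ml": 1.0, "l": 1000.0},
    "weight": {"g": 1.0, "kg": 1000.0},
    "count": {"ct": 1.0},
}


@dataclass
class Span:
    """One extracted quantity, e.g. Span(500, "ml") or Span(6, "ct")."""
    value: float
    unit: str


def total_quantity(uom_type: str, spans: List[Span]) -> float:
    """Combine extracted spans into a total in the base unit of uom_type.

    Assumption: spans in the target UoM type are converted to the base unit,
    and count spans act as pack multipliers, so everything is multiplied.
    """
    if not spans:
        raise ValueError("no quantity spans were extracted")
    # Units accepted for this question: the target type plus count multipliers.
    factors = {**TO_BASE["count"], **TO_BASE[uom_type]}
    total = 1.0
    for span in spans:
        if span.unit not in factors:
            raise ValueError(f"unit {span.unit!r} is not relevant to {uom_type}")
        total *= span.value * factors[span.unit]
    return total


def price_per_unit(price: float, uom_type: str, spans: List[Span]) -> float:
    """Price divided by the total quantity, expressed per base unit."""
    return price / total_quantity(uom_type, spans)


# Example: "Pack of 6, 500 ml each" priced at 6.00, with the UoM type
# predicted as "volume" (the question "What is the total volume?").
spans = [Span(6, "ct"), Span(500, "ml")]
total_ml = total_quantity("volume", spans)      # total volume in ml
ppu = price_per_unit(6.00, "volume", spans)     # price per ml
```

In this sketch the UoM type plays the role of the question: it decides which units count as relevant answers, which is why a weight span passed to a volume question is rejected rather than silently multiplied in.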