Paper Title

Estimating the Performance of Entity Resolution Algorithms: Lessons Learned Through PatentsView.org

Authors

Olivier Binette, Sokhna A. York, Emma Hickerson, Youngsoo Baek, Sarvo Madhavan, Christina Jones

Abstract

This paper introduces a novel evaluation methodology for entity resolution algorithms. It is motivated by PatentsView.org, a U.S. Patent and Trademark Office patent data exploration tool that disambiguates patent inventors using an entity resolution algorithm. We provide a data collection methodology and tailored performance estimators that account for sampling biases. Our approach is simple, practical, and principled. These key characteristics allow us to paint the first representative picture of PatentsView's disambiguation performance. The approach is used to inform PatentsView users about the reliability of the data and to enable comparison of competing disambiguation algorithms.
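
The abstract does not give the formulas for the tailored estimators. For illustration only, the sketch below shows one standard way to correct for unequal sampling when estimating pairwise recall from a sample of ground-truth clusters. Each sampled cluster is weighted by the inverse of its inclusion probability, which gives a Horvitz-Thompson-style ratio estimator. It is not claimed to be the paper's estimator. The function names `cluster_pair_counts` and `weighted_pairwise_recall`, the data layout (lists of record ids plus a dict of predicted labels), and the inclusion probabilities are all hypothetical.

```python
from collections import Counter
from math import comb


# Hypothetical helper for illustration; not taken from the paper.
def cluster_pair_counts(true_cluster, predicted_labels):
    """Return (linked_pairs, total_pairs) for one ground-truth cluster.

    true_cluster: record ids that truly refer to the same entity.
    predicted_labels: dict mapping record id -> predicted cluster label.
    A pair inside the true cluster is "linked" when both records
    received the same predicted label.
    """
    label_sizes = Counter(predicted_labels[r] for r in true_cluster)
    linked = sum(comb(n, 2) for n in label_sizes.values())
    return linked, comb(len(true_cluster), 2)


# Hypothetical estimator for illustration; not taken from the paper.
def weighted_pairwise_recall(sampled_clusters, inclusion_probs, predicted_labels):
    """Inverse-probability-weighted ratio estimate of pairwise recall.

    Weighting each sampled cluster by 1 / inclusion probability keeps
    clusters that were more likely to be sampled from dominating the estimate.
    """
    num = 0.0
    den = 0.0
    for cluster, pi in zip(sampled_clusters, inclusion_probs):
        linked, total = cluster_pair_counts(cluster, predicted_labels)
        num += linked / pi
        den += total / pi
    if den == 0:
        raise ValueError("sampled clusters contain no record pairs")
    return num / den


predicted = {"a1": 0, "a2": 0, "a3": 1, "b1": 2, "b2": 2}
clusters = [["a1", "a2", "a3"], ["b1", "b2"]]
print(weighted_pairwise_recall(clusters, [0.6, 0.4], predicted))
```

With equal inclusion probabilities, the estimate reduces to the plain pooled pairwise recall of the sample (0.5 in this toy example). Unequal probabilities shift weight toward clusters that were less likely to be drawn.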
