Title
XSTEM: An exemplar-based stemming algorithm
Authors
Abstract
Stemming is the process of reducing related words to a standard form by removing affixes from them. Existing algorithms vary with respect to their complexity, configurability, handling of unknown words, and ability to avoid under- and over-stemming. This paper presents a fast, simple, configurable, high-precision, high-recall stemming algorithm that combines the simplicity and performance of word-based lookup tables with the strong generalizability of rule-based methods to avert problems with out-of-vocabulary words.
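The abstract describes a hybrid design: a word-based lookup table for known words, backed by rules that generalize to out-of-vocabulary words. The following is a minimal sketch of that general idea only. It is not the XSTEM algorithm. The table entries, the suffix rules, the `MIN_STEM_LEN` guard, and the function name `stem` are all hypothetical illustrations.

```python
# Hypothetical sketch of a lookup-table stemmer with rule-based fallback.
# NOT the XSTEM algorithm; all entries and rules below are illustrative.

# Hypothetical exemplar table: known word -> stem. It handles irregular
# forms (e.g. "ran") that suffix rules cannot produce.
LOOKUP = {
    "running": "run",
    "ran": "run",
    "better": "good",
    "studies": "study",
}

# Hypothetical fallback suffix rules, tried in order as (suffix, replacement).
# "ies" is listed before "s" so the longer suffix is tried first.
SUFFIX_RULES = [
    ("ational", "ate"),
    ("ing", ""),
    ("ies", "y"),
    ("ed", ""),
    ("s", ""),
]

MIN_STEM_LEN = 3  # assumed guard: reject rule outputs shorter than this


def stem(word: str) -> str:
    """Return a stem: exact table lookup first, suffix rules for unknown words."""
    w = word.lower()
    if w in LOOKUP:  # known word: use the table entry directly
        return LOOKUP[w]
    for suffix, replacement in SUFFIX_RULES:
        if w.endswith(suffix):
            candidate = w[: len(w) - len(suffix)] + replacement
            # Accept the rule only if the result is long enough; otherwise
            # keep trying later rules (a crude defense against over-stemming).
            if len(candidate) >= MIN_STEM_LEN:
                return candidate
    return w  # no rule applied: leave the word unchanged
```

For example, `stem("ran")` returns `"run"` from the table, `stem("jumping")` falls back to the rules and returns `"jump"`, and `stem("sing")` is left unchanged because stripping `"ing"` would leave a one-letter stem. Consulting the table first gives cheap, exact answers for known words, while the rules still produce a stem for words the table has never seen.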