Paper title
Multi-Mosaics: Corpus Summarizing and Exploration using multiple Concordance Mosaic Visualisations
Paper authors
Paper abstract
Researchers working in areas such as lexicography, translation studies, and computational linguistics use a combination of automated and semi-automated tools to analyze the content of text corpora. Keywords, named entities, and events are often extracted automatically as the first step in the analysis. Concordancing, the arrangement of passages of a textual corpus in alphabetical order according to user-defined keywords, is one of the oldest and still most widely used forms of text analysis. This paper describes Multi-Mosaics, a tool for corpus analysis using multiple implicitly linked Concordance Mosaic visualisations. Multi-Mosaics supports examining linguistic relationships within the context windows surrounding extracted keywords.
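
To make the notion of concordancing concrete, the following is a minimal keyword-in-context sketch in Python. It is not the Multi-Mosaics implementation: the function names `concordance` and `position_counts`, the regex tokenizer, the default window size, and the choice to sort by right-hand context are all assumptions made for illustration. The per-position word counts are one plausible summary of a context window, not a description of how the tool computes its visualisations.

```python
import re
from collections import Counter
from typing import List, Tuple

Hit = Tuple[List[str], str, List[str]]  # (left context, keyword, right context)


def concordance(text: str, keyword: str, window: int = 3) -> List[Hit]:
    """Collect every occurrence of `keyword` with up to `window` tokens on
    each side, sorted alphabetically by right-hand context (an assumption;
    concordancers typically let the user choose the sort key)."""
    tokens = re.findall(r"\w+", text.lower())  # naive tokenizer, illustration only
    target = keyword.lower()
    hits: List[Hit] = []
    for i, tok in enumerate(tokens):
        if tok == target:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            hits.append((left, tok, right))
    hits.sort(key=lambda h: h[2])
    return hits


def position_counts(hits: List[Hit], offset: int) -> Counter:
    """Count the words found at `offset` relative to the keyword
    (-1 is the word immediately to the left, +1 immediately to the right)."""
    counts: Counter = Counter()
    for left, _, right in hits:
        if offset < 0 and len(left) >= -offset:
            counts[left[offset]] += 1
        elif offset > 0 and len(right) >= offset:
            counts[right[offset - 1]] += 1
    return counts


if __name__ == "__main__":
    sample = "The cat sat on the mat. The dog sat on the rug. A cat ran."
    for left, kw, right in concordance(sample, "sat", window=2):
        print(" ".join(left).rjust(12), f"[{kw}]", " ".join(right))
    print(position_counts(concordance(sample, "sat", window=2), -1))
```

Each hit keeps the left and right context separately, so the lines can be re-sorted on either side or tallied position by position without re-scanning the corpus.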