Paper title
Universal versus system-specific features of punctuation usage patterns in major Western languages
Paper authors
Paper abstract
The celebrated proverb that "speech is silver, silence is golden" has a long multinational history and multiple specific meanings. In written texts, punctuation can in fact be considered one of its manifestations. Indeed, the virtue of speaking and writing effectively involves, often decisively, the capacity to place breaks properly. In the present study, based on a large corpus of world-famous and representative literary texts in seven major Western languages, it is shown that the distribution of intervals between consecutive punctuation marks in almost all texts can universally be characterised by only two parameters of the discrete Weibull distribution, which can be given an intuitive interpretation in terms of the so-called hazard function. The values of these two parameters tend to be language-specific, however, and even appear to navigate translations. The properties of the computed hazard functions indicate that, among the studied languages, English turns out to be the least constrained by the necessity to place a consecutive punctuation mark to partition a sequence of words. This may suggest that, compared to the other studied languages, English is more flexible in the sense of allowing longer uninterrupted sequences of words. Spanish reveals a similar tendency, to only a slightly lesser extent.
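To make the hazard-function interpretation concrete, the following is a minimal sketch. It assumes the common type I parametrisation of the discrete Weibull distribution, P(X = k) = (1 - p)^((k-1)^beta) - (1 - p)^(k^beta) for k = 1, 2, ..., which the abstract itself does not spell out. Here k would be the number of words between consecutive punctuation marks, and the hazard h(k) the probability that a mark follows the k-th word given that none has occurred earlier. The function names and the parameter values are hypothetical and are not taken from the study.

```python
def discrete_weibull_pmf(k: int, p: float, beta: float) -> float:
    """P(X = k) for k = 1, 2, ... under the assumed type I parametrisation.

    The survival function is S(k) = P(X > k) = (1 - p) ** (k ** beta).
    """
    q = 1.0 - p
    return q ** ((k - 1) ** beta) - q ** (k ** beta)


def discrete_weibull_hazard(k: int, p: float, beta: float) -> float:
    """h(k) = P(X = k | X >= k) = 1 - (1 - p) ** (k**beta - (k-1)**beta)."""
    q = 1.0 - p
    return 1.0 - q ** (k ** beta - (k - 1) ** beta)


if __name__ == "__main__":
    # Illustrative parameter values only (not fitted to any text).
    p, beta = 0.2, 1.3
    for k in range(1, 11):
        pmf = discrete_weibull_pmf(k, p, beta)
        haz = discrete_weibull_hazard(k, p, beta)
        print(f"k={k:2d}  pmf={pmf:.4f}  hazard={haz:.4f}")
```

Under this parametrisation, h(1) = p, beta = 1 reduces to the geometric distribution with a constant hazard, and beta > 1 gives a hazard that grows with k, so that a punctuation mark becomes increasingly likely the longer a word sequence runs without one.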