字母在單字裡的位置分佈

是一篇老文章了... (2014 年的文章,最近從其他地方提起)

這邊講的是英文,不過同樣方式也可以拿來分析其他語言:「The distribution of letters in English words」,原始文章在「Graphing the distribution of English letters towards the beginning, middle or end of words」。

原文有描述他的資料分析來源:

The data is from the entire Brown corpus in the Natural Language Toolkit. It's a smaller and out-of-date corpus, but it's open source and easy to obtain. I repeated the analysis with COHA, the Corpus of Historical American English, a well-curated, proprietary data set from Brigham Young University for which I have a license, and the only differences were in rare letters like "z" or "x".

This entry was posted in Murmuring, Science and tagged , , . Bookmark the permalink.

Leave a Reply

Your email address will not be published. Required fields are marked *