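This is an old article... (it dates from 2014 and recently came up again elsewhere)
It deals with English, but the same approach could be used to analyze other languages: "The distribution of letters in English words"; the original article is "Graphing the distribution of English letters towards the beginning, middle or end of words".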
On where the data comes from, the original article says:
The data is from the entire Brown corpus in the Natural Language Toolkit. It's a smaller and out-of-date corpus, but it's open source and easy to obtain. I repeated the analysis with COHA, the Corpus of Historical American English, a well-curated, proprietary data set from Brigham Young University for which I have a license, and the only differences were in rare letters like "z" or "x".
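To make the idea concrete, here is a minimal sketch of how such a distribution could be counted. It is not the original author's code: the function name `letter_position_histogram`, the three-way split into beginning/middle/end, and the use of each letter's slot midpoint as its relative position are all assumptions made for illustration.

```python
from collections import defaultdict


def letter_position_histogram(words, bins=3):
    """Count, per letter, how often it lands in each relative-position bin.

    Hypothetical helper: with bins=3 the bins read as beginning / middle / end.
    """
    hist = defaultdict(lambda: [0] * bins)
    for word in words:
        # Keep only a-z so punctuation and digits do not shift positions.
        letters = [c for c in word.lower() if "a" <= c <= "z"]
        n = len(letters)
        for i, c in enumerate(letters):
            # Midpoint of the letter's slot, scaled into (0, 1).
            rel = (i + 0.5) / n
            hist[c][min(int(rel * bins), bins - 1)] += 1
    return dict(hist)


if __name__ == "__main__":
    sample = ["the", "quick", "brown", "fox"]
    for letter, counts in sorted(letter_position_histogram(sample).items()):
        print(letter, counts)
```

To run it on the Brown corpus, one could pass `nltk.corpus.brown.words()` as `words` after `nltk.download("brown")`. For another language, the `a`-`z` filter would need to be swapped for that language's alphabet.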