在 Hacker News Daily 上看到「Performance comparison: counting words in Python, Go, C++, C, AWK, Forth, and Rust」這個有趣的面試問題,在 Hacker News 上的討論也可以看看:「Performance comparison: counting words in Python, Go, C++, C, Awk, Forth, Rust (benhoyt.com)」。
問題是這樣:
Write a program to count the frequencies of unique words from standard input, then print them out with their frequencies, ordered most frequent first. For example, given this input:
The foo the foo the
defenestration the
The program should print the following:
the 4
foo 2
defenestration 1
在面試時,重點不在於用哪個程式語言,而是在面試時一路往下問,像是 profiling 的部份,內部資料結構的部份,可以問得很深。
撇開面試,這個問題其實是個經典問題,當年 Donald Knuth 與 Doug McIlroy 兩位大師都有玩過:
Incidentally, this problem set the scene for a wizard duel between two computer scientists several decades ago. In 1986, Jon Bentley asked Donald Knuth to show off “literate programming” with a solution to this problem, and he came up with an exquisite, ten-page Knuthian masterpiece. Then Doug McIlroy (the inventor of Unix pipelines) replied with a one-liner Unix shell version using tr
, sort
, and uniq
.
不過當年玩的問題有點變形:
Given a text file and an integer k, print the k most common words in the file (and the number of their occurrences) in decreasing frequency.
他們當時其實一個是在示範 Literate programming,而另外一個在展現 pipe 的威力,都是借題發揮而已,跟上面的那些東西倒是沒什麼太大關係。