丟進 array 是 OK 的,但問題在於他需要判斷 entry 是否重複,卻沒有用 hash 或是 tree 的結構,而這邊大約有 63k 筆資料,用 array 實做就產生了 O(n^2) 的演算法:
But before it’s stored? It checks the entire array, one by one, comparing the hash of the item to see if it’s in the list or not. With ~63k entries that’s (n^2+n)/2 = (63000^2+63000)/2 = 1984531500 checks if my math is right. Most of them useless. You have unique hashes why not use a hash map.
if it’s called again within the string’s range, return cached value
而第二個問題他直接把檢查是否有重複的跳過,因為資料本身不重複:
And as for the hash-array problem, it’s more straightforward - just skip the duplicate checks entirely and insert the items directly since we know the values are unique.
I found this while making a collection of what C implementation does what at https://news.ycombinator.com/item?id=26298300.
There are two basic implementation strategies. The BSD (FreeBSD and OpenBSD and more than likely NetBSD too), Microsoft, GNU, and MUSL C libraries use one, and suffer from this; whereas the OpenWatcom, P.J. Plauger, Tru64 Unix, and my standard C libraries use another, and do not.
The 2002 report in the comp.lang.c Usenet newsgroup (listed in that discussion) is the earliest that I've found so far.