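Putting the entries into an array is fine in itself. The problem is that the code has to check whether each entry is a duplicate, yet it uses neither a hash nor a tree structure. With roughly 63k entries, the array-based implementation ends up as an O(n^2) algorithm: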
But before it’s stored? It checks the entire array, one by one, comparing the hash of the item to see if it’s in the list or not. With ~63k entries that’s (n^2+n)/2 = (63000^2+63000)/2 = 1984531500 checks if my math is right. Most of them useless. You have unique hashes why not use a hash map.
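To make the cost difference concrete, here is a minimal sketch that counts comparisons for a scan-the-whole-array duplicate check and contrasts it with a hash-based one. The function names `dedup_linear` and `dedup_hashed` are hypothetical and are not taken from the code being discussed. Note that this particular loop compares each new item only against the items already stored, so for n unique items it performs n(n-1)/2 comparisons; the exact figure depends on how the loop is written, but it is the same order of magnitude as the number quoted above.

```python
def dedup_linear(items):
    """Hypothetical sketch: scan the list for a duplicate before every append."""
    stored = []
    comparisons = 0
    for item in items:
        found = False
        for existing in stored:
            comparisons += 1          # one equality check per stored entry
            if existing == item:
                found = True
                break
        if not found:
            stored.append(item)
    return stored, comparisons


def dedup_hashed(items):
    """Hypothetical sketch: same result, but membership is one set lookup."""
    seen = set()
    stored = []
    for item in items:
        if item not in seen:          # average O(1) instead of a full scan
            seen.add(item)
            stored.append(item)
    return stored


if __name__ == "__main__":
    n = 2000                          # kept small so the quadratic version finishes quickly
    unique_items = list(range(n))
    _, comparisons = dedup_linear(unique_items)
    print(comparisons, n * (n - 1) // 2)
```

With unique input the linear version never exits its inner loop early, which is why almost all of its comparisons are wasted work.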
if it’s called again within the string’s range, return cached value
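The caching idea in that step can be simulated in a few lines. This is only an illustration in Python over a `bytes` buffer, not the actual fix, which works at the level of the C `strlen` function; the class name `CachedStrlen` and its fields are hypothetical. It remembers the start and end of the last string it scanned and, when asked again for an offset inside that range, answers from the cached end instead of scanning again.

```python
class CachedStrlen:
    """Hypothetical simulation of a strlen that caches the last scanned string."""

    def __init__(self, buf):
        self.buf = buf        # bytes buffer containing NUL-terminated strings
        self.start = None     # offset where the cached string begins
        self.end = None       # offset of the cached string's terminating NUL
        self.scans = 0        # how many full scans were actually performed

    def strlen(self, offset):
        # Cache hit: the offset lies inside the string we already measured.
        if self.start is not None and self.start <= offset <= self.end:
            return self.end - offset
        # Cache miss: scan forward for the NUL byte and remember the range.
        self.scans += 1
        end = self.buf.index(b"\0", offset)
        self.start, self.end = offset, end
        return end - offset


if __name__ == "__main__":
    s = CachedStrlen(b"hello world\0")
    print(s.strlen(0), s.strlen(6), s.scans)
```

A parser that advances through one long string and asks for the remaining length at every step would trigger only a single scan in this sketch.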
For the second problem, he simply skips the duplicate check altogether, because the data itself contains no duplicates:
And as for the hash-array problem, it’s more straightforward - just skip the duplicate checks entirely and insert the items directly since we know the values are unique.
I found this while making a collection of what C implementation does what at https://news.ycombinator.com/item?id=26298300.
There are two basic implementation strategies. The BSD (FreeBSD and OpenBSD and more than likely NetBSD too), Microsoft, GNU, and MUSL C libraries use one, and suffer from this; whereas the OpenWatcom, P.J. Plauger, Tru64 Unix, and my standard C libraries use another, and do not.
The 2002 report in the comp.lang.c Usenet newsgroup (listed in that discussion) is the earliest that I've found so far.
Nearly 70% of smart TVs and 46% of game consoles were found to contain hardcoded DNS settings - allowing them to simply ignore your local network’s DNS server entirely. On average, Smart TVs generate an average of 60 megabytes of outgoing Internet traffic per day, all the while bypassing tools like PiHole.
The Internet Archive has hooked up an emulator, so you can run these Flash programs directly on the site:
Great news for everyone concerned about the Flash end of life planned for end of 2020: The Internet Archive is now emulating Flash animations, games and toys in our software collection.
Utilizing an in-development Flash emulator called Ruffle, we have added Flash support to the Internet Archive’s Emularity system, letting a subset of Flash items play in the browser as if you had a Flash plugin installed. While Ruffle’s compatibility with Flash is less than 100%, it will play a very large portion of historical Flash animation in the browser, at both a smooth and accurate rate.
You will not need to have a flash plugin installed, and the system works in all browsers that support Webassembly.
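KataGo is the strongest open source engine right now, but its drawback so far has been that it only runs through OpenCL or CUDA, so you basically need a reasonably powerful graphics card.
If you want to run it on the CPU (without a hardware graphics card), one option is to emulate OpenCL. On Linux this can be done with pocl; the performance is mediocre, but at least it works. On Windows it seems harder to set up... which is also why quite a few people kept using Leela Zero.
Recently KataGo 1.5 added a pure-CPU implementation built on the Eigen library, but everyone who tested it found it painfully slow XDDD
Because the author does not provide a binary for the CPU version, I fetched the source and compiled it myself on Linux. In my testing it only uses a single CPU (no multithreading). For comparison, the OpenCL version on a 1080Ti gets about 150 visits/sec (40b), while the CPU version gets 0.0x visits/sec XDDD
The author also mentioned in the GitHub discussion that this version was only checked for correctness, with no consideration for performance at all...
But other people jumped in to improve it. In "Optimization of Eigen backend #288" you can see that kaorahi posted quite a few changes, going from the initial eigen_naive_loop (13x faster than 1.5) all the way to borrow_tensorflow (1400x), which brings 15b on the CPU up to 10 visits/sec: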
"borrow_tensorflow" version: x1400 speed up from 1.5.0 (70% of libtensorflow backend). Now 15b net is usable for me. I get 19 visits/s in benchmark and 10 visits/s in GUI with 15b net.
That is already a lot faster, so Leela Zero will probably fade out gradually; CPU-only was more or less the last territory where Leela Zero could still compete...
I figured that since we were to replace that matrix function anyway, I could try replacing it with XMMatrixInverse being a “modern” replacement for D3DXMatrixInverse. XMMatrixInverse also uses SSE2 instructions so it should be equally optimal to the D3DX function, but I was nearly sure it would break the same way.
Here’s Intel versus AMD relative error of RCPPS instruction: http://const.me/tmp/vrcpps-errors-chart.png AMD is Ryzen 5 3600, Intel is Core i3 6157U.
Over the complete range of floats, AMD is more precise on average, 0.000078 versus 0.000095 relative error. However, Intel has 0.000300 maximum relative error, AMD 0.000315.
Both are well within the spec. The documentation says “maximum relative error for this approximation is less than 1.5*2^-12”, in human language that would be 3.6621E-4.
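The arithmetic behind that bound is easy to check. The sketch below evaluates 1.5*2^-12 and compares it with the two maximum relative errors quoted above; the names `SPEC_BOUND`, `MEASURED_MAX` and `within_spec` are made up for this illustration, and the measured values are copied from the quote rather than measured again.

```python
# Bound quoted from the documentation: 1.5 * 2^-12.
SPEC_BOUND = 1.5 * 2 ** -12

# Maximum relative errors as reported in the quoted measurements (not re-measured).
MEASURED_MAX = {"Intel": 0.000300, "AMD": 0.000315}


def within_spec(max_rel_error, bound=SPEC_BOUND):
    """Return True if a maximum relative error is strictly below the bound."""
    return max_rel_error < bound


if __name__ == "__main__":
    print(f"bound = {SPEC_BOUND:.4E}")
    for vendor, err in MEASURED_MAX.items():
        print(vendor, err, within_spec(err))
```

Both reported maxima sit below the bound, AMD's with the smaller margin.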
Source code that compares them by creating 16GB binary files with the complete range of floats: https://gist.github.com/Const-me/a6d36f70a3a77de00c61cf4f6c17c7ac
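It is a bit of a pity that the reason the NaN gets generated was never tracked down, but the fix is acceptable. The idea is basically "since the new library works fine, let's not fuss too much over the problem in the old one"...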