Linus 狂幹 Intel 的 AVX-512

這幾天蠻熱鬧的消息,Linus 幹翻 Intel 丟出來的 AVX-512:「Alder Lake and AVX-512」。

在維基百科的「Advanced Vector Extensions」這邊有提到,因為 AVX-512 執行時會消耗產生更多的熱量,所以得壓低 Turbo Boost 執行:

Since AVX instructions are wider and generate more heat, Intel processors have provisions to reduce the Turbo Boost frequency limit when such instructions are being executed. The throttling is divided into three levels:

  • L0 (100%): The normal turbo boost limit.
  • L1 (~85%): The "AVX boost" limit. Soft-triggered by 256-bit "heavy" (floating-point unit: FP math and integer multiplication) instructions. Hard-triggered by "light" (all other) 512-bit instructions.
  • L2 (~60%): The "AVX-512 boost" limit. Soft-triggered by 512-bit heavy instructions.

本來 AVX 與 AVX-2 只會在某些重量級的指令時會壓 15%,現在在 AVX-512 則是變成常態,而且有些會降到 40%,對於同時在跑的應用會受到很大的影響,所以 Linus 也直接放話要用他的權限擋這件事情 (我把動詞讀錯了):

I want my power limits to be reached with regular integer code, not with some AVX512 power virus that takes away top frequency (because people ended up using it for memcpy!) and takes away cores (because those useless garbage units take up space).

在後面的討論串「Alder Lake and AVX-512」這邊 Linus 有提到更細,像是他對於 MMX/SSE/AVX/AVX2 的想法,以及為什麼他這麼厭惡 AVX-512。

AMD 的繼續看戲 XDDD

超快速的 Base64 encoding/decoding 實做

看到「Base64 encoding and decoding at almost the speed of a memory copy」這個,可以超級快速編解碼 Base64 的資料。

實做上是透過 IntelAVX-512 加速,在資料夠大的情況下 (超過 L1 cache 的大小),可以達到接近字串複製的速度 (這邊提到的 memcpy()):

We show how we can encode and decode base64 data at nearly the speed of a memory copy (memcpy) on recent Intel processors, as long as the data does not fit in the first-level (L1) cache. We use the SIMD (Single Instruction Multiple Data) instruction set AVX-512 available on commodity processors. Our implementation generates several times fewer instructions than previous SIMD-accelerated base64 codecs.

不過這樣 AMD 暫時要哭哭...

最新的 Firefox 56 對 AES-GCM 效能的改善

昨天釋出的 Firefox 56 對於 AES-GCM 在老電腦上改善了不少效能:「Improving AES-GCM Performance」。

首先是 Firefox 自己的數據分析,可以看到 AES-GCM 佔目前加密連線裡的大宗,再來是 AES-CBC:

先以 Linux 64bits 環境的數據來看,Firefox 56 的 NSS 3.32 大幅改善了老電腦的效能 (不支援 AES-NI 硬體加解密的 CPU,甚至是不支援 PCLMUL 的 CPU,以及不支援 AVX 的 CPU):

在 Linux 32bits 環境上則是連預設值大幅改善,不過用的人應該少很多了:

Windows 下則是因為 64bits 或是 32bits 都有足夠的使用者,所以平常就花了不少力氣。但也可以看出對於老電腦的速度提升:

Mac (64bits only) 算是這次比較大的提升,連新電腦的預設值都大幅變快:

加上之後陸續的改善 (尤其是下一版 Firefox 57 的 Project Quantum),這幾版應該會拉出不少效能...

Google 發表了三個 Hash 演算法的實作

Google 發表了三個 Hash 演算法的實作:「New algorithms may lower the cost of secure computing」。

第一個是 SipHash 的加速實作,透過 AVX-2 指令集加速,看維基百科的資料,2011 後的 Intel/AMD CPU 似乎都有提供這組指令集:

Our first hash function produces the same output as SipHash, but 1.5 times as quickly thanks to AVX-2 instructions.

第二個是 SipHash 的改良版,但輸出不同 (所以不是 SipHash),但速度比 SipHash 更快:

The second improvement uses j-lanes tree hashing to process multiple inputs in parallel, which is 3 times as fast. This technique is known to be secure, but produces different output than the original SipHash and is slightly slower for short inputs.

第三個則是新的 Hash,速度比前兩者又更快了,但還需要有更多人分析才能確認安全性:

HighwayHash is based on a new way of mixing inputs with just a few AVX-2 multiply and permute instructions. We are hopeful that the result is a cryptographically strong pseudorandom function, but new cryptanalysis methods might be needed for analyzing this promising family of hash functions. HighwayHash is significantly faster than SipHash for all measured input sizes, with about 7 times higher throughput at 1 KiB.

三者的程式碼都可以在 GitHub 上的「google/highwayhash」找到,看 LICENSE 檔案是 Apache License 2.0