Intel 用 AVX-512 加速 NumPy 的排序演算法被整合進主線了

IntelAVX-512 加速 NumPy 排序的實做被整合進主線了:「「Intel Publishes Blazing Fast AVX-512 Sorting Library, Numpy Switching To It For 10~17x Faster Sorts」」。

GitHub 的 PR 在「ENH: Vectorize quicksort for 16-bit and 64-bit dtype using AVX512 #22315」這邊,可以看到相關的留言:

This patch adds AVX512 based 64-bit on AVX512-SKX and 16-bit sorting on AVX512-ICL. All the AVX512 sorting code has been reformatted as a separate header files and put in a separate folder. The AVX512 64-bit sorting is nearly 10x faster and AVX512 16-bit sorting is nearly 16x faster when compared to std::sort. Still working on running NumPy benchmarks to get exact benchmark numbers

16-bit int sped up by 17x and float64 by nearly 10x for random arrays. Benchmarked on a 11th Gen Tigerlake i7-1165G7.

有點「有趣」的情況是,AVX-512 在新的 Intel 消費級 CPU 被拔掉了,只有伺服器工作站的 CPU 有保留。而 AMDZen 4 則是跳下去支援 AVX-512...

另外在「Intel Publishes Fast AVX-512 Sorting Library, 10~17x Faster Sorts in NumPy (phoronix.com)」這邊當然也有人提到,如果用更廣泛的 AVX2 (寬度是 256bits) 加速的話應該也會有很大的進步才對?幾乎這十年的 CPU 都有 AVX2 了... 不過看起來沒有什麼深入的討論?

PyPy 5.9 支援 Pandas 與 NumPy 了

PyPy 5.9 支援 machine learning 常用的 PandasNumPy 了:「PyPy v5.9 Released, Now Supports Pandas, NumPy」,包括 2.7 與 3.5 的相容版本:

The PyPy team is proud to release both PyPy3.5 v5.9 (a beta-quality interpreter for Python 3.5 syntax) and PyPy2.7 v5.9 (an interpreter supporting Python 2.7 syntax).

對於使用 Python 大量計算的人來說可以進場測試了 XD