Home » Posts tagged "fast"

Amazon Aurora 支援快速複製

Amazon Aurora 宣佈支援快速複製:「Amazon Aurora Fast Database Cloning」。

對於 2TB 的資料大約五分鐘就完成了:

This means my 2TB snapshot restore job that used to take an hour is now ready in about 5 minutes – and most of that time is spent provisioning a new RDS instance.

主要是得力於後端 storage 的部份可以實做 copy-on-write 架構:

By taking advantage of Aurora’s underlying distributed storage engine you’re able to quickly and cheaply create a copy-on-write clone of your database.

可以快速複製就可以很快的驗證一些事情,像是可以直接測試 ALTER TABLE 需要的時間,或是事前演練...

用 Go 寫的 Badger

Dgraph 在推銷自家發展出來的 Badger:「Introducing Badger: A fast key-value store written natively in Go」。

標靶是 RocksDB,號稱比 RocksDB 快好幾倍:

Based on benchmarks, Badger is at least 3.5x faster than RocksDB when doing random reads. For value sizes between 128B to 16KB, data loading is 0.86x - 14x faster compared to RocksDB, with Badger gaining significant ground as value size increases. On the flip side, Badger is currently slower for range key-value iteration, but that has a lot of room for optimization.

不過我覺得有些重要的功能在 Badger 不提供,這比起來有種橘子比蘋果的感覺... 像是 RocksDB 提供了 Transaction,而 Badger 則是直接講明他們不打算支援 Transaction:

Keep it simple, stupid. No support for transactions, versioning or snapshots -- anything that can be done outside of the store should be done outside.

各家 glob 的效能...

在「Glob Matching Can Be Simple And Fast Too」這邊看到在分析 (a.*)nb 這樣的 pattern 的效能 (像是 a.*a.*a.*b 這樣的東西),第一波先測 shell,結果發現有趣的現象:

那個 csh 是怎麼了 XDDD

Looking at the source code, it doesn’t attempt to perform glob expansion itself. Instead it calls the C library implementation glob(3), which runs in linear time, at least on this Linux system. So maybe we should look at programming language implementations too.


各家實做方式不一樣 XD

然後文章裡有提到這個方式是之前蠻常見的 DoS 技巧,用很少的資源就可以吃光你的 CPU resource... 另外也提到了可能的解法,就是限制星號的數量:

At the very least, most FTP servers should probably reject glob patterns with more than, say, 3 stars.


號稱目前最快的 Terminal 軟體 (因為用 GPU 加速)

看到「Announcing Alacritty, a GPU-accelerated terminal emulator」這個用 GPU 加速 rendering 的 terminal emulator:「Alacritty」。

Alacritty is a blazing fast, GPU accelerated terminal emulator. It’s written in Rust and uses OpenGL for rendering to be the fastest terminal emulator available.

全螢幕全文字的情況下可以到 500 fps:

Alacritty’s renderer is capable of doing ~500 FPS with a large screen full of text. This is made possible by efficient OpenGL usage.

現在支援 Linux 與 macOS,不過要自己編,會比較麻煩一點:

Alacritty currently supports macOS and Linux, and Windows support is planned before the 1.0 release.

nginx 的 TCP Fast Open

在「Enabling TCP Fast Open for NGINX on CentOS 7」這邊看到 nginxTCP Fast Open (TFO,RFC 7413) 的支援早在 1.5.8 就有了,而 Linux Kernel 也是 3.7 之後就全面支援了。

TCP Fast Open 利用第一次連線後產生的 TCP cookie,在第二次連線時可以在 3-way handshake 的過程就開始傳輸,藉此大幅降低 latency。

設定方法不難,先在 kernel 設定 net.ipv4.tcp_fastopen=3,再加上 fastopen=number 就可以了,像是這樣:

listen 80 fastopen=256

不過目前 NGINX Mainline 上的版本好像沒有編進去,暫時沒辦法測...

OpenSSL 的 ECDH 中,224 bits 速度比 160/192 bits 快的原因

openssl speed ecdh 的時候發現很特別的現象:

Doing 160 bit  ecdh's for 10s: 40865 160-bit ECDH ops in 9.99s
Doing 192 bit  ecdh's for 10s: 34169 192-bit ECDH ops in 9.99s
Doing 224 bit  ecdh's for 10s: 60980 224-bit ECDH ops in 9.99s
Doing 256 bit  ecdh's for 10s: 34298 256-bit ECDH ops in 10.00s
Doing 384 bit  ecdh's for 10s: 9602 384-bit ECDH ops in 10.00s
Doing 521 bit  ecdh's for 10s: 9127 521-bit ECDH ops in 9.99s

原因是 Google 這篇論文的貢獻:「Fast Elliptic Curve Cryptography in OpenSSL」,開頭就提到:

We present a 64-bit optimized implementation of the NIST and SECG-standardized elliptic curve P-224.


full TLS handshakes using a 1024-bit RSA certificate and ephemeral Elliptic Curve Diffie-Hellman key exchange over P-224 now run at twice the speed of standard OpenSSL, while atomic elliptic curve operations are up to 4 times faster.

OpenSSLCHANGES 也可以看到對應的修改,不只是 NIST-P224 有被改善,其他的 NIST-P256 與 NIST-P521 也都有被改善:

Add optional 64-bit optimized implementations of elliptic curves NIST-P224, NIST-P256, NIST-P521, with constant-time single point multiplication on typical inputs.


Fast Inverse Square Root 演算法...

中文稱為「平方根倒數速演算法」,英文則是「Fast Inverse Square Root」。

好像是在 Twitter 還是 Facebook 上看到的 (還是是在其他管道?),仔細看中文版維基百科條目,發現中文版的資料相當完整了 (看了一下歷史記錄,是去年 2012 年 6 月的時候從英文版翻出來的)。

當時很有名的 magic hack,比查表法快:


然後還比 CPU 指令集快 XD

由於演算法所生成的用於輸入牛頓法的首次近似值已經相當精確,此演算法所得近似值的精度已可接受,而若使用與《雷神之鎚III競技場》同為1999年發行的Pentium III中的SSE指令rsqrtss計算,則計算平方根倒數的收斂速度更慢,精度也更低。

Update:請參考 comment,看起來中文版有誤譯...

我本來以為我之前寫過,找了找沒翻到... 補記錄下來 :p