## How Intel and AMD differ on RSQRTSS

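RSQRTSS computes the reciprocal square root, that is, $1 / \sqrt{x}$. What makes it unusual is that the instruction does not guarantee an exact result: some error is allowed.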

On Intel Skylake I get
out = 3d7ff000, float = 0.062485

On AMD Rome I get
out = 3d7ff800, float = 0.062492
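
To see how far apart the two results are, the quoted bit patterns can be decoded by hand. The sketch below reinterprets each 32-bit value as an IEEE 754 single-precision float and compares it with an exact answer. The input value is not shown in the quoted output, so the script assumes it was 256.0 (exact result 0.0625); that assumption, and the helper names `bits_to_float` and `rel_err`, are illustrative only.

```python
import struct


def bits_to_float(bits: int) -> float:
    """Reinterpret a 32-bit pattern as an IEEE 754 single-precision float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


INTEL_BITS = 0x3D7FF000  # Skylake output quoted above
AMD_BITS = 0x3D7FF800    # Rome output quoted above

intel = bits_to_float(INTEL_BITS)
amd = bits_to_float(AMD_BITS)

# Assumption: the input was 256.0, so the exact answer is 1/sqrt(256) = 0.0625.
EXACT = 0.0625


def rel_err(approx: float) -> float:
    """Relative error of an approximation against the assumed exact value."""
    return abs(approx - EXACT) / EXACT


print(f"Intel: {intel:.6f}  relative error {rel_err(intel):.3e}")
print(f"AMD:   {amd:.6f}  relative error {rel_err(amd):.3e}")

# Distance between the two results, in units of the last mantissa bit.
print(f"Difference in bit patterns: {AMD_BITS - INTEL_BITS} ULPs")
```

Under that assumption both results fall inside the relative error bound of $1.5 \times 2^{-12}$ that Intel documents for this instruction, so neither CPU is wrong. They simply use different approximations, which is enough to make results non-reproducible across vendors.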


## How Cloudflare's jpegtran performs on ARM

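Cloudflare has put a lot of effort into ARM servers (see 「Cloudflare 用 ARM 當伺服器的進展...」, or the earlier post 「Cloudflare 測試 ARM 新的伺服器」). Recently they found that jpegtran did not perform well on ARM, spent considerable effort optimizing it, and ended up with an unexpected payoff: "NEON is the new black: fast JPEG optimization on ARM server".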

Ideally we want to have the ARM performing at or above 50% of the Xeon performance per core. This would make sure we have no performance regressions, and net performance gain, since the ARM CPUs have double the core count as our current 2 socket setup.

In this case, however, I was disappointed to discover an almost 4X slowdown.

Not one to despair, I figured out that applying the same optimizations I did for Intel would be trivial. Surely the NEON instructions map neatly to the SSE instructions I used before?

While going over the ARMv8 NEON instruction set, I found several unique instructions, that have no equivalent in SSE.

With the new implementation Centriq outperforms the Xeon at batch reduction for every number of workers. We usually run Polish with four workers, for which Centriq is now 1.3 times faster while also 6.5 times more power efficient.

## Receiving Recent Changes from Wikimedia (including Wikipedia)

EventStreams is built on the w3c standard Server Sent Events (SSE). SSE is simply a streaming HTTP connection with event data in a particular text format. Client libraries, usually called EventSource, assist with building responsive tools, but because SSE is really just HTTP, you can use any HTTP client (even curl!) to consume it.
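
Because the wire format is plain text, a consumer can be quite small. The following is a minimal sketch of an SSE parser, not a full implementation of the standard: it handles `data`, `event`, and `id` fields and comment lines, and ignores `retry` and reconnection logic. The function name `parse_sse` and the sample payload are hypothetical and only serve the illustration.

```python
import json
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per event from an iterable of SSE text lines."""
    event: dict = {}
    data: list = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data:
                event["data"] = "\n".join(data)
                yield event
            event, data = {}, []
            continue
        if line.startswith(":"):
            # Lines starting with a colon are comments (often keep-alives).
            continue
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space is not part of the value
        if field == "data":
            data.append(value)  # multiple data lines are joined with newlines
        elif field in ("event", "id"):
            event[field] = value


# Made-up sample input; a real stream carries a much larger JSON payload.
sample = [
    ": keep-alive comment\n",
    "event: message\n",
    "id: 1\n",
    'data: {"title": "Example", "wiki": "enwiki"}\n',
    "\n",
]

events = list(parse_sse(sample))
change = json.loads(events[0]["data"])
print(events[0]["event"], change["title"])
```

To consume a live stream, the same function could be fed the decoded lines of a streaming HTTP response. At the time of writing the recent changes stream is served at `https://stream.wikimedia.org/v2/stream/recentchange`, but check the EventStreams documentation before relying on that address.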