Intel's RDRAND blows up...

Saw this on wens's Facebook: because of a security vulnerability (CrossTalk/SRBDS), the newly released fix leaves Intel's RDRAND at only about 3% of its original performance:

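Judging from Wikipedia, this is probably because the instruction has compliance requirements, so the vulnerability has to be fixed completely on the security side. The fix therefore brute-forces it with a heavy lock, which is why performance drops so much: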

The random number generator is compliant with security and cryptographic standards such as NIST SP 800-90A, FIPS 140-2, and ANSI X9.82.

That said, this instruction is not used very often, so the impact on ordinary users should be limited:

As explained in the earlier article, mitigating CrossTalk involves locking the entire memory bus before updating the staging buffer and unlocking it after the contents have been cleared. This locking and serialization now involved for those instructions is very brutal on the performance, but thankfully most real-world workloads shouldn't be making too much use of these instructions.

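The vulnerability was reported to Intel back in September 2018, but it took more than a year and a half for the update to ship. This is one of the potential drawbacks raised when bug bounty programs were first proposed, and it is fairly visible this time: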

We disclosed an initial PoC (Proof-Of-Concept) showing the leakage of staging buffer content in September 2018, followed by a PoC implementing cross-core RDRAND/RDSEED leakage in July 2019. Following our reports, Intel acknowledged the vulnerabilities, rewarded CrossTalk with the Intel Bug Bounty (Side Channel) Program, and attributed the disclosure to our team with no other independent finders. Intel also requested an embargo until May 2020 (later extended), due to the difficulty of implementing a fix for the cross-core vulnerabilities identified in this paper.

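Back to the bug itself: the root cause is still Intel's architecture, which gives everyone plenty to attack. Intel's architecture remains a favorite target for security researchers... (because it has so many users)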

Linux Kernel 5.7 released...

The highlights are listed in "The New Features Of The Linux 5.7 Kernel: Tiger Lake Graphics Stable, New exFAT, Zstd F2FS, Performance".

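One of them is that thermal pressure is now taken into account when placing tasks, which avoids landing work on cores that are being throttled for overheating and ending up very slow: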

Thermal pressure tracking for systems that are thermally overloaded for better task placement on CPU cores running hot.

Another is that the exFAT driver was replaced with a version maintained by Samsung. According to other articles, this version is more stable...

The new exFAT file-system driver that replaces the exFAT driver in the staging area that had been around for a few releases. This new exFAT driver is in much better shape and actively maintained by Samsung.

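The thermal part is the one I am most looking forward to. Otherwise the only option is to throw water cooling at the problem to avoid hitting the thermal wall...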

Amazon EC2 launches AMD-based C5a instances

A few days ago AWS announced the c5a instances, which use AMD CPUs: "New – Amazon EC2 C5a Instances Powered By 2nd Gen AMD EPYC™ Processors".

Pricing follows the same pattern as the other AMD machines on AWS (m5a, r5a, and t3a), at roughly 10% lower:

Today, I am excited to announce the general availability of compute-optimized C5a instances featuring 2nd Gen AMD EPYC™ processors, running at frequencies up to 3.3 GHz. C5a instances are variants of Amazon EC2’s compute-optimized (C5) instance family and provide high performance processing at 10% lower cost over comparable instances.

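Given how AMD has been crushing Intel in CPUs over the last two years (AMD YES!), the only remaining weakness of AMD CPUs is probably the lower single-core clock. On desktops, games that are not well optimized for multiple cores still run better on Intel. In every other case AMD clearly beats Intel.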

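If the application you run on AWS similarly depends on single-core performance and cannot make full use of multiple cores, it is still worth looking back at the c5 family, which has somewhat higher clocks: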

C5 and C5d instances feature either the 1st or 2nd generation Intel Xeon Platinum 8000 series processor (Skylake-SP or Cascade Lake) with a sustained all core Turbo CPU clock speed of up to 3.6 GHz. The new C5 and C5d 12xlarge, 24xlarge, and metal instance sizes feature the 2nd generation Intel Xeon Scalable Processors (Cascade Lake) with a sustained all-core Turbo CPU frequency of 3.6GHz.

Speaking of which, when Linus announced 5.7-rc7 a while ago he also mentioned switching to an AMD 3970x, which sped up his kernel builds by a multiple: "Linux 5.7-rc7":

In fact, the biggest excitement this week for me was just that I upgraded my main machine, and for the first time in about 15 years, my desktop isn't Intel-based. No, I didn't switch to ARM yet, but I'm now rocking an AMD Threadripper 3970x. My 'allmodconfig' test builds are now three times faster than they used to be, which doesn't matter so much right now during the calming down period, but I will most definitely notice the upgrade during the next merge window.

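Back to AWS... As for available regions, the larger US and European regions are all on the launch list. Tokyo, which is the one more commonly used from Taiwan, is not there yet, but Singapore, which my company uses, is: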

You can launch C5a instances today in eight sizes in the US East (N. Virginia), US East (Ohio), US West (Oregon), Europe (Ireland), Europe (Frankfurt), Asia Pacific (Sydney), and Asia Pacific (Singapore) Regions in On-Demand, Spot, and Reserved Instance or as part of a Savings Plan.

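If you are not under any particular cost pressure, you can wait a while to see whether others publish benchmarks, and whether the CPU's compute performance allows a painless switch... (painless compared with moving to ARM, that is)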

AMD Ryzen Threadripper 3990X performance on Windows

John Carmack noticed a performance problem on the AMD Ryzen Threadripper 3990X caused by Windows' processor group limit:

This can be worked around by spreading the work across two groups, which improves speed:

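While I was at it, I looked at the current scores for high-end CPUs on the CPU Benchmark site, "PassMark - CPU Mark High End CPUs". AMD looks really attractive lately: the 3950X (16C/32T, 105W) beats Intel's current top scorer, the W-3275M (28C/56T, 205W), and then there is the price difference: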

Intel keeps squeezing the 14nm toothpaste tube...

Amazon RDS launches a connection pool product

Amazon RDS has launched a connection pool product called Amazon RDS Proxy: "Introducing Amazon RDS Proxy (Preview)".

It currently supports MySQL (both the traditional version and the Aurora version):

Amazon RDS Proxy supports Amazon RDS for MySQL and Amazon Aurora with MySQL compatibility, with support for additional RDS database engines coming soon.

Pricing appears to be based on the vCPU count of the backend database:

Pricing is simple and predictable: you pay per vCPU of the database instance for which the proxy is enabled.

The pricing page lists USD$0.015 per vCPU per hour (us-east-1 figures). For t2 machines the charge is for a minimum of 2 vCPUs rather than being prorated by actual usage:

RDS Proxy pricing correlates to the number of vCPUs of the database instance for which it is enabled, with a minimum charge for 2 vCPUs.
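The pricing rule quoted above can be sketched as a small calculation. This is only a rough estimate under stated assumptions: the USD$0.015 figure is treated as an hourly rate per vCPU, a month is taken as 30 days, and the function name `monthly_proxy_cost` is made up for illustration.

```python
# Rough monthly cost estimate for RDS Proxy, based on the figures quoted above.
# Assumptions: the rate is per vCPU per hour, and a month has 30 days.
HOURLY_RATE_PER_VCPU = 0.015  # USD, us-east-1
MIN_BILLED_VCPUS = 2          # minimum charge stated on the pricing page
HOURS_PER_MONTH = 24 * 30


def monthly_proxy_cost(db_vcpus: int) -> float:
    """Estimated monthly cost in USD for a database instance with db_vcpus vCPUs."""
    billed_vcpus = max(db_vcpus, MIN_BILLED_VCPUS)
    return round(billed_vcpus * HOURLY_RATE_PER_VCPU * HOURS_PER_MONTH, 2)


if __name__ == "__main__":
    for vcpus in (1, 2, 4):
        print(f"{vcpus} vCPU(s): USD${monthly_proxy_cost(vcpus)} per month")
```

Because of the 2-vCPU minimum, a 1-vCPU instance is billed the same as a 2-vCPU one.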

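At the 2-vCPU minimum that comes to about USD$21.6 per month (USD$10.8 per vCPU), which is rather expensive... If your SLA allows it, plain failover may well be good enough...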

If your SLA really does have to be that high, you can test it in these regions:

Amazon RDS Proxy is available in preview for RDS MySQL and Aurora MySQL in US East (N. Virginia), US East (Ohio), US West (Oregon), EU West (Ireland), and Asia Pacific (Tokyo) regions. Support for RDS PostgreSQL and Aurora PostgreSQL is coming soon.

Amazon EC2 launches a new generation of ARM systems

Amazon EC2 has announced a new generation of ARM systems: "Coming Soon – Graviton2-Powered General Purpose, Compute-Optimized, & Memory-Optimized EC2 Instances".

The current a1 family tops out at 32GB of RAM. The machines announced this time are larger, and like the x86-64 lineup they are now split into m/c/r families:

  • General Purpose (M6g and M6gd) – 1-64 vCPUs and up to 256 GiB of memory.
  • Compute-Optimized (C6g and C6gd) – 1-64 vCPUs and up to 128 GiB of memory.
  • Memory-Optimized (R6g and R6gd) – 1-64 vCPUs and up to 512 GiB of memory.

The launch is planned for 2020:

I will have more information to share with you in 2020.

If you want to try it now, though, you can ask AWS for m6g machines to test with:

M6g Preview
We are now running a preview of the M6g instances for testing on non-production workloads; if you are interested, please contact us.

Pricing does not seem to be out yet either, so I will wait for more news...

Security fixes for Intel CPUs on Linux and their performance impact

Saw on Hacker News Daily a piece on the performance problems caused by the fixes the Linux kernel needs for the various Intel CPU security issues: "HOWTO make Linux run blazing fast (again) on Intel CPUs".

These bullets have been flying for long enough (although smaller ones keep coming), so it is a good time to look back at where things stand.

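The tests here focus on two items: mitigations=off and whether SMT is enabled (SMT is called Hyper-threading on Intel). The two sets of results show that the performance impact of the current mitigations has gradually dropped to an acceptable level (under 5%), while disabling SMT costs roughly 20%~30%: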

But enabling SMT is basically a big pit. If you follow what people have been digging into, a whole pile of Intel-specific security issues are related to SMT...
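To see where a given Linux machine stands on both items, the kernel exposes the SMT state and the per-vulnerability mitigation status under sysfs. The sketch below reads them. It assumes a kernel recent enough to provide `/sys/devices/system/cpu/smt/active` and `/sys/devices/system/cpu/vulnerabilities/`, and the function name `read_mitigation_status` is made up for illustration.

```python
from pathlib import Path

# Standard sysfs location on Linux; older kernels may lack some of these files.
SYSFS_CPU = Path("/sys/devices/system/cpu")


def read_mitigation_status(cpu_dir: Path = SYSFS_CPU) -> dict:
    """Return whether SMT is active and the reported status of each CPU vulnerability."""
    status = {"smt_active": None, "vulnerabilities": {}}

    # "1" means SMT is currently active, "0" means it is not.
    smt_file = cpu_dir / "smt" / "active"
    if smt_file.is_file():
        status["smt_active"] = smt_file.read_text().strip() == "1"

    # One file per known issue, containing e.g. "Not affected",
    # "Vulnerable", or "Mitigation: ...".
    vuln_dir = cpu_dir / "vulnerabilities"
    if vuln_dir.is_dir():
        for entry in sorted(vuln_dir.iterdir()):
            if entry.is_file():
                status["vulnerabilities"][entry.name] = entry.read_text().strip()

    return status


if __name__ == "__main__":
    result = read_mitigation_status()
    print("SMT active:", result["smt_active"])
    for name, state in result["vulnerabilities"].items():
        print(f"{name}: {state}")
```

On a machine without these sysfs entries, the function simply reports `None` for SMT and an empty vulnerability list.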

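A quick tangent: I got an AMD Ryzen 7 3700X a while ago (also running as a Linux desktop), and only then realized how CPU-hungry today's web pages are. Opening the web versions of Slack or Office 365 is much faster than on my old machine, and I almost want to replace my home desktop too...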

Google Chrome's patch for a CPU bug

Now that there is a direction to look in, someone will probably go after the underlying problem...

I first saw "Speculative fix to crashes from a CPU bug" on Hacker News. It is a speculative fix, made because they found very strange crashes on Intel Gemini Lake low-power chips:

For the last few months Chrome has been seeing many "impossible" crashes on Intel Gemini Lake, family 6 model 122 stepping 1 CPUs. These crashes only happen with 64-bit Chrome and only happen in the prologue of two functions. The crashes come and go across different Chrome versions.

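Based on the crash logs they guessed that it is related to alignment, so they decided to force the alignment with __attribute__, which both gcc and clang support, to avoid it. It looks like they have no environment where they can reproduce the problem, so all they can do for now is ship the implementation...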

EC2 is switching from instance-count limits to vCPU-count limits

This is a protection mechanism on AWS's side: the number of machines you can launch on Amazon EC2 is always limited.

The plan is to replace the old instance-count limits with new vCPU-count limits: "Using new vCPU-based On-Demand Instance limits with Amazon EC2". You can already opt in: "vCPU-based On-Demand Instance Limits are Now Available in Amazon EC2".

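The problem this addresses is that m5.large and m5.xlarge used to have two separate limits, which was awkward to work with. Now everything is managed by vCPU count instead.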

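Under the new structure, general-purpose machines share one vCPU limit, and the instance families with special characteristics each have their own vCPU limit: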

In addition to now measuring usage in number of vCPUs, there will only be five different On-Demand Instance limits—one limit that governs the usage of standard instance families such as A, C, D, H, I, M, R, T, and Z, and one limit per accelerated instance family for FPGA (F), graphic-intensive (G), general purpose GPU (P), and special memory optimized (X) instances.
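As a rough illustration of the five limits described in the quote, the sketch below sums vCPU usage per limit for a set of running instances. The bucket names, the first-letter classification, and the small `VCPUS` table are simplifying assumptions for illustration, not how AWS computes it internally.

```python
from collections import defaultdict

# vCPU counts for a few instance types; extend the table as needed.
VCPUS = {"m5.large": 2, "m5.xlarge": 4, "c5.2xlarge": 8, "p3.2xlarge": 8}

# Families that get their own limit according to the quote above (F, G, P, X).
# Everything else is counted against the shared "standard" limit.
SEPARATE_LIMIT_FAMILIES = {"f", "g", "p", "x"}


def limit_bucket(instance_type: str) -> str:
    """Map an instance type to its limit, using the first letter of the family."""
    family_letter = instance_type[0].lower()
    return family_letter if family_letter in SEPARATE_LIMIT_FAMILIES else "standard"


def vcpu_usage(running: dict) -> dict:
    """Sum vCPUs per limit for a mapping of instance type to instance count."""
    usage = defaultdict(int)
    for instance_type, count in running.items():
        usage[limit_bucket(instance_type)] += VCPUS[instance_type] * count
    return dict(usage)


if __name__ == "__main__":
    print(vcpu_usage({"m5.large": 3, "m5.xlarge": 2, "p3.2xlarge": 1}))
```

Here m5.large and m5.xlarge count against the same standard limit, which is exactly the case that used to require two separate limits.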

From 9/24 you can opt in manually, and your current limits will be converted to the new ones. On 10/24 everyone will be switched over:

During a transition period from September 24, 2019, through October 24, 2019, you can opt in to receive vCPU-based instance limits. When you opt in, EC2 automatically computes your new limits, giving you access to launch at least the same number of instances (if not more) than you do currently. Beginning October 24, 2019, all accounts will switch to vCPU-based instance limits, and the current count-based instance limits will no longer be supported. Although the switchover will not impact your ability to launch EC2 instances, you should familiarize yourself with the new On-Demand Instance limits experience and opt into vCPU limits at a time of your choosing.

This should make things a bit more convenient...

IBM hands the OpenPOWER Foundation over to The Linux Foundation

Although the headline is "Big Blue Open Sources Power Chip Instruction Set", in substance IBM is handing the OpenPOWER Foundation over to The Linux Foundation.

I looked for the press releases from both sides. The Linux Foundation's is "The Linux Foundation Announces New Open Hardware Technologies and Collaboration", but the OpenPOWER site does not seem to have been updated since the end of 2018...

The open hardware projects getting the most attention lately are probably RISC-V and OpenRISC? I am not sure what IBM is going for with this move...