Amazon EC2 launches the AMD-based C5a instances

A few days ago AWS announced the c5a instances, which use AMD CPUs: "New – Amazon EC2 C5a Instances Powered By 2nd Gen AMD EPYC™ Processors".

Pricing follows the same pattern as the other AMD instances on AWS (m5a, r5a, and t3a), at roughly 10% lower:

Today, I am excited to announce the general availability of compute-optimized C5a instances featuring 2nd Gen AMD EPYC™ processors, running at frequencies up to 3.3 GHz. C5a instances are variants of Amazon EC2’s compute-optimized (C5) instance family and provide high performance processing at 10% lower cost over comparable instances.

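Given how AMD has been crushing Intel on CPUs over the past two years (AMD YES!), the only remaining weakness of AMD CPUs is probably the lower single-core clock speed. On the desktop side, Intel still does better in some games that aren't well optimized for multiple cores; in pretty much every other case you can see AMD beating Intel soundly.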

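If the application you run on AWS similarly depends on single-core performance and can't make full use of multiple cores, it's still worth going back to the c5 series, which has somewhat higher clock speeds: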

C5 and C5d instances feature either the 1st or 2nd generation Intel Xeon Platinum 8000 series processor (Skylake-SP or Cascade Lake) with a sustained all core Turbo CPU clock speed of up to 3.6 GHz. The new C5 and C5d 12xlarge, 24xlarge, and metal instance sizes feature the 2nd generation Intel Xeon Scalable Processors (Cascade Lake) with a sustained all-core Turbo CPU frequency of 3.6GHz.

Speaking of which, when Linus announced 5.7-rc7 a while ago, he also mentioned switching to an AMD 3970x, and that his kernel builds sped up by a multiple: "Linux 5.7-rc7":

In fact, the biggest excitement this week for me was just that I upgraded my main machine, and for the first time in about 15 years, my desktop isn't Intel-based. No, I didn't switch to ARM yet, but I'm now rocking an AMD Threadripper 3970x. My 'allmodconfig' test builds are now three times faster than they used to be, which doesn't matter so much right now during the calming down period, but I will most definitely notice the upgrade during the next merge window.

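Back to AWS... As for availability, the larger regions in the US and Europe are all on the launch list. Tokyo, which people in Taiwan use more often, isn't there yet, but Singapore, which my company uses, is: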

You can launch C5a instances today in eight sizes in the US East (N. Virginia), US East (Ohio), US West (Oregon), Europe (Ireland), Europe (Frankfurt), Asia Pacific (Sydney), and Asia Pacific (Singapore) Regions in On-Demand, Spot, and Reserved Instance or as part of a Savings Plan.

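If you aren't under any particular cost pressure, you can wait a while to see whether others publish benchmarks, and check whether the CPU's compute performance lets you switch over painlessly... (this is in comparison to the ARM side)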

Performance of the AMD Ryzen Threadripper 3990X on Windows

John Carmack noticed a performance problem on the AMD Ryzen Threadripper 3990X caused by the Windows group limit:

But this can be worked around by spreading the work across two groups, which improves the speed:

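While I was at it, I looked at the current scores for high-end CPUs on the CPU Benchmark site, "PassMark - CPU Mark High End CPUs". AMD is looking really attractive lately: the 3950X (16C/32T, 105W) beats Intel's current top scorer, the W-3275M (28C/56T, 205W), and then there's the price gap: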

Intel keeps squeezing small increments out of 14nm...

Amazon RDS launches a connection pool product

Amazon RDS has launched a connection pool product called Amazon RDS Proxy: "Introducing Amazon RDS Proxy (Preview)".

It currently supports MySQL (both the traditional RDS version and the Aurora version):

Amazon RDS Proxy supports Amazon RDS for MySQL and Amazon Aurora with MySQL compatibility, with support for additional RDS database engines coming soon.

Pricing appears to be based on the vCPU count of the backend database:

Pricing is simple and predictable: you pay per vCPU of the database instance for which the proxy is enabled.

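I checked the pricing page: it's USD$0.015 per vCPU per hour (using the us-east-1 figures), and for t2-series instances the minimum charge is 2 vCPUs rather than being prorated by usage: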

RDS Proxy pricing correlates to the number of vCPUs of the database instance for which it is enabled, with a minimum charge for 2 vCPUs.

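At the 2-vCPU minimum that works out to roughly USD$21.6 a month (0.015 × 2 × 24 × 30), or about USD$10.8 per vCPU, which is pretty expensive... If your SLA allows it, a basic failover setup may well be good enough...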
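The monthly figure can be reproduced with a quick calculation. This is only a minimal sketch: it assumes the USD$0.015 rate is charged per vCPU per hour, a 30-day month, and the 2-vCPU minimum quoted above. The function name and its parameters are made up for illustration.

```python
def rds_proxy_monthly_cost(db_vcpus, price_per_vcpu_hour=0.015,
                           hours_per_month=24 * 30, min_vcpus=2):
    """Estimate the monthly RDS Proxy charge in USD (illustrative only)."""
    # The proxy is billed on the database's vCPU count, but never below
    # the minimum charge of 2 vCPUs mentioned on the pricing page.
    billed_vcpus = max(db_vcpus, min_vcpus)
    return billed_vcpus * price_per_vcpu_hour * hours_per_month


# A 1-vCPU t2 database is still billed as 2 vCPUs.
print(round(rds_proxy_monthly_cost(1), 2))
# A 4-vCPU database is billed on all 4 vCPUs.
print(round(rds_proxy_monthly_cost(4), 2))
```

Under these assumptions the smallest possible bill is the 2-vCPU one, so small databases pay proportionally the most.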

If you really do need to push your SLA that high, you can test it in these regions:

Amazon RDS Proxy is available in preview for RDS MySQL and Aurora MySQL in US East (N. Virginia), US East (Ohio), US West (Oregon), EU West (Ireland), and Asia Pacific (Tokyo) regions. Support for RDS PostgreSQL and Aurora PostgreSQL is coming soon.

Amazon EC2 announces a new generation of ARM instances

Amazon EC2 has announced a new generation of ARM instances: "Coming Soon – Graviton2-Powered General Purpose, Compute-Optimized, & Memory-Optimized EC2 Instances".

The current a1 series tops out at 32GB of RAM. The instances announced this time are larger machines and, like the x86-64 lineup, are split into m/c/r families:

  • General Purpose (M6g and M6gd) – 1-64 vCPUs and up to 256 GiB of memory.
  • Compute-Optimized (C6g and C6gd) – 1-64 vCPUs and up to 128 GiB of memory.
  • Memory-Optimized (R6g and R6gd) – 1-64 vCPUs and up to 512 GiB of memory.

They are scheduled to launch in 2020:

I will have more information to share with you in 2020.

But if you want to try them now, you can ask AWS for access to m6g instances to test first:

M6g Preview
We are now running a preview of the M6g instances for testing on non-production workloads; if you are interested, please contact us.

Pricing doesn't seem to be out yet either, so I'll leave this alone and wait for more news...

Security fixes for Intel CPUs on Linux and their performance impact

I saw this on Hacker News Daily: a piece on the performance cost of the fixes the Linux kernel needs for the various Intel CPU security issues: "HOWTO make Linux run blazing fast (again) on Intel CPUs".

These issues have had enough time to play out (though smaller ones keep appearing), so it's a good moment to look back at where things stand.

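The tests here focus on two things: mitigations=off and whether SMT is enabled (SMT is called Hyper-threading on Intel). Looking at the two sets of test results, the performance impact of the current mitigations has gradually come down to an acceptable level (less than 5%), but disabling SMT costs roughly 20%~30%: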

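But leaving SMT enabled is basically a big trap: if you follow what vulnerability researchers have been targeting, many of the Intel-specific security issues are related to SMT...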
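To see where a given Linux machine currently stands on these two settings, the kernel's sysfs entries can be read directly. This is only a rough sketch: it assumes a reasonably recent kernel that exposes `smt/control` and the `vulnerabilities` directory under `/sys/devices/system/cpu`, and the function name is made up for illustration.

```python
from pathlib import Path


def cpu_security_status(cpu_root="/sys/devices/system/cpu"):
    """Collect the SMT state and per-vulnerability mitigation status."""
    root = Path(cpu_root)
    status = {"smt": None, "vulnerabilities": {}}

    # smt/control holds a single word such as "on" or "off".
    smt_file = root / "smt" / "control"
    if smt_file.is_file():
        status["smt"] = smt_file.read_text().strip()

    # Each file under vulnerabilities/ is named after one issue and
    # contains a one-line description of the mitigation in effect.
    vuln_dir = root / "vulnerabilities"
    if vuln_dir.is_dir():
        for entry in sorted(vuln_dir.iterdir()):
            if entry.is_file():
                status["vulnerabilities"][entry.name] = entry.read_text().strip()

    return status
```

Calling `cpu_security_status()` and printing the result before and after booting with mitigations=off shows which entries actually changed. On systems without these sysfs files the function simply returns empty values.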

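A quick digression: a while ago I got an AMD Ryzen 7 3700X (also for a Linux desktop), and only then did I realize how CPU-hungry today's web pages really are. Opening the web version of Slack or Office 365 is so much faster than on my old machine that I'm tempted to replace the desktop at home too...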

Google Chrome's patch for a CPU bug

Now that there's a direction to look in, someone will probably follow up and dig into the underlying problem...

I first saw "Speculative fix to crashes from a CPU bug" on Hacker News. It's a speculative fix, made because they found very strange crashes on the low-power Intel Gemini Lake chips:

For the last few months Chrome has been seeing many "impossible" crashes on Intel Gemini Lake, family 6 model 122 stepping 1 CPUs. These crashes only happen with 64-bit Chrome and only happen in the prologue of two functions. The crashes come and go across different Chrome versions.

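Going by the crash logs, they guessed the problem was related to alignment, so they decided to sidestep it by forcing the alignment with __attribute__, which both gcc and clang support. It looks like they don't have an environment where they can reproduce the crash, though, so for now all they can do is ship the implementation...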

EC2 is switching from instance-count limits to vCPU-count limits

This is one of AWS's protection mechanisms: there is always a limit on how many machines you can launch on Amazon EC2.

AWS plans to replace the old instance-count limits with new vCPU-count limits: "Using new vCPU-based On-Demand Instance limits with Amazon EC2", and you can already opt in: "vCPU-based On-Demand Instance Limits are Now Available in Amazon EC2".

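The problem this fixes is that m5.large and m5.xlarge used to have two separate limits, which made them awkward to work with. Now everything is managed by vCPU count instead.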

Under the new scheme, general-purpose instances share one vCPU limit, and each of the other specialized categories has its own vCPU limit:

In addition to now measuring usage in number of vCPUs, there will only be five different On-Demand Instance limits—one limit that governs the usage of standard instance families such as A, C, D, H, I, M, R, T, and Z, and one limit per accelerated instance family for FPGA (F), graphic-intensive (G), general purpose GPU (P), and special memory optimized (X) instances.
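As a rough illustration of how a shared vCPU limit behaves, the sketch below adds up the vCPUs of a fleet of standard instances and checks whether another launch would still fit. The limit value and the helper names are hypothetical, and the table only lists the few instance types used in the example.

```python
# vCPU counts for a few standard instance types.
VCPUS = {"m5.large": 2, "m5.xlarge": 4, "c5.2xlarge": 8}


def vcpus_in_use(fleet):
    """Total vCPUs for a fleet given as {instance_type: count}."""
    return sum(VCPUS[itype] * count for itype, count in fleet.items())


def can_launch(fleet, instance_type, count, vcpu_limit):
    """Check whether launching `count` more instances stays within the limit."""
    requested = VCPUS[instance_type] * count
    return vcpus_in_use(fleet) + requested <= vcpu_limit


fleet = {"m5.large": 3, "m5.xlarge": 2}         # 3*2 + 2*4 = 14 vCPUs
print(vcpus_in_use(fleet))                      # 14
print(can_launch(fleet, "m5.xlarge", 4, 32))    # True: 14 + 16 = 30
print(can_launch(fleet, "m5.xlarge", 5, 32))    # False: 14 + 20 = 34
```

Because all three types count against the same pool, running fewer m5.large instances frees room for m5.xlarge ones, which was not possible with separate per-type limits.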

Starting 9/24 you can opt in manually, and your current limits are converted over; then on 10/24 everyone gets switched:

During a transition period from September 24, 2019, through October 24, 2019, you can opt in to receive vCPU-based instance limits. When you opt in, EC2 automatically computes your new limits, giving you access to launch at least the same number of instances (if not more) than you do currently. Beginning October 24, 2019, all accounts will switch to vCPU-based instance limits, and the current count-based instance limits will no longer be supported. Although the switchover will not impact your ability to launch EC2 instances, you should familiarize yourself with the new On-Demand Instance limits experience and opt into vCPU limits at a time of your choosing.

This should make things a bit more convenient...

IBM hands the OpenPOWER Foundation over to The Linux Foundation

Although the headline is "Big Blue Open Sources Power Chip Instruction Set", in practice this is really IBM handing the OpenPOWER Foundation over to The Linux Foundation.

I looked for press releases from both sides. The Linux Foundation's is at "The Linux Foundation Announces New Open Hardware Technologies and Collaboration", but the OpenPOWER website doesn't seem to have been updated since the end of 2018...

The open hardware projects getting the most attention lately are probably RISC-V and OpenRISC? I'm not sure what IBM is aiming for with this move...

Revisiting Amazon EC2's T2/T3 Unlimited

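I happened to be chatting with a colleague about Amazon EC2's T2/T3 Unlimited. I had some doubts, so I looked things up after getting home and found that my earlier understanding was incomplete. This is the documentation for T2/T3 Unlimited: "Unlimited Mode for Burstable Performance Instances".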

This part of the documentation explains how T2/T3 Unlimited mode is calculated:

When its CPU utilization falls below the baseline, it uses the CPU credits that it earns to pay down the surplus credits that it spent earlier. The ability to earn CPU credits to pay down surplus credits enables Amazon EC2 to average the CPU utilization of an instance over a 24-hour period. If the average CPU usage over a 24-hour period exceeds the baseline, the instance is billed for the additional usage at a flat additional rate per vCPU-hour.

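I used to think you were charged as soon as your remaining CPU credits ran out. According to the official documentation, though, CPU credits earned later can pay back CPU credits spent earlier, and the window this can span is 24 hours.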

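So when you sometimes see in the AWS web console that the CPU is idle but the CPU credit balance isn't growing back, it's because the instance is still paying off its earlier debt...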
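The "paying off debt" behavior can be sketched with a small hour-by-hour simulation. This is a simplified model of the passage quoted above, not AWS's actual billing logic: it starts from an empty credit balance, ignores the caps on balance and surplus, and uses an assumed 10% baseline on 2 vCPUs. One CPU credit is one vCPU at 100% for one minute; the function and parameter names are made up.

```python
def simulate_unlimited(utilization_pct, baseline_pct=10, vcpus=2):
    """Return (balance, surplus) in CPU credits after each hour.

    utilization_pct: average CPU utilization per vCPU for each hour, in percent.
    """
    balance = 0.0   # credits earned and not yet spent
    surplus = 0.0   # credits spent beyond the balance (the "debt")
    history = []
    for used_pct in utilization_pct:
        # Credits earned minus credits spent during this hour.
        net = (baseline_pct - used_pct) * 60 * vcpus / 100
        if net >= 0:
            # Earned credits pay down the surplus before anything is banked.
            repaid = min(surplus, net)
            surplus -= repaid
            balance += net - repaid
        else:
            # Spend the balance first, then run up surplus credits.
            from_balance = min(balance, -net)
            balance -= from_balance
            surplus += -net - from_balance
        history.append((balance, surplus))
    return history


# Two busy hours at 60%, then a fully idle machine.
hours = [60, 60] + [0] * 22
for hour, (balance, surplus) in enumerate(simulate_unlimited(hours), start=1):
    print(hour, balance, surplus)
```

In this run the two busy hours leave 120 surplus credits, and the balance stays at zero for the next ten idle hours while that surplus is repaid at 12 credits per hour. That flat stretch is the pattern described above.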

A service and a paper that use machine learning to improve streaming quality

I saw the "Puffer" service on Hacker News. It uses a machine learning algorithm to adjust the bitrate dynamically in order to improve streaming quality.

The data collected from the tests was later written up in a paper: "Continual learning improves Internet video streaming".

It opens by introducing the Fugu algorithm:

We describe Fugu, a continual learning algorithm for bitrate selection in streaming video.

And Puffer is the experimental site:

We evaluate Fugu with Puffer, a public website we built that streams live TV using Fugu and existing algorithms. Over a nine-day period in January 2019, Puffer streamed 8,131 hours of video to 3,719 unique users.

The site offers a number of real channels to test with:

Stream live TV in your browser. There's no charge. You can watch U.S. TV stations affiliated with the NBC, CBS, ABC, PBS, FOX, and Univision networks.

You can see that Fugu's results are very good, an across-the-board improvement over the other proposed schemes:

The tests here use WebSocket, combined with different TCP congestion control algorithms, and don't pay much attention to the extra computational cost...