Performance of the Various Amazon EC2 Virtualization Technologies

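Brendan Gregg has put together an overview of the virtualization technologies used on Amazon EC2 and how their performance compares: "AWS EC2 Virtualization 2017". In his chart, the two newest technologies (numbered 7 and 8) perform very well.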

He groups them into three main types of virtualization:

  • Virtualized in Software: While this can support an unmodified guest OS, many operations are emulated and slow. Apps may run 2x to 10x slower, or worse.
  • Paravirtualization: The hypervisor provides efficient hypercalls, and the guest OS uses drivers and kernel modifications to call these hypercalls. It's using software and coordination between the hypervisor and guest to improve performance. I'd expect measurable overhead of 10% to 50% (depending on the PV type and workload).
  • Virtualized in Hardware: Hardware support for virtualization, and near bare-metal speeds. I'd expect between 0.1% and 1.5% overhead.
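
To make the quoted ranges concrete, here is a minimal sketch that converts an overhead percentage into an expected runtime. It assumes that "overhead of N%" means the job takes (1 + N/100) times its bare-metal runtime, which is my reading of the quote and not something the original spells out. The function names and the 100-second job are hypothetical.

```python
# Overhead ranges quoted above, expressed as fractions of extra runtime.
# Assumption: "10% overhead" means the job takes 1.10x as long as on bare metal.
QUOTED_RANGES = {
    "paravirtualization": (0.10, 0.50),
    "virtualized in hardware": (0.001, 0.015),
}


def runtime_with_overhead(bare_metal_seconds: float, overhead_fraction: float) -> float:
    """Return the runtime after adding a fractional virtualization overhead."""
    return bare_metal_seconds * (1.0 + overhead_fraction)


def runtime_range(bare_metal_seconds: float, kind: str) -> tuple:
    """Return the (best case, worst case) runtime for one of the quoted ranges."""
    low, high = QUOTED_RANGES[kind]
    return (
        runtime_with_overhead(bare_metal_seconds, low),
        runtime_with_overhead(bare_metal_seconds, high),
    )


if __name__ == "__main__":
    # Hypothetical job that takes 100 seconds on bare metal.
    for kind in QUOTED_RANGES:
        low, high = runtime_range(100.0, kind)
        print(f"{kind}: {low:.1f}s to {high:.1f}s")
```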

The history is a gradual move to hardware virtualization... and then, in the end, they launched bare metal machines as well XD

Amazon EC2's C5 Switches to KVM

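A few days ago, in "Amazon EC2's C5 family...", I covered the performance and price improvements that come with the new C5 series on Amazon EC2. Later I came across "AWS adopts home-brewed KVM as new hypervisor" on The Register, which covers more ground, and some of it is quite interesting...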

First, as the Amazon EC2 FAQs already state, C5 runs on KVM rather than being based on Xen as before:

Q. What is the underlying hypervisor on C5 instances?
C5 instances use a new EC2 hypervisor that is based on core KVM technology.

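The other point is more doubtful: The Register says that both Intel and Amazon issued press releases stating that C5 uses a customized version of Intel processors, but after reading them I don't get that impression...: "Intel Xeon Scalable Processors Supercharge Amazon Web Services’ Most Powerful Compute-Optimized Instances" and "AWS Announces Availability of C5 Instances for Amazon EC2".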

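Setting that aside... AWS has always been built around Xen, so this switch to KVM is a decision well worth watching. It would be worth asking them why they made the change when the chance comes up.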

Performance of gettimeofday and clock_gettime on Amazon EC2

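I came across "Two frequently used system calls are ~77% slower on AWS EC2", which discusses the performance of gettimeofday and clock_gettime. While searching for more information I also found "gettimeofday() not using vDSO?", a question that appears to have been asked by the same author.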
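
A rough way to get a feel for the per-call cost on a given instance is a small timing loop. The sketch below is my own and is not taken from the linked article; the helper names and iteration count are hypothetical. It uses Python's `time.clock_gettime`, which is available on Unix only, and the numbers include interpreter overhead, so they are only meaningful when comparing the same script across instances or clocksource settings.

```python
import time


def ns_per_call(func, iterations=200_000):
    """Average wall-clock nanoseconds per call of func over a simple loop."""
    start = time.perf_counter_ns()
    for _ in range(iterations):
        func()
    elapsed = time.perf_counter_ns() - start
    return elapsed / iterations


def read_realtime():
    # Wraps clock_gettime(CLOCK_REALTIME); Unix only.
    return time.clock_gettime(time.CLOCK_REALTIME)


def read_monotonic():
    # Wraps clock_gettime(CLOCK_MONOTONIC); Unix only.
    return time.clock_gettime(time.CLOCK_MONOTONIC)


if __name__ == "__main__":
    # Includes Python call overhead, so compare runs against each other,
    # not against figures measured in C.
    print(f"CLOCK_REALTIME:  {ns_per_call(read_realtime):.0f} ns/call")
    print(f"CLOCK_MONOTONIC: {ns_per_call(read_monotonic):.0f} ns/call")
```

Running it once with each clocksource on the same machine should show whether the difference is large enough to matter for the workload.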

On newer EC2 machines, using tsc should be fine (AWS was already recommending it officially back in 2015):

it seems tsc support in Xen has improved with version 4.0 and with improved CPU support in Sandy Bridge+ platforms. Modern EC2 machines should be okay with tsc. Check Xen version using dmesg | grep "Xen version". Amazon recommended the tsc clocksource already in re:Invent 2015 (https://www.slideshare.net/AmazonWebServices/cmp402-amazon-ec2-instances-deep-dive). I'm not yet running to production with this, but the situation doesn't seem as bad as implied by packagecloud.

I launched a t2.micro and checked /sys/devices/system/clocksource/clocksource0/current_clocksource; it currently appears to be set to xen:

ubuntu@ip-172-31-22-165:~$ cat /sys/devices/system/clocksource/clocksource0/current_clocksource
xen
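
The same check can be scripted. This is a minimal sketch that reads the current clocksource and the list of available ones from sysfs; `read_clocksources` is a hypothetical helper name, and the `base` parameter exists only so the function can be pointed at another directory.

```python
from pathlib import Path

# Standard Linux sysfs location, the same directory used in the cat command above.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksources(base=CLOCKSOURCE_DIR):
    """Return (current clocksource, list of available clocksources)."""
    base = Path(base)
    current = (base / "current_clocksource").read_text().strip()
    available = (base / "available_clocksource").read_text().split()
    return current, available


if __name__ == "__main__":
    current, available = read_clocksources()
    print(f"current:   {current}")
    print(f"available: {', '.join(available)}")
    if current != "tsc" and "tsc" in available:
        print("tsc is available but not selected")
```

On Linux the clocksource can be changed at runtime by writing a name from the available list into current_clocksource as root; whether that is safe on a particular instance type is something to verify first.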

The "(CMP402) Amazon EC2 Instances Deep Dive" slides also have some relevant material (pages 24 and 25).

Linode Moves Entirely to KVM

Linode announced that it will use KVM exclusively starting in May, and new machines will no longer be able to choose Xen: "KVM Update".

Existing Xen machines will keep running, but they will be consolidated:

Existing Xen-based Linodes will be fine. However, in the near future we will begin to consolidate Xen Linodes onto fewer physical servers, which will mean scheduled migrations with periods of downtime. Don’t worry – if you will be affected, we’ll provide plenty of advance notice when those migrations are planned.

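Linode's KVM is indeed considerably faster. My guess is that hardware upgrades play a part too, since the numbers I have seen from elsewhere don't show gains this large.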

Linode Offers KVM-based Hosting

A few days ago Linode announced that it is starting to offer KVM-based hosting: "Linode turns 12! Here’s some KVM!".

The biggest step forward with KVM is that you can now run Windows (this matters a lot on the product side):

However, we also now support fully virtualized guests – which means you can run alternative operating systems like FreeBSD, BSD, Plan 9, or even Windows – using emulated hardware (PIIX IDE and e1000).

A graphical console will also be released:

We’re also working on a graphical console (GISH?) which should be out in the next few weeks.

Moving an existing Xen instance to KVM has also been made very simple:

On a Xen Linode’s dashboard, you will see an “Upgrade to KVM” link on the right sidebar. It’s a one-click migration to upgrade your Linode to KVM from there. Essentially, our KVM upgrade means you get a much faster Linode just by clicking a button.

They also plan to phase out Xen gradually:

New customers and new Linodes will, by default, still get Xen. Xen will cease being the default in the next few weeks. Eventually we will transition all Xen Linodes over to KVM, however this is likely to take quite a while. Don’t sweat it.

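However, Linode still doesn't offer a better networking environment (i.e. Private LAN), which remains a technical barrier for anyone building a large-scale service :o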

Amazon's Xen Security Update

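Anyone renting a truckload of machines on AWS has probably received reboot notices recently. No identifier has been named explicitly, but it looks like XSA-108, which will be made public on 10/01: "EC2 Maintenance Update".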

The first comment on the Slashdot story "Amazon Forced To Reboot EC2 To Patch Bug In Xen" is a great read, though:

It's funny for me to read that Amazon is notifying its users of an impending reboot.

I've been suffering with Azure for over a year now, and the only thing that's constant is rebooting....

My personal favorite Azure feature, is that SQL Azure randomly drops database connections by design.

Let that sink in for a while. You are actually required to program your application to expect failed database calls.

I've never seen such a horrible platform, or a less reliable database server...

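How to put this... For people using cloud services, designing this way is indeed the right thing to do, but providers of cloud services should still keep their VMs stable... XDDD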
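
As an illustration of "programming the application to expect failed database calls", here is a minimal retry sketch with exponential backoff and jitter. It is a generic pattern and not the API of any particular database client: `call_with_retry`, its parameters, and the choice of `ConnectionError` as the retryable exception are all assumptions made for the example.

```python
import random
import time


def call_with_retry(operation, attempts=5, base_delay=0.1, max_delay=5.0,
                    retry_on=(ConnectionError,), sleep=time.sleep):
    """Call operation(), retrying on the given exceptions with backoff.

    The delay ceiling doubles after every failure (capped at max_delay) and
    the actual wait is a random value below that ceiling. The last failure
    is re-raised so the caller still sees the error.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            if attempt == attempts - 1:
                raise
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, ceiling))


if __name__ == "__main__":
    # Hypothetical operation that fails twice before succeeding,
    # standing in for a query over a connection that gets dropped.
    state = {"calls": 0}

    def flaky_query():
        state["calls"] += 1
        if state["calls"] < 3:
            raise ConnectionError("connection dropped")
        return "ok"

    print(call_with_retry(flaky_query), "after", state["calls"], "calls")
```

Retrying is only safe for operations that can be repeated without side effects, so writes generally need idempotency keys or transactions on top of this.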