The story of buying IOPS with disk space on AWS...

In "A web performance issue", the author walks through a performance problem in one of Mozilla's systems, the troubleshooting that followed, and the eventual fix.

The system runs on AWS, and after a series of checks the culprit turned out to be the EBS volume behind RDS maxing out its IOPS:

After reading a lot of documentation about Amazon’s RDS set-up I determined that slow downs in the database were related to IOPS spikes. Amazon gives you 3 IOPS per Gb and with a storage of 1 Terabyte we had 3,000 IOPS as our baseline. The graph below shows that at times we would get above that max baseline.

The fix is the one everyone would reach for: Provisioned IOPS is too expensive, so they simply enlarged the volume to get more IOPS out of it (since General Purpose SSD gives 3 IOPS per GB):

To increase the IOPS baseline we could either increase the storage size or switch from General SSD to Provisioned IOPS storage. The cost of the different storage type was much higher so we decided to double our storage, thus, doubling our IOPS baseline. You can see in the graph below that we’re constantly above our previous baseline. This change helped Treeherder’s performance a lot.
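To make the numbers concrete, here is a tiny Python sketch of the gp2 baseline math the post relies on (3 IOPS per GB, with gp2's documented 100 IOPS floor and the 10K cap that applied at the time):

```python
def gp2_baseline_iops(size_gb: int) -> int:
    # gp2 baseline: 3 IOPS per GB, floor of 100 IOPS, capped at 10,000 (the limit back then).
    return max(100, min(10_000, 3 * size_gb))

print(gp2_baseline_iops(1000))  # 3000 -> the post's baseline for 1 TB
print(gp2_baseline_iops(2000))  # 6000 -> doubling the storage doubles the baseline
```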

They also set up an alerting mechanism so that next time they can scale up ahead of time:

In order to prevent getting into such a state in the future, I also created a CloudWatch alert. We would get alerted if the combined IOPS is greater than 5,700 IOPS for 6 datapoints within 10 minutes.
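The post doesn't show the alarm definition itself, but a minimal boto3 sketch of an equivalent alarm might look like this, combining ReadIOPS and WriteIOPS with metric math (the instance identifier is a placeholder; 5,700 is presumably about 95% of the new 6,000 baseline):

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="rds-combined-iops-high",
    # "6 datapoints within 10 minutes": 1-minute periods, 10 evaluated, 6 must breach.
    EvaluationPeriods=10,
    DatapointsToAlarm=6,
    Threshold=5700,
    ComparisonOperator="GreaterThanThreshold",
    Metrics=[
        {
            "Id": "read",
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/RDS",
                    "MetricName": "ReadIOPS",
                    "Dimensions": [{"Name": "DBInstanceIdentifier", "Value": "treeherder-prod"}],
                },
                "Period": 60,
                "Stat": "Average",
            },
            "ReturnData": False,
        },
        {
            "Id": "write",
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/RDS",
                    "MetricName": "WriteIOPS",
                    "Dimensions": [{"Name": "DBInstanceIdentifier", "Value": "treeherder-prod"}],
                },
                "Period": 60,
                "Stat": "Average",
            },
            "ReturnData": False,
        },
        # The alarm fires on the combined read + write IOPS.
        {"Id": "combined", "Expression": "read + write", "Label": "Combined IOPS", "ReturnData": True},
    ],
)
```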

Note that gp2 IOPS are not 100% guaranteed, though; the documentation only says:

AWS designs gp2 volumes to deliver 90% of the provisioned performance 99% of the time.

It should be enough in most cases, though...

AWS changes the default EBS volume type to GP2 (SSD)

AWS announced that the default EBS volume type has changed from Standard to GP2: "EBS default volume type updated to GP2".

The defaults in both the web console and the API are now GP2:

The AWS console defaults to GP2 in all regions. On July 29th the default EBS volume type was updated in thirteen regions from Standard to GP2. Now AWS API calls for volume, image, and instance creation also default to GP2 in all regions.

GP2 is SSD-backed, so it offers lower latency. Another benefit of GP2 is that I/O charges are already included in the price (Standard bills I/O separately), which makes cost estimation a bit simpler, especially when the I/O volume is large.

Amazon EBS (gp2) gets a performance boost...

AWS announced a performance improvement for Amazon EBS (gp2): "Amazon EBS Increases Performance of General Purpose SSD (gp2) Volumes".

The cap used to be 10K IOPS and has been raised to 16K IOPS. Maximum throughput also goes from 160 MB/sec to 250 MB/sec:

Today we are announcing a 60% improvement in performance of General Purpose SSD (gp2) Volumes from 10,000 IOPS to 16,000 IOPS and from 160 MB/s to 250 MB/s of throughput per volume.

The 3 IOPS per GB design presumably stays the same, but this is good news for anyone stacking performance out of gp2: you can now buy even more IOPS with even more space... (as in the earlier post "Percona followed AWS's advice and re-benchmarked Percona XtraDB Cluster on gp2...")
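Assuming the 3 IOPS per GB rule does still hold, the inverse math tells you how much space you need for a target baseline, e.g. to saturate the new cap:

```python
def gp2_size_for_iops(target_iops: int) -> int:
    # Smallest gp2 volume (GB) whose 3 IOPS/GB baseline reaches the target.
    return -(-target_iops // 3)  # ceiling division

print(gp2_size_for_iops(10_000))  # 3334 GB was enough for the old 10K cap
print(gp2_size_for_iops(16_000))  # 5334 GB to reach the new 16K cap
```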

EBS Magnetic volumes can now be grown dynamically...

AWS announced that EBS Magnetic volumes can now be resized dynamically: "Amazon EBS Extends Elastic Volumes to Support EBS Magnetic (Standard) Volume Type". Magnetic volumes are classified as previous-generation products, so it's a little surprising to see this feature extended to a previous-generation product...

Compared with Throughput Optimized HDD (st1) and Cold HDD (sc1), the previous-generation Magnetic volumes are a bit more expensive, but they can be used as boot volumes, and the minimum size is only 1GB (versus 500GB for st1 and sc1). Compared with General Purpose SSD (gp2), though, the total cost exceeds gp2 as soon as there is any meaningful I/O, so gp2 still looks like the better choice for now...
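The resize goes through the same Elastic Volumes call as the other volume types; a minimal boto3 sketch (the volume ID and size are placeholders), with the usual caveat that you still have to grow the filesystem afterwards:

```python
import time

import boto3

ec2 = boto3.client("ec2")
VOLUME_ID = "vol-0123456789abcdef0"  # placeholder

# Grow a standard (Magnetic) volume in place; it stays attached and usable.
ec2.modify_volume(VolumeId=VOLUME_ID, Size=200)  # new size in GiB

# Wait until the modification reaches "optimizing" or "completed", then grow
# the filesystem on the instance (resize2fs, xfs_growfs, ...).
while True:
    mods = ec2.describe_volumes_modifications(VolumeIds=[VOLUME_ID])
    if mods["VolumesModifications"][0]["ModificationState"] in ("optimizing", "completed"):
        break
    time.sleep(10)
```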

Amazon EC2 fills out the product line: M5 now has local NVMe SSD too

Amazon EC2 launched an M5 variant with local NVMe SSD: "EC2 Instance Update – M5 Instances with Local NVMe Storage (M5d)". It's available in North American and European regions first:

M5d instances are available in On-Demand, Reserved Instance, and Spot form in the US East (N. Virginia), US West (Oregon), EU (Ireland), US East (Ohio), and Canada (Central) Regions. Prices vary by Region, and are just a bit higher than for the equivalent M5 instances.

Something like the successor to the M3 series? The m3.medium launched four years ago had 1 vCPU, 3.75GB RAM, and 8GB of SSD (though m3.large jumped to 2 vCPU, 7.5GB RAM, and 32GB SSD, a much bigger disk).

This makes the M5 line more complete...

Amazon EC2 launches the C5d instance: optimized for both CPU and disk

AWS launched the C5d instance, a family optimized for both CPU and disk: "EC2 Instance Update – C5 Instances with Local NVMe Storage (C5d)". The model is renting the machine with NVMe attached locally, and the price isn't bad.

With a single local NVMe device in the layout, it feels like filling the gap the c3 series left.

Not every region that has c5 gets c5d; for now only five regions do:

C5d instances are available in On-Demand, Reserved Instance, and Spot form in the US East (N. Virginia), US West (Oregon), EU (Ireland), US East (Ohio), and Canada (Central) Regions. Prices vary by Region, and are just a bit higher than for the equivalent C5 instances.

AWS raises Amazon EBS's performance ceiling

AWS announced that Amazon EBS's performance ceiling has been raised (this is about Provisioned IOPS SSD, a.k.a. io1): "Amazon EBS Improves Performance for io1 Volumes".

Per-volume IOPS goes from 20K to 32K, and throughput from 320MB/sec to 500MB/sec:

Today we are announcing an improvement in performance of Provisioned IOPS SSD (io1) Volumes from 20,000 IOPS to 32,000 IOPS and from 320 MB/s to 500 MB/s of throughput per volume.

Presumably driven by advances in the underlying technology XD

Amazon RDS supports larger storage and more IOPS

An upgrade for Amazon RDS: "Amazon RDS Now Supports Database Storage Size up to 16TB and Faster Scaling for MySQL, MariaDB, Oracle, and PostgreSQL Engines".

The storage cap goes from 6TB to 16TB, and existing instances can be scaled up painlessly. The IOPS cap also goes from 30K to 40K:

Starting today, you can create Amazon RDS database instances for MySQL, MariaDB, Oracle, and PostgreSQL database engines with up to 16TB of storage. Existing database instances can also be scaled up to 16TB storage without any downtime.

The new storage limit is an increase from 6TB and is supported for Provisioned IOPS and General Purpose SSD storage types. You can also provision up to 40,000 IOPS for Provisioned IOPS storage volumes, an increase from 30,000 IOPS.
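The scale-up itself is a single API call; a minimal boto3 sketch (the identifier and numbers are placeholders):

```python
import boto3

rds = boto3.client("rds")

# Grow an existing instance's storage toward the new 16TB ceiling; per the
# announcement this happens without downtime.
rds.modify_db_instance(
    DBInstanceIdentifier="mydb",  # placeholder
    AllocatedStorage=16384,       # GB; the new maximum (was 6TB)
    ApplyImmediately=True,        # apply now instead of at the maintenance window
)

# For Provisioned IOPS storage the ceiling is now 40,000 (was 30,000):
# rds.modify_db_instance(DBInstanceIdentifier="mydb", Iops=40000, ApplyImmediately=True)
```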

Amazon Aurora next door is still much larger (64TB), though, and in practice you don't have to think about how much to allocate, since it grows by itself:

Q: What are the minimum and maximum storage limits of an Amazon Aurora database?

The minimum storage is 10GB. Based on your database usage, your Amazon Aurora storage will automatically grow, up to 64 TB, in 10GB increments with no impact to database performance. There is no need to provision storage in advance.

Linode is about to launch "Linode Block Storage"...

From "Linode Block Storage (Fremont beta)" we can see Linode has rolled out Block Storage. It's SSD-based, same as Amazon EBS's gp2.

The pricing model and the price are the same as well, with no I/O fee:

They're affordable - $0.10 per GB (free during the beta) and no usage fees.

For now, volumes can be created from 1GB up to 1TB:

How big of a Volume can I create?
Between 1 GB and 1024 GB for now. After the beta, the max volume size may be larger.

Each Linode can attach up to eight:

How many Volumes can I attach to a Linode at the same time?
Up to 8.

And the meter starts in 2018:

The beta is free through 2017. January 1, 2018 the meter starts running.

With Block Storage available, some setups become much easier to build, and you're no longer constrained by the size of the local disk.

SSDs in RAID1 under Linux have an unbalanced-read problem

In "Unbalanced reads from SSDs in software RAID mirrors in Linux", the author noticed from S.M.A.R.T. data that a RAID1 made of two SSDs had a very obvious read imbalance:

242 Total_LBAs_Read [...] 16838224623
242 Total_LBAs_Read [...] 1698394290
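To check your own mirrors for the same symptom, you can compare attribute 242 across the members; a small Python sketch, assuming smartctl is installed and the drives expose Total_LBAs_Read:

```python
import subprocess

def total_lbas_read(device: str) -> int:
    """Parse SMART attribute 242 (Total_LBAs_Read) out of `smartctl -A` output."""
    out = subprocess.run(
        ["smartctl", "-A", device], capture_output=True, text=True, check=True
    ).stdout
    for line in out.splitlines():
        fields = line.split()
        if fields and fields[0] == "242":
            return int(fields[-1])  # the raw value is the last column
    raise ValueError(f"no Total_LBAs_Read attribute on {device}")

a = total_lbas_read("/dev/sda")
b = total_lbas_read("/dev/sdb")
print(f"/dev/sda: {a}\n/dev/sdb: {b}\nratio: {max(a, b) / min(a, b):.1f}x")
```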

The reason is that Linux uses a different read-balancing algorithm for RAID1 on SSDs:

The current state of RAID1 read balancing is kind of complex, but the important thing here in all kernels since 2012 is that if you have SSDs and at least one disk is idle, the first idle disk will be chosen.

A late-2016 change made the algorithm even more aggressive, extending it to non-SSDs too:

In kernels with the late 2016 change, this widens to if at least one disk is idle, the first idle disk will be chosen, even if all mirrors are HDs.

Combined with how fast SSDs are, this ends up putting almost all of the load on the first disk... It should be fine for SSDs (in theory reads don't wear an SSD out), but it still feels a bit odd.