Backblaze's 2019 Annual Hard Drive Report

Backblaze has published last year's report: 「Backblaze Hard Drive Stats for 2019」.

WD/HGST drives are still the most reliable, followed by Toshiba, with Seagate last.

However, some drives didn't make it into the table. For example, the 「Seagate 16 TB Drives」 group was only deployed in 2019Q4 and has accumulated just 1,440 drive days, below the threshold for inclusion in the report, but based on Backblaze's testing it looks like a good start:

In Q4 2019 we started qualifying Seagate 16 TB drives, model: ST16000NM001G. As of the end of Q4 we had 40 (forty) drives in operation, with a total of 1,440 drive days—well below our 5,000 drive day threshold for Q4, so they didn’t make the 2019 chart. There have been 0 (zero) failures through Q4, making the AFR 0%, a good start for any drive. Assuming they continue to pass our drive qualification process, they will be used in the 12 TB migration project and to add capacity as needed in 2020.
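
For context, Backblaze's annualized failure rate (AFR) is essentially failures per drive-year, i.e. failures divided by drive days scaled up to a year. A minimal sketch (the helper name is mine) also shows why small samples under the 5,000 drive-day threshold are left out of the chart:

```python
def annualized_failure_rate(failures: int, drive_days: int) -> float:
    """Annualized failure rate in percent: failures per drive-year."""
    return failures / (drive_days / 365) * 100

# The Seagate 16 TB group: 40 drives, 1,440 drive days, 0 failures -> 0% AFR.
print(annualized_failure_rate(0, 1440))   # 0.0
# With that few drive days, a single failure would already mean a ~25% AFR,
# which is why groups below the 5,000 drive-day threshold are excluded.
print(annualized_failure_rate(1, 1440))   # ~25.3
```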

Next, putting 2017/2018/2019 side by side:

What stands out immediately is that the AFR rose quite a bit, partly because the 8TB drives are entering middle age, and partly because of issues with the Seagate 12TB drives:

The total AFR for 2019 rose significantly in 2019. About 75% of the different drive models experienced a rise in AFR from 2018 to 2019. There are two primary drivers behind this rise. First, the 8 TB drives as a group seem to be having a mid-life crisis as they get older, with each model exhibiting their highest failure rates recorded. While none of the rates is cause for worry, they contribute roughly one fourth (1/4) of the drive days to the total, so any rise in their failure rate will affect the total. The second factor is the Seagate 12 TB drives, this issue is being aggressively addressed by the 12 TB migration project reported on previously.

So the general rule is still about the same as before: if you don't have time to do your own research, start by looking at WD/HGST...

Backblaze's Hard Drive Purchasing Strategy

「How Backblaze Buys Hard Drives」 describes Backblaze's strategy for purchasing hard drives. It is entirely cost-driven, so individuals won't have much use for these tactics and typical companies shouldn't copy them either, but it's still an interesting read...

For example, because they run so many drives, power draw is a significant part of their cost evaluation, something that rarely comes up in ordinary scenarios:

Power draw is a very important metric for us and the high speed enterprise drives are expensive in terms of power cost. We now total around 1.5 megawatts in power consumption in our centers, and I can tell you that every watt matters for reducing costs.

They also mention the characteristics of SMR drives: although they offer more capacity per dollar, they require architectural accommodations (a caching layer) and add engineering costs, so Backblaze isn't fond of them:

SMR would give us a 10-15% capacity-to-dollar boost, but it also requires host-level management of sequential data writing. Additionally, the new archive type of drives require a flash-based caching layer. Both of these requirements would mean significant increases in engineering resources to support and thereby even more investment. So all-in-all, SMR isn’t cost-effective in our system.

On the cost side, what they observe is a 5%~10% drop every quarter:

Ideally, I can achieve a 5-10% cost reduction per terabyte per quarter, which is a number based on historical price trends and our performance for the past 10 years.
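
As a back-of-the-envelope illustration (my own arithmetic, not from the post), a 5-10% drop per quarter compounds to roughly an 18-34% reduction in cost per terabyte over a year:

```python
# Compound the quarterly reduction over four quarters.
for quarterly_drop in (0.05, 0.10):
    yearly_factor = (1 - quarterly_drop) ** 4
    print(f"{quarterly_drop:.0%}/quarter -> {1 - yearly_factor:.1%} cheaper per TB after a year")
# 5%/quarter -> 18.5% cheaper per TB after a year
# 10%/quarter -> 34.4% cheaper per TB after a year
```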

They also mention using SAS controllers to drive multiple SATA disks (again a cost consideration), which is a pretty interesting detail:

Longer term, one thing we’re looking toward is phasing out SATA controller/port multiplier combo. This might be more technical than some of our readers want to go, but: SAS controllers are a more commonly used method in dense storage servers. Using SATA drives with SAS controllers can provide as much as a 2x improvement in system throughput vs SATA, which is important to me, even though serial ATA (SATA) port multipliers are slightly less expensive. When we started our Storage Pod construction, using SATA controller/port multiplier combo was a great way to keep costs down. But since then, the cost for using SAS controllers and backplanes has come down significantly.

Installing Ubuntu 18.04 on My Home Desktop

Last Thursday my home desktop wouldn't boot. After a day of troubleshooting I found the system SSD had died, so I bought an M.2 SSD and planned to upgrade from Ubuntu 16.04 to Ubuntu 18.04 while I was at it. But Ubuntu 18.04 switched the default desktop from Unity to GNOME (dressed up to look like Unity), and since I had also recently moved the system from an Intel platform to AMD, everything got extremely messy and turned into a long run of stepping on landmines...

The first issue was installing with UEFI + LUKS. I originally wanted to install onto the M.2 SSD, but grub-install in Ubuntu 18.04 insisted on writing to /dev/sda with no way to change it: 「"Unable to install GRUB in /dev/sda" when installing GRUB」. The workaround in that post still didn't work, so I eventually gave up, grabbed a SATA SSD, plugged it into SATA Port 1, and used the M.2 as a data drive.

Hardware-related issues:

Software-related issues:

  • PPPoE networking currently can't be configured from the GUI (argh). Of the various approaches, I'd recommend setting it up with pppoeconf, after which you can edit /etc/ppp/options to add the IPv6 settings.
  • I wanted to install gnome-shell-extension-system-monitor to keep an eye on system status, but it made the system extremely laggy; after disabling it things went back to merely laggy (which is how I later found that Intel I211-AT issue).

It's at least usable now; the next step is to keep filling in all the various bits of configuration...

A Story of Buying IOPS with Storage Space on AWS...

「A web performance issue」 describes a performance problem in one of Mozilla's systems, the troubleshooting that followed, and how it was resolved.

The system runs on AWS, and after a series of checks they found that the EBS volume used by RDS had maxed out its IOPS:

After reading a lot of documentation about Amazon’s RDS set-up I determined that slow downs in the database were related to IOPS spikes. Amazon gives you 3 IOPS per Gb and with a storage of 1 Terabyte we had 3,000 IOPS as our baseline. The graph below shows that at times we would get above that max baseline.

The fix is pretty much what everyone does: since Provisioned IOPS is too expensive, they simply increased the volume size to get more IOPS (because General SSD gives 3 IOPS per GB):

To increase the IOPS baseline we could either increase the storage size or switch from General SSD to Provisioned IOPS storage. The cost of the different storage type was much higher so we decided to double our storage, thus, doubling our IOPS baseline. You can see in the graph below that we’re constantly above our previous baseline. This change helped Treeherder’s performance a lot.
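
As a small sketch of the rule described above (3 IOPS per GB of General SSD; note that gp2 also has a 100 IOPS floor and a per-volume cap, currently 16,000 IOPS, so only the linear part matters here):

```python
def gp2_baseline_iops(size_gib: int) -> int:
    """Baseline IOPS for a gp2 volume: 3 IOPS/GiB, floored at 100, capped at 16,000."""
    return min(max(3 * size_gib, 100), 16_000)

print(gp2_baseline_iops(1_000))   # 3000 -> the original 1 TB RDS volume
print(gp2_baseline_iops(2_000))   # 6000 -> doubling the storage doubles the baseline
```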

They also set up an alert so that next time they can scale up ahead of the problem:

In order to prevent getting into such a state in the future, I also created a CloudWatch alert. We would get alerted if the combined IOPS is greater than 5,700 IOPS for 6 datapoints within 10 minutes.
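
The post doesn't show the alarm definition itself, but something along these lines with boto3 would match the description (ReadIOPS + WriteIOPS combined via metric math, alarming when 6 of the last 10 one-minute datapoints exceed 5,700; the instance identifier is a placeholder):

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="rds-combined-iops-high",
    Metrics=[
        {"Id": "read", "ReturnData": False, "MetricStat": {
            "Metric": {"Namespace": "AWS/RDS", "MetricName": "ReadIOPS",
                       "Dimensions": [{"Name": "DBInstanceIdentifier",
                                       "Value": "my-db-instance"}]},
            "Period": 60, "Stat": "Average"}},
        {"Id": "write", "ReturnData": False, "MetricStat": {
            "Metric": {"Namespace": "AWS/RDS", "MetricName": "WriteIOPS",
                       "Dimensions": [{"Name": "DBInstanceIdentifier",
                                       "Value": "my-db-instance"}]},
            "Period": 60, "Stat": "Average"}},
        # Combined IOPS = ReadIOPS + WriteIOPS.
        {"Id": "combined", "Expression": "read + write",
         "Label": "Combined IOPS", "ReturnData": True},
    ],
    ComparisonOperator="GreaterThanThreshold",
    Threshold=5700,
    EvaluationPeriods=10,   # look at the last 10 one-minute datapoints...
    DatapointsToAlarm=6,    # ...and alarm when 6 of them breach the threshold
    TreatMissingData="notBreaching",
)
```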

Note that General SSD IOPS are not 100% guaranteed; the documentation only says this:

AWS designs gp2 volumes to deliver 90% of the provisioned performance 99% of the time.

That should be good enough for most cases...

AWS Changes the Default EBS Volume Type to GP2 (SSD)

AWS announced that the default EBS volume type has changed from Standard to GP2: 「EBS default volume type updated to GP2」.

The defaults in both the web console and the API have been changed to GP2:

The AWS console defaults to GP2 in all regions. On July 29th the default EBS volume type was updated in thirteen regions from Standard to GP2. Now AWS API calls for volume, image, and instance creation also default to GP2 in all regions.

GP2 is SSD-backed, so it offers lower latency. Another benefit of GP2 is that I/O charges are already included in the price (Standard bills I/O separately), which makes cost estimation simpler, especially when the I/O volume is high.

RDS Supports Storage Auto Scaling

Amazon RDS has launched Storage Auto Scaling: 「Amazon RDS now supports Storage Auto Scaling」.

It looks like all the traditional RDBMS engines are supported (i.e., everything that isn't Aurora):

Starting today, Amazon RDS for MariaDB, Amazon RDS for MySQL, Amazon RDS for PostgreSQL, Amazon RDS for SQL Server and Amazon RDS for Oracle support RDS Storage Auto Scaling.

Reading the announcement carefully, it only mentions scaling up, never scaling down, so this feature presumably only grows storage and never shrinks it. Be careful about cases where you temporarily consume a lot of space and then delete it afterwards:

RDS Storage Auto Scaling automatically scales storage capacity in response to growing database workloads, with zero downtime.

RDS Storage Auto Scaling continuously monitors actual storage consumption, and scales capacity up automatically when actual utilization approaches provisioned storage capacity.
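
The announcement doesn't show the API, but Storage Auto Scaling is controlled through a maximum-storage threshold on the instance; with boto3 it looks roughly like this (the identifier and size are placeholders):

```python
import boto3

rds = boto3.client("rds")

# Setting MaxAllocatedStorage enables Storage Auto Scaling: RDS grows the
# allocated storage automatically (it never shrinks it) up to this ceiling.
rds.modify_db_instance(
    DBInstanceIdentifier="my-db-instance",
    MaxAllocatedStorage=1000,   # GiB
    ApplyImmediately=True,
)
```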

It's available in all commercial regions except Hong Kong:

RDS Storage Auto Scaling is available in all commercial AWS regions except in Asia Pacific (Hong Kong) and AWS GovCloud.

AWS Launches an Even Cheaper Storage Class: Glacier Deep Archive

This new offering from AWS pushes the price even lower: 「New Amazon S3 Storage Class – Glacier Deep Archive」.

Before this, the cheapest S3 option in us-east-1 was Glacier Storage at USD$0.004/GB (i.e., $4/TB).

The newly launched Glacier Deep Archive Storage in the same region goes straight to USD$0.00099/GB ($0.99/TB), roughly 1/4 of the price.

For Glacier Deep Archive, the guaranteed time to first byte on retrieval is 12 hours, and the minimum storage duration is 180 days:

Retrieval time within 12 hours

The existing Glacier Storage lets you choose a retrieval option at access time (which affects time to first byte), and its minimum storage duration is 90 days:

Configurable retrieval times, from minutes to hours

Pricing for each of these metrics is determined by the speed at which data is requested based on three options. "Expedited" queries <250 MB are typically returned in 1-5 minutes. "Standard" queries are typically returned in 3-5 hours. "Bulk" queries are typically returned in 5-12 hours.
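
A quick back-of-the-envelope comparison using the numbers above (my own arithmetic; the minimum storage duration is billed even if you delete the object earlier):

```python
GLACIER = (4.00, 90)        # USD per TB-month, minimum storage duration in days
DEEP_ARCHIVE = (0.99, 180)

def storage_cost_per_tb(price_per_tb_month, min_days, stored_days):
    """Cost of keeping 1 TB, never billing less than the minimum duration."""
    return price_per_tb_month * max(stored_days, min_days) / 30

print(storage_cost_per_tb(*GLACIER, 365))       # ~48.7 USD for a year
print(storage_cost_per_tb(*DEEP_ARCHIVE, 365))  # ~12.0 USD for a year
print(storage_cost_per_tb(*GLACIER, 30))        # 12.0 USD (billed as 90 days)
print(storage_cost_per_tb(*DEEP_ARCHIVE, 30))   # ~5.9 USD (billed as 180 days)
```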

One more, even cheaper option to choose from.

Amazon EBS (gp2) Performance Boost...

AWS announced a performance increase for Amazon EBS (gp2): 「Amazon EBS Increases Performance of General Purpose SSD (gp2) Volumes」.

The previous cap was 10k IOPS; it has now been raised to 16k IOPS. Maximum throughput also went from 160 MB/sec to 250 MB/sec:

Today we are announcing a 60% improvement in performance of General Purpose SSD (gp2) Volumes from 10,000 IOPS to 16,000 IOPS and from 160 MB/s to 250 MB/s of throughput per volume.

The design should still be 3 IOPS per GB, but this is good news for anyone stacking gp2 for performance: you can now provision more space to get more IOPS... (see, for example, the earlier post 「Percona, taking AWS's advice, re-tested Percona XtraDB Cluster performance on gp2...」)
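
Under the 3 IOPS per GB assumption, a quick calculation (mine) of how much space it takes to hit the old and new caps:

```python
import math

for cap in (10_000, 16_000):
    size_gib = math.ceil(cap / 3)
    print(f"{cap} IOPS cap -> ~{size_gib} GiB ({size_gib / 1024:.1f} TiB)")
# 10000 IOPS cap -> ~3334 GiB (3.3 TiB)
# 16000 IOPS cap -> ~5334 GiB (5.2 TiB)
```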

Backblaze's Hard Drive Failure Rate Report for 2018 Q2

As usual, Backblaze published its hard drive stats for 2018 Q2: 「Hard Drive Stats for Q2 2018」. You can see they basically only use Seagate and HGST now; the rest are presumably experimental:

The second chart is clearer because it marks the confidence intervals:

WD lags far behind, so for consumer drives the pick these days would be Seagate or HGST?

EBS Can Now Dynamically Grow Magnetic Volumes...

AWS announced that EBS (Magnetic) volumes can now be resized dynamically: 「Amazon EBS Extends Elastic Volumes to Support EBS Magnetic (Standard) Volume Type」. EBS Magnetic is classified as a previous-generation product, so it's a bit surprising to see this feature extended to it...

Compared with Throughput Optimized HDD (st1) and Cold HDD (sc1), the previous-generation Magnetic volumes are a bit more expensive, but they can serve as a boot volume and the minimum size is only 1GB (vs. 500GB for st1 and sc1). Compared with General Purpose SSD (gp2), however, once there is any meaningful amount of I/O the total price exceeds gp2, so for now gp2 still looks like the better choice...
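
A rough break-even sketch, using illustrative us-east-1 prices from around that time (assumed here: roughly $0.05/GB-month plus $0.05 per million I/O requests for Magnetic, and $0.10/GB-month for gp2 with I/O included; check current pricing before relying on these numbers):

```python
def magnetic_monthly_cost(size_gb, io_millions):
    # Magnetic/Standard: per-GB charge plus a separate per-request charge.
    return 0.05 * size_gb + 0.05 * io_millions

def gp2_monthly_cost(size_gb):
    # gp2: per-GB charge only, I/O included.
    return 0.10 * size_gb

# For a 100 GB volume, Magnetic starts cheaper ($5 vs $10) but catches up once
# it sees about 100 million requests a month (only ~38 IOPS on average):
for io_millions in (0, 50, 100, 200):
    print(io_millions, magnetic_monthly_cost(100, io_millions), gp2_monthly_cost(100))
```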