Backblaze's Analysis of Hard Drive Failures in 2016

Backblaze has analyzed its hard drive failures for 2016. Because this report falls at the turn of the year, it covers both 2016 Q4 and the full year 2016: "Backblaze Hard Drive Stats for 2016".

The part most people will care about is probably the full-year comparison by manufacturer:

HGST's quality is still very impressive, and you can also see that Backblaze is using fewer and fewer WD drives...

An Incident at ING Bank's Data Center in Romania...

ING Bank's data center in Romania was knocked offline after dozens of its hard drives were destroyed: "A Loud Sound Just Shut Down a Bank's Data Center for 10 Hours".

The cause, though, was that the gas released during a fire suppression test was so loud that it made the hard drives fail XDDD

ING Bank’s main data center in Bucharest, Romania, was severely damaged over the weekend during a fire extinguishing test. In what is a very rare but known phenomenon, it was the loud sound of inert gas being released that destroyed dozens of hard drives. The site is currently offline and the bank relies solely on its backup data center, located within a couple of miles’ proximity.

The report includes a test video demonstrating what extremely loud sound does to hard drives:

Backblaze Analyzes Hard Drive Failure Rates for 2015

As usual, Backblaze periodically publishes its recent hard drive failure rates, and this chart from "Hard Drive Reliability Review for 2015" seems to say quite a lot:

Broken down by capacity, though, some differences show up...

Backblaze's 2015 Q3 Hard Drive Failure Rate Report

As usual, shortly after each quarter ends Backblaze releases its hard drive failure rate report for that quarter: "What Can 49,056 Hard Drives Tell Us? Hard Drive Reliability Stats for Q3 2015".

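If you look closely, Seagate's failure rates among the smaller drives are still high (especially the startling numbers for the ST1500DL003 and ST3000DM001), while the larger drives (i.e. 4TB and up) are all within a reasonable range.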

Worth noting: HGST's failure rates are considerably lower than those of the other manufacturers.

AWS Launches a Hardware Appliance for Data Transfer: Snowball

The AWS Import/Export service now offers its own hardware: "AWS Import/Export Snowball – Transfer 1 Petabyte Per Week Using Amazon-Owned Storage Appliances".

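Previously, AWS Import/Export required you to buy your own drives and ship them to a location AWS specified, which makes little sense for one-off use (drives that get used only once?). Now AWS Import/Export offers a complete appliance to solve this problem. It looks like this: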

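The unit is its own shipping container and includes an E Ink display (clearly technology borrowed from the Kindle), so you don't even have to fill in the return address. It is also built to be water- and shock-resistant, so being dropped or rained on a little in transit won't hurt it. Data is protected with 256-bit AES encryption, and after the import AWS erases the appliance following the NIST Special Publication 800-88 media sanitization guidelines.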

The whole design has a real military-spec feel to it...

Each appliance can carry up to 50TB of data:

From there you simply copy up to 50 terabytes of data to the Snowball and disconnect it (a shipping label will automatically appear on the E Ink display), and ship it back to us for ingestion.
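To get a feel for why shipping an appliance can beat uploading, here is a minimal sketch of the arithmetic. The link speeds and the 80% utilization figure are assumptions chosen for illustration, not numbers from the AWS announcement, and `transfer_days` is a hypothetical helper name.

```python
def transfer_days(terabytes: float, link_mbps: float, utilization: float = 0.8) -> float:
    """Days needed to upload `terabytes` (decimal TB) over a network link.

    `utilization` is an assumed fraction of the link's nominal speed
    that the transfer actually sustains.
    """
    if link_mbps <= 0 or not 0 < utilization <= 1:
        raise ValueError("link_mbps must be positive and utilization in (0, 1]")
    bits = terabytes * 1e12 * 8                       # decimal terabytes to bits
    seconds = bits / (link_mbps * 1e6 * utilization)  # effective bits per second
    return seconds / 86400                            # seconds per day


# One full 50TB appliance's worth of data over two assumed link speeds.
days_at_100mbps = transfer_days(50, 100)   # roughly 58 days
days_at_1gbps = transfer_days(50, 1000)    # roughly 5.8 days
```

Under these assumptions a 100 Mbps link needs about two months to move what one appliance carries, which is the kind of gap that makes a round trip by courier attractive.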

Currently US East (us-east-1) and US West 2 (us-west-2) are supported:

You can import data to the US Standard and US West (Oregon) regions, with more on the way.

How to put it... it feels like something out of science fiction.

Backblaze's Hard Drive Reliability Report for the First Half of 2015 Is Out... (Another Jab at a Certain Company)

This time Backblaze published data for the first half of 2015. Although the title says 2015 Q2, the article contains half a year of statistics: "Hard Drive Reliability Stats for Q2 2015".

Everyone knows a certain company's products have a high failure rate, but is there some kind of grudge here? XDDD

Here are the statistics:

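There is also the historical record for 4TB drives. The sample sizes for the two manufacturers on the right are rather small. But wouldn't one failure out of 45 drives be 2.x%? How exactly were these numbers derived?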
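One likely explanation: Backblaze reports annualized failure rates, which are computed from drive days of service rather than from the number of drives installed, so a small group of drives that has only been running for part of a year will not come out as a simple "failures divided by drives" percentage. The sketch below shows that calculation. The drive counts and service periods are hypothetical values picked to illustrate the effect, not figures from the report.

```python
def annualized_failure_rate(failures: int, drive_days: float) -> float:
    """Annualized failure rate in percent: failures per drive-year of service."""
    if drive_days <= 0:
        raise ValueError("drive_days must be positive")
    drive_years = drive_days / 365
    return failures / drive_years * 100


# Hypothetical case 1: 45 drives, all in service for a full year, one failure.
full_year = annualized_failure_rate(1, 45 * 365)   # about 2.2%

# Hypothetical case 2: the same 45 drives, but only 90 days in service.
one_quarter = annualized_failure_rate(1, 45 * 90)  # about 9.0%
```

The same single failure gives a much higher annualized rate when the drives have accumulated fewer drive days, which is why rates for small, recently deployed groups can look surprising.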

Etsy's Story of Using SSDs

Laurie Denness of Etsy shared his experience with the various SSD brands used at Etsy: "SSDs: A gift and a curse".

The key point is the one made right at the start:

SSD firmware is buggy

You can see that when SSDs are paired with a RAID controller, you often end up hunting for problems... (and they are hard to find)

Intel gets a very good review:

Okay, bad start, we’ve actually had no issues with Intel. This seems to be common across other companies we’ve spoken to.

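OCZ went under and was acquired by Toshiba. Its drives also had poor S.M.A.R.T. information, which made it hard to predict when one would die (a prediction that would have allowed replacing drives ahead of time):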

However, they had poor SMART info (none) so predicting failures was hard.

HP is one big black box:

Unfortunately, HP have proprietary RAID controllers, and they don’t support SMART. Or rather, they refuse to talk to non-HP drives using off the shelf technology, they have their own methods.

Samsung gets a good review: great value for money, and it has S.M.A.R.T.:

Samsung saved the day and picked up from OCZ with a ludicrously cheap 960GB offering, the 840 EVO. A consumer drive, so very limited warranty, but for the price (~$400-500) you got great IOPS and they were reliable. They had better SMART info, and seemed to play nicely with our hardware.

However, the BB6Q firmware version wrecked performance, although it was eventually fixed: "Samsung Releases Firmware Update to Fix the SSD 840 EVO Read Performance Bug".

LiteOn, on the other hand, tripped over GC (two or more drives in a RAID array dropping out at the same time):

The SSDs were having extended garbage collection periods, exacerbated by a smaller amount of SSDs with higher IO, in RAID6. This caused the controller to kick the drive out of the array… and unfortunately due to the write levelling across the drives, at least two of them were garbage collecting at the same time, destroying the array integrity.

Later, though, Dell and LiteOn each stepped in to fix the RAID controller and the SSDs themselves, and the problem was eventually resolved:

Dell and LiteOn together identified and fixed weaknesses in their RAID controller, the backplane and the SSD firmware.

Consider it shared experience, an unavoidable stretch of road as SSDs mature XD

Backblaze's 3TB Hard Drive Failure Analysis

Backblaze takes another jab at Seagate, this time over 3TB drives... XDDD: "CSI: Backblaze – Dissecting 3TB Drive Failure".

The raw counts given earlier are hard to interpret; percentages are easier to understand:

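The article also discusses external versus internal drives. Its tentative conclusion is that this particular Seagate 3TB model is of especially poor quality, since the 4TB model behaves much more normally: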

While this particular 3TB model had a painfully high rate of failure, subsequent Seagate models such as their 4TB drive, model: ST4000DM000, are performing well with an annualized 2014 failure rate of just 2.6% as of December 31, 2014. These drives come with 3-year warranties and show no signs of hitting the wall.