Backblaze's Hard Drive Purchasing Strategy

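The article "How Backblaze Buys Hard Drives" describes Backblaze's strategy for buying hard drives. It is driven entirely by cost, so most of it doesn't apply to individuals, and ordinary companies shouldn't copy it wholesale, but it's still a fun read...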

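For example, because they run so many drives, power consumption is an important part of their cost evaluation, which is rarely a consideration in ordinary settings: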

Power draw is a very important metric for us and the high speed enterprise drives are expensive in terms of power cost. We now total around 1.5 megawatts in power consumption in our centers, and I can tell you that every watt matters for reducing costs.
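To put the "every watt matters" remark into numbers, here is a rough back-of-the-envelope sketch. The 1.5 megawatt figure comes from the quote; the electricity price is a made-up placeholder, not Backblaze's actual rate.

```python
HOURS_PER_YEAR = 24 * 365  # 8760, ignoring leap years


def annual_energy_kwh(watts: float) -> float:
    """Energy used in one year by a load that draws `watts` continuously."""
    return watts * HOURS_PER_YEAR / 1000.0


def annual_cost(watts: float, price_per_kwh: float) -> float:
    """Yearly electricity cost for a continuous load at a flat price."""
    return annual_energy_kwh(watts) * price_per_kwh


if __name__ == "__main__":
    ASSUMED_PRICE = 0.10  # hypothetical USD per kWh, for illustration only
    print(annual_energy_kwh(1_500_000))           # whole fleet, kWh per year
    print(annual_energy_kwh(1))                   # one watt saved, kWh per year
    print(annual_cost(1_500_000, ASSUMED_PRICE))  # fleet cost at the assumed price
```

A single watt running all year is 8.76 kWh, and 1.5 MW is about 13.1 GWh per year, so small per-drive savings multiply quickly across the whole fleet.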

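They also discuss SMR drives. These offer more capacity per dollar, but they require architectural accommodations (a caching layer) and raise engineering costs, so Backblaze isn't keen on them: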

SMR would give us a 10-15% capacity-to-dollar boost, but it also requires host-level management of sequential data writing. Additionally, the new archive type of drives require a flash-based caching layer. Both of these requirements would mean significant increases in engineering resources to support and thereby even more investment. So all-in-all, SMR isn’t cost-effective in our system.

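On the cost side, the figure they work with, based on what they have observed historically, is a 5% to 10% drop per terabyte each quarter: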

Ideally, I can achieve a 5-10% cost reduction per terabyte per quarter, which is a number based on historical price trends and our performance for the past 10 years.
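A quarterly percentage is easier to judge once it is compounded over a year. The sketch below only does that arithmetic on the 5% and 10% figures from the quote; the function names are mine.

```python
def annual_reduction(quarterly_rate: float) -> float:
    """Fraction of cost removed after four quarters of compounding."""
    return 1.0 - (1.0 - quarterly_rate) ** 4


def cost_after(quarters: int, quarterly_rate: float, start_cost: float = 1.0) -> float:
    """Cost per terabyte after `quarters` quarters, relative to `start_cost`."""
    return start_cost * (1.0 - quarterly_rate) ** quarters


if __name__ == "__main__":
    for rate in (0.05, 0.10):
        print(rate, round(annual_reduction(rate), 4))
```

A 5% drop per quarter compounds to about 18.5% per year, and 10% per quarter to about 34.4% per year.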

They also mention attaching multiple SATA drives to a SAS controller (still a cost consideration), which is interesting in its own right:

Longer term, one thing we’re looking toward is phasing out SATA controller/port multiplier combo. This might be more technical than some of our readers want to go, but: SAS controllers are a more commonly used method in dense storage servers. Using SATA drives with SAS controllers can provide as much as a 2x improvement in system throughput vs SATA, which is important to me, even though serial ATA (SATA) port multipliers are slightly less expensive. When we started our Storage Pod construction, using SATA controller/port multiplier combo was a great way to keep costs down. But since then, the cost for using SAS controllers and backplanes has come down significantly.

A Hack That Abuses Google Docs for Storage...

I came across the project "UDS : Unlimited Drive Storage", which stores data in Google Docs. The main reason it works is that Google Docs don't count against your Google Drive quota:

Google Docs take up 0 bytes of quota in your Google Drive

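This approach can hold quite a few large files (such as various ISO images). It reminds me of how Love Machine was used back in the day (if you don't know it, see the post "愛的機器 Love machine"): split a file into pieces and upload them to some hosting space for others to download. The difference is that here the data is base64-encoded and put into Google Docs...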

Base64-encoded data is about 33% larger than the original, and a single Google Doc tops out at roughly 710KB:

Size of the encoded file is always larger than the original. Base64 encodes binary data to a ratio of about 4:3.

A single google doc can store about a million characters. This is around 710KB of base64 encoded data.
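The encode-and-split step can be sketched in a few lines. This is not the UDS code and it never talks to Google; it only shows the base64 4:3 expansion and the chunking. The per-document limit is a parameter, and the default of one million characters is taken from the quote above as an assumption.

```python
import base64
import math
from typing import List

MAX_CHARS_PER_DOC = 1_000_000  # assumption: "about a million characters" per doc


def encoded_length(n_bytes: int) -> int:
    """Length of padded base64 text for n_bytes of input (4 chars per 3 bytes)."""
    return 4 * math.ceil(n_bytes / 3)


def encode_chunks(data: bytes, max_chars: int = MAX_CHARS_PER_DOC) -> List[str]:
    """Base64-encode data and split the text into doc-sized pieces."""
    text = base64.b64encode(data).decode("ascii")
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]


def decode_chunks(chunks: List[str]) -> bytes:
    """Rejoin the pieces in order and decode back to the original bytes."""
    return base64.b64decode("".join(chunks))


if __name__ == "__main__":
    payload = bytes(range(256)) * 10           # 2560 bytes of sample data
    parts = encode_chunks(payload, max_chars=1000)
    print(len(parts), encoded_length(len(payload)))
    print(decode_chunks(parts) == payload)
```

Because the pieces are plain slices of one base64 string, they have to be rejoined in the original order before decoding.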

The technique isn't exactly new, but it does bring back memories... XD

Backblaze's 2017 Annual Hard Drive Report

As usual, Backblaze has published its hard drive report for 2017Q4 and the full year 2017: "Backblaze Hard Drive Stats for 2017".

The most important part is these three charts: the first covers 2017Q4, the second covers 2013/04 through 2017/12, and the third covers the last three years (2015/2016/2017):

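Let me state the conclusion first: because they have bought almost nothing but Seagate and HGST drives in recent years, their data is no longer useful for judging WD or Toshiba drives.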

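The only meaningful takeaway is that HGST drives are considerably better than Seagate's; the sample sizes are too small to support any other conclusion.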

How Much Data Backblaze Stores

In "Yes, Backblaze Just Ordered 100 Petabytes of Hard Drives", Backblaze mentions that this time they bought 100PB of hard drives, though that by itself isn't all that remarkable XD

The more interesting figure is how much data they currently store:

It has passed 400PB... and the growth rate looks quite steady.

I also use rclone to back up my own data there (encrypted before upload), and so far I haven't even used up the free tier (though my credit card is on file).

Backblaze's Hard Drive Analysis for 2017Q1

Backblaze has released its hard drive analysis for 2017Q1: "Hard Drive Stats for Q1 2017".

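Compared with the previous report (Backblaze Hard Drive Stats for 2016), this one takes the number of drives into account and includes a version with plus/minus error ranges: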

The recent trend hasn't changed much; overall, HGST's quality is still the best.

Backblaze's Analysis of Hard Drive Failures in 2016

Backblaze has analyzed hard drive failures for 2016. Because this report falls right at the turn of the year, it covers both 2016Q4 and the full year 2016: "Backblaze Hard Drive Stats for 2016".

The part most people care about is probably the full-year comparison by manufacturer:

HGST's quality is still impressive, and you can also see that Backblaze is using fewer and fewer WD drives...

Backblaze Analyzes Hard Drive Failure Rates for 2015

As usual, Backblaze publishes recent hard drive failure rates every so often, and this chart in "Hard Drive Reliability Review for 2015" seems to say quite a lot:

Broken down by capacity, though, the picture differs in a few places...

Backblaze's Hard Drive Reliability Report for the First Half of 2015 Is Out... (Another Jab at a Certain Company)

This time Backblaze has released data for the first half of 2015. The title says 2015Q2, but the article includes half a year of statistics: "Hard Drive Reliability Stats for Q2 2015".

We all know a certain company's products have a high failure rate, but is this some kind of grudge? XDDD

Here are the statistics:

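Next is the historical record for 4TB drives. The two vendors on the right have rather small counts, but wouldn't one failure out of 45 drives already be 2.x%? So how did they arrive at this figure?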
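One likely answer is that Backblaze reports an annualized failure rate computed from drive days, not a simple ratio of failures to drive count. A minimal sketch of that calculation follows; the drive-day inputs are invented for illustration and are not taken from the report.

```python
def annualized_failure_rate(failures: int, drive_days: float) -> float:
    """Annualized failure rate in percent: failures per drive-year of service."""
    if drive_days <= 0:
        raise ValueError("drive_days must be positive")
    drive_years = drive_days / 365.0
    return failures / drive_years * 100.0


if __name__ == "__main__":
    # Hypothetical case 1: 45 drives, each in service for a full year.
    print(annualized_failure_rate(1, 45 * 365))
    # Hypothetical case 2: the same 45 drives, each in service for only 90 days.
    print(annualized_failure_rate(1, 45 * 90))
```

With this definition the same single failure gives about 2.2% when all 45 drives ran for a full year, and a much higher rate when they were only in service for part of it.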

Backblaze Publishes Hard Drive Reliability Figures Again...

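Earlier this year (in January) they published a first round of figures (see "各家硬碟的耐用程度...", my post on how durable each vendor's drives are), which stirred up controversy. Now comes the follow-up, published in September: "Hard Drive Reliability Update – Sep 2014".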

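The gray bars are the January figures and the other colors are the September figures. The article considers whether to switch to enterprise drives, but both evaluations come out negative.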

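The first evaluation is about cost. Most of their drives carry a three-year warranty, and even ignoring the warranty and assuming a 15% annual failure rate for consumer drives against 0% for enterprise drives, the breakeven point is 10 years, so enterprise drives don't pay off: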

Today on Amazon, a Seagate 3 TB “enterprise” drive costs $235 versus a Seagate 3 TB “desktop” drive costs $102. Most of the drives we get have a 3-year warranty, making failures a non-issue from a cost perspective for that period. However, even if there were no warranty, a 15% annual failure rate on the consumer “desktop” drive and a 0% failure rate on the “enterprise” drive, the breakeven would be 10 years, which is longer than we expect to even run the drives for.
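The breakeven idea can be checked with a deliberately simple model: the extra upfront price of the enterprise drive divided by the expected yearly cost of replacing failed desktop drives. This is my own simplification (replacement at list price, no labor, no discounting), and the exact model behind the quoted 10 years is not given in the article.

```python
def breakeven_years(enterprise_price: float,
                    desktop_price: float,
                    desktop_afr: float,
                    enterprise_afr: float = 0.0) -> float:
    """Years until the enterprise drive's higher price is offset by fewer replacements."""
    extra_upfront = enterprise_price - desktop_price
    # Expected yearly replacement spend per drive slot for each drive type.
    yearly_gap = desktop_afr * desktop_price - enterprise_afr * enterprise_price
    if yearly_gap <= 0:
        raise ValueError("enterprise drive never breaks even under these inputs")
    return extra_upfront / yearly_gap


if __name__ == "__main__":
    # Prices and the 15% / 0% failure rates are the ones in the quote.
    print(round(breakeven_years(235, 102, 0.15), 2))
```

Under these assumptions the result is a little under nine years, the same ballpark as the 10 years in the quote and still longer than the drives are expected to run.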

On top of that, enterprise drives turn out to be no more reliable in practice:

The assumption that “enterprise” drives would work better than “consumer” drives has not been true in our tests. I analyzed both of these types of drives in our system and found that their failure rates in our environment were very similar — with the “consumer” drives actually being slightly more reliable.