The improvements behind Amazon S3 becoming Strongly Consistent

Seeing the Hacker News discussion "Diving Deep on S3 Consistency (allthingsdistributed.com)" reminded me that I should write this up. In the original post, "Diving Deep on S3 Consistency", Amazon's CTO Werner Vogels spends some length describing how Amazon S3 went from Eventually Consistent to Strongly Consistent. When Amazon S3 first announced the change I also wrote a post about it: "Amazon S3 now has Strong Read-After-Write Consistency...".

Amazon S3 used to be Eventually Consistent because of the cache design in its Metadata Subsystem:

Per-object metadata is stored within a discrete S3 subsystem. This system is on the data path for GET, PUT, and DELETE requests, and is responsible for handling LIST and HEAD requests. At the core of this system is a persistence tier that stores metadata. Our persistence tier uses a caching technology that is designed to be highly resilient. S3 requests should still succeed even if infrastructure supporting the cache becomes impaired. This meant that, on rare occasions, writes might flow through one part of cache infrastructure while reads end up querying another. This was the primary source of S3’s eventual consistency.
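To make the failure mode in the quote concrete, here is a minimal toy sketch of how a stale read can happen when a write goes through one cache node and a later read hits another. Every name here (`PersistenceTier`, `CacheNode`, the key `photo.jpg`) is hypothetical and chosen for illustration; this is not S3's actual architecture, only the general pattern of independent caches in front of a shared store.

```python
class PersistenceTier:
    """Hypothetical stand-in for the durable metadata store."""

    def __init__(self):
        self.data = {}


class CacheNode:
    """Hypothetical cache node that trusts its own local copy."""

    def __init__(self, store):
        self.store = store
        self.local = {}

    def put(self, key, value):
        # Write through to the store and update only this node's cache.
        self.store.data[key] = value
        self.local[key] = value

    def get(self, key):
        # Serve from the local cache if present; otherwise fill from the store.
        if key not in self.local:
            self.local[key] = self.store.data.get(key)
        return self.local[key]


store = PersistenceTier()
node_a, node_b = CacheNode(store), CacheNode(store)

node_a.put("photo.jpg", "v1")
first_read = node_b.get("photo.jpg")   # node_b fills its cache with "v1"

node_a.put("photo.jpg", "v2")          # the write flows through node_a only
stale_read = node_b.get("photo.jpg")   # node_b still serves its cached "v1"
```

The store already holds `v2` at the end, but `node_b` has no way of knowing that its cached copy is out of date, which is the eventual consistency the quote describes.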

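The most direct way to get rid of Eventual Consistency would be to remove the cache, but that would hurt performance too much, so the design had to keep the cache. That led to the idea of using a separate mechanism to make sure the state of the data in the cache is correct: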

One early consideration for delivering strong consistency was to bypass our caching infrastructure and send requests directly to the persistence layer. But this wouldn’t meet our bar for no tradeoffs on performance. We needed to keep the cache. To keep values properly synchronized across cores, CPUs implement cache coherence protocols. And that’s what we needed here: a cache coherence protocol for our metadata caches that allowed strong consistency for all requests.

The next step was designing a set of logic that guarantees serializability for the operations on each S3 object:

We had introduced new replication logic into our persistence tier that acts as a building block for our at-least-once event notification delivery system and our Replication Time Control feature. This new replication logic allows us to reason about the “order of operations” per-object in S3. This is the core piece of our cache coherency protocol.
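The idea of reasoning about a per-object order of operations can be sketched as a version-checked cache. This is only an illustration under stated assumptions: the per-object version counter, the `latest_version` check, and the class names are all hypothetical, and the check is assumed to be much cheaper than reading the full metadata. The original post does not give the protocol in this form.

```python
class VersionedStore:
    """Hypothetical persistence tier that orders writes per object."""

    def __init__(self):
        self.data = {}  # key -> (version, value)

    def put(self, key, value):
        # Each write to a key gets the next version number for that key.
        version = self.data.get(key, (0, None))[0] + 1
        self.data[key] = (version, value)
        return version

    def get(self, key):
        return self.data.get(key, (0, None))

    def latest_version(self, key):
        # Assumed to be a cheap lookup compared with a full read.
        return self.data.get(key, (0, None))[0]


class CoherentCacheNode:
    """Hypothetical cache node that validates entries before serving them."""

    def __init__(self, store):
        self.store = store
        self.local = {}     # key -> (version, value)
        self.refetches = 0  # counts how often the cache had to be refreshed

    def put(self, key, value):
        version = self.store.put(key, value)
        self.local[key] = (version, value)

    def get(self, key):
        cached = self.local.get(key)
        latest = self.store.latest_version(key)
        # Refresh when there is no entry or the entry is older than the
        # latest write known for this object.
        if cached is None or cached[0] < latest:
            self.local[key] = self.store.get(key)
            self.refetches += 1
        return self.local[key][1]


store = VersionedStore()
node_a, node_b = CoherentCacheNode(store), CoherentCacheNode(store)

node_a.put("photo.jpg", "v1")
first_read = node_b.get("photo.jpg")    # cache miss, fetched from the store

node_a.put("photo.jpg", "v2")
second_read = node_b.get("photo.jpg")   # cached version is stale, refetched
third_read = node_b.get("photo.jpg")    # cached version is current, served locally
```

The cache stays in place and still answers repeated reads locally; only the staleness check sits on every request, which is why it has to be cheap and highly available in any real design.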

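After that they had to make the cache coherence protocol itself highly available, and finally they had to verify that the implementation is correct, which took more effort than implementing the protocol itself: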

These verification techniques were a lot of work. They were more work, in fact, than the actual implementation itself. But we put this rigor into the design and implementation of S3’s strong consistency because that is what our customers need.

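Amazon S3 was the flagship service when AWS first launched, and the paper associated with its underlying design at the time, "Amazon's Dynamo", heavily influenced the whole industry afterwards (although the paper uses Amazon's shopping cart as its example). This follow-up effectively updates the techniques from that paper and tells everyone that what used to be Eventually Consistent can be raised to Strongly Consistent.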

Thoughts from Amazon's CTO on the tenth anniversary of AWS

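Although the tenth anniversary has not arrived yet, Amazon's CTO has already written down his thoughts on it, which are also the lessons learned over these ten years: "10 Lessons from 10 Years of Amazon Web Services".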

The epoch of AWS is the launch of Amazon S3 on March 14, 2006, now almost 10 years ago.

The ten topics he lists are:

  • Build evolvable systems
  • Expect the unexpected
  • Primitives not frameworks
  • Automation is key
  • APIs are forever
  • Know your resource usage
  • Build security in from the ground up
  • Encryption is a first-class citizen
  • The importance of the network
  • No gatekeepers

So AWS is already ten years old...

The AWS boss announces a London region opening at the end of 2016

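Right after the plan for a South Korea region ("AWS will open a South Korea region in 2016"), the boss himself, Werner Vogels, followed up by announcing plans for a London region (a UK region): "London Calling! An AWS Region is coming to the UK!".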

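It is scheduled to go live at the end of 2016 or the beginning of 2017, and it will be the third region in Europe; the first two are Ireland and Germany. You can see how aggressively AWS is expanding in 2016...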