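Judging from the API, the feature set looks okay, so there is no need to use Linode's IP Failover yourself and run HAProxy servers on top of it...
Amazon has posted an explanation on its website of what caused the 4/21 EBS/RDS failure in the US East Region: "Summary of the Amazon EC2 and Amazon RDS Service Disruption in the US East Region".
The post is quite long, but the whole chain reaction started with human error during an operation. And because RDS is built on top of EBS, it blew up along with it...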
For customers with an attached EBS volume or a running RDS database instance in the affected Availability Zone in the US East Region at the time of the disruption, regardless of whether their resources and application were impacted or not, we are going to provide a 10 day credit equal to 100% of their usage of EBS Volumes, EC2 Instances and RDS database instances that were running in the affected Availability Zone.
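Also, the AWS SLA requires the customer to file the credit request themselves when the SLA is not met. This time is an exception: eligible customers will see the credit applied automatically on their next bill, or on the account page: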
These customers will not have to do anything in order to receive this credit, as it will be automatically applied to their next AWS bill. Customers can see whether they qualify for the service credit by logging into their AWS Account Activity page.
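The discussion also includes a slide deck explaining that Netflix chose AWS because it "couldn't build data centers fast enough", but that while building on AWS it also kept in mind that "the cloud can fail on a large scale" and designed many mechanisms to guard against it: "Netflix in the cloud 2011".
By taking advantage of AWS having data centers in multiple locations, Netflix pushes the HA mechanisms available on AWS to the limit. This rests on the idea that doing so is "Cheaper than cost of being down" (since Netflix already counts as a kind of e-commerce company).
Update: The Service Health Dashboard finally has a preliminary cause: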
8:54 AM PDT We'd like to provide additional color on what we're working on right now (please note that we always know more and understand issues better after we fully recover and dive deep into the post mortem). A networking event early this morning triggered a large amount of re-mirroring of EBS volumes in US-EAST-1. This re-mirroring created a shortage of capacity in one of the US-EAST-1 Availability Zones, which impacted new EBS volume creation as well as the pace with which we could re-mirror and recover affected EBS volumes. Additionally, one of our internal control planes for EBS has become inundated such that it's difficult to create new EBS volumes and EBS backed instances. We are working as quickly as possible to add capacity to that one Availability Zone to speed up the re-mirroring, and working to restore the control plane issue. We're starting to see progress on these efforts, but are not there yet. We will continue to provide updates when we have them.