在「A web performance issue」這邊講到 Mozilla 的系統產生效能問題,後續的 trouble shooting 以及解決問題的方案。
這個系統跑在 AWS 上,在一連串確認後發現是 RDS 所使用的 EBS 的 IOPS 滿了:
After reading a lot of documentation about Amazon’s RDS set-up I determined that slow downs in the database were related to IOPS spikes. Amazon gives you 3 IOPS per Gb and with a storage of 1 Terabyte we had 3,000 IOPS as our baseline. The graph below shows that at times we would get above that max baseline.
然後大家對於解法都差不多,因為 Provisioned IOPS 太貴,所以直接加大空間換 IOPS 出來 (因為 General SSD 裡 1 GB 給 3 IOPS):
To increase the IOPS baseline we could either increase the storage size or switch from General SSD to Provisioned IOPS storage. The cost of the different storage type was much higher so we decided to double our storage, thus, doubling our IOPS baseline. You can see in the graph below that we’re constantly above our previous baseline. This change helped Treeherder’s performance a lot.
然後再設警告機制,下次就可以提前再拉昇:
In order to prevent getting into such a state in the future, I also created a CloudWatch alert. We would get alerted if the combined IOPS is greater than 5,700 IOPS for 6 datapoints within 10 minutes.
不過 General SSD 的 IOPS 是沒有 100% 保證的,只有這樣寫:
AWS designs gp2 volumes to deliver 90% of the provisioned performance 99% of the time.
大多數的情況應該是夠用啦...