The Upward Trend in EC2 Spot Instance Prices

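"Farewell to the Era of Cheap EC2 Spot Instances" discusses how Amazon EC2 spot instance prices have been trending upward recently. One example is this chart, apparently pulled from the web console, showing the trend for t4g.nano in us-east-1: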

Quite a few regions show a similar pattern, especially the most heavily used ones, us-east-1 and us-west-2.

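Last month friends at Plurk mentioned something similar: it is getting harder and harder to find cheap spot instances in us-east-1. At the time I wondered whether some large event was going on, but once the article came out I realized everyone was running into the same thing.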

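There is also a discussion on Hacker News: "Farewell to the Era of Cheap EC2 Spot Instances (pauley.me)". It mentions some tools that let you adjust more flexibly and apply more logic to reduce cost, such as the AutoSpotting - Community Edition project, which uses Lambda to make the adjustments for you: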

The entire logic described above is implemented in a set of Lambda functions deployed using CloudFormation or Terraform stacks that can be installed and configured in just a few minutes.

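Looking back at the current situation (and guessing at AWS's strategy): if the normal spot instance price settles at 60% to 70% of the list price, that effectively pushes you to plan around options like Savings Plans, and lets spot instances gradually fade out.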
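To make that reasoning concrete, here is a minimal arithmetic sketch. The list price and the Savings Plans discount are hypothetical numbers chosen for illustration, not actual AWS pricing; only the 60% to 70% spot figure comes from the text above.

```python
def effective_price(list_price: float, fraction_of_list: float) -> float:
    """Hourly price when paying a given fraction of the on-demand list price."""
    return list_price * fraction_of_list


LIST_PRICE = 0.10     # hypothetical on-demand price, USD per hour
SPOT_FRACTION = 0.65  # spot at roughly 60% to 70% of list, per the text
PLAN_FRACTION = 0.70  # hypothetical Savings Plans rate (30% off list)

spot = effective_price(LIST_PRICE, SPOT_FRACTION)
plan = effective_price(LIST_PRICE, PLAN_FRACTION)
gap = plan - spot  # what you still save by accepting interruptions

print(f"spot {spot:.4f}, plan {plan:.4f}, gap {gap:.4f} USD/hour")
```

Under these assumed numbers the gap is a small fraction of the list price, which is the point: the remaining saving may no longer justify the risk of being interrupted.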

That said, I wonder whether someone will end up suing over the "90% saving" advertising...

AWS Fargate Launches Spot

Just as Amazon EC2 has Spot Instances (whose bidding mechanism can save a lot of money), Fargate launched a corresponding product line at this year's AWS re:Invent: "AWS Fargate Spot Now Generally Available".

As with EC2, the applications you run on it must tolerate being interrupted at any time (i.e. they must be crash-safe); worker-type programs are the common case.
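One common way to make a worker crash-safe is to acknowledge a job only after it has finished, so an interrupted job stays in the queue for the next worker. The following is a minimal in-memory sketch of that idea; the `drain` function, the `Interrupted` exception and the job names are all hypothetical, and a real setup would use a durable queue rather than a `deque`.

```python
from collections import deque
from typing import Callable, Deque, List


class Interrupted(Exception):
    """Stands in for the task being stopped in the middle of a job."""


def drain(queue: Deque[str], done: List[str], handle: Callable[[str], None]) -> None:
    """Process jobs, removing each from the queue only after it succeeds."""
    while queue:
        job = queue[0]   # peek at the next job without removing it
        handle(job)      # may raise if the worker is interrupted
        queue.popleft()  # acknowledge only after the job completed
        done.append(job)


def make_flaky_handler() -> Callable[[str], None]:
    """Return a handler that is interrupted once, the first time it sees job 'b'."""
    state = {"interrupted": False}

    def handle(job: str) -> None:
        if job == "b" and not state["interrupted"]:
            state["interrupted"] = True
            raise Interrupted(job)

    return handle


queue = deque(["a", "b", "c"])
done: List[str] = []
handle = make_flaky_handler()

try:
    drain(queue, done, handle)  # first worker dies while working on "b"
except Interrupted:
    pass

drain(queue, done, handle)      # replacement worker retries "b", then does "c"
```

After the second call, `done` holds a, b, c in order and the queue is empty. Because "b" is attempted twice, this pattern only works when handlers are safe to run more than once.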

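The price is roughly 30% of the regular price (based on us-east-1 pricing at the time of writing). Given that startup is much faster than on EC2, this looks like an option worth considering...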

EC2 Offers a Short-Term RI Option...

Amazon EC2 has launched a short-term RI option: "Amazon EC2 now offers On-Demand Capacity Reservations".

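The original RI design bundled a "guaranteed to launch" capacity reservation with a discount. AWS probably found later that most people wanted the discount rather than the guarantee, so it opened up a discount-only option. The variant that keeps the launch guarantee became Zonal Reserved Instances.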

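However, Zonal Reserved Instances require a commitment of at least one year (with a corresponding discount). The new On-Demand Capacity Reservations can be cancelled at any time, but come with no discount: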

Previously in order to achieve a capacity reservation, customers had to leverage Zonal Reserved Instances and commit to that capacity for a minimum of one year. Now with On-Demand Capacity Reservations, customers can reserve the exact capacity they need, in the location they need, and can keep it only for as long as they need it.

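Billing does combine with any Regional RIs you have already bought, though: if they are not otherwise used up, the corresponding discount still applies: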

If you have Regional RI discounts, they will automatically apply to any matching Capacity Reservation. This gives you the flexibility to selectively add capacity reservations and still get the Regional RI discounts for that usage.

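This gives short-term flexibility to organizations that need a launch guarantee. Behind the scenes, it is presumably done by kicking some machines out of the spot instance pool? XD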

AWS Launches EC2 Fleet: Mixing Standard EC2, Spot, and RI in One Calculation

AWS has extended the existing EC2 Spot Fleet into EC2 Fleet. The calculation, which used to cover only Spot Instances, now takes standard EC2 instances and RIs into account as well: "EC2 Fleet – Manage Thousands of On-Demand and Spot Instances with One Request".

Today we are extending and generalizing the set-it-and-forget-it model that we pioneered in Spot Fleet with EC2 Fleet, a new building block that gives you the ability to create fleets that are composed of a combination of EC2 On-Demand, Reserved, and Spot Instances with a single API call.

Some services are not integrated yet, though, mainly the parts related to auto scaling. Those will probably all land together in one big batch:

We plan to connect EC2 Fleet and EC2 Auto Scaling groups. This will let you create a single fleet that mixed instance types and Spot, Reserved and On-Demand, while also taking advantage of EC2 Auto Scaling features such as health checks and lifecycle hooks. This integration will also bring EC2 Fleet functionality to services such as Amazon ECS, Amazon EKS, and AWS Batch that build on and make use of EC2 Auto Scaling for fleet management.

Once that integration is done, saving cost will be even simpler...

EC2 Spot Instances Can Now "Continue"

Amazon EC2 Spot Instances can now "continue", in the video-game sense: "New – Stop & Resume Workloads on EC2 Spot Instances".

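When your Spot Instance bid is no longer high enough to win a machine, the instance is stopped rather than terminated. Once the price drops below your bid, the machine is started again and its state is restored. The post also notes that only EBS-backed instances are supported: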

Amazon EC2 Spot now allows Amazon EBS-backed instances to be stopped in the event of interruption, instead of being terminated when capacity is no longer available at your preferred price. Spot can then fulfill your request by restarting instances from a stopped state when capacity is available within your price and time requirements.

To use it, set the Spot Instance's interruption behavior to stop:

To use this new feature, choose “stop” instead of “terminate” as the interruption behavior when submitting a persistent Spot request. When you choose “stop”, Spot will shut down your instance upon interruption.

On resume, the instance comes back as close to its previous state as possible (even the instance ID stays the same):

When capacity is available again within your price and time requirements, Spot will restart your instance. Upon restart, the EBS root device is restored from its prior state, previously attached data volumes are reattached, and the instance retains its instance ID.
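As a rough sketch of what such a request could look like from Python, the function below builds the keyword arguments for a persistent Spot request with "stop" as the interruption behavior. The parameter names follow boto3's `request_spot_instances` as I understand it; the function name, the AMI ID placeholder, the instance type and the price are all hypothetical, and nothing is sent to AWS here.

```python
from typing import Any, Dict


def build_stop_on_interrupt_request(
    image_id: str, instance_type: str, max_price: str
) -> Dict[str, Any]:
    """Keyword arguments for a persistent Spot request that stops on interruption."""
    return {
        "InstanceCount": 1,
        "Type": "persistent",                    # the quoted post requires a persistent request
        "InstanceInterruptionBehavior": "stop",  # instead of the default "terminate"
        "SpotPrice": max_price,                  # maximum price, passed as a string
        "LaunchSpecification": {
            "ImageId": image_id,                 # must be an EBS-backed AMI
            "InstanceType": instance_type,
        },
    }


# Placeholder values for illustration only.
params = build_stop_on_interrupt_request("ami-EXAMPLE", "c4.large", "0.05")

# With boto3 installed and credentials configured, this could be sent as:
#   import boto3
#   boto3.client("ec2").request_spot_instances(**params)
```

Keeping the request as a plain dictionary makes it easy to inspect or log before anything is actually submitted.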

It feels like hitting "continue" after a game over XD

EC2 and EBS Move to Per-Second Billing Starting in October

Although this is only a change to the pricing model of Amazon EC2 and Amazon EBS, it has a large impact on many development workflows (mainly through the EC2 part): "New – Per-Second Billing for EC2 Instances and EBS Volumes".

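The change takes effect on 10/2 (not 10/1), with a one-minute minimum charge. Windows instances and Linux distributions that carry a separate hourly charge are excluded: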

This change is effective in all AWS Regions and will be effective October 2, for all Linux instances that are newly launched or already running. Per-second billing is not currently applicable to instances running Microsoft Windows or Linux distributions that have a separate hourly charge. There is a 1 minute minimum charge per-instance.

Spot usage and Reserved Instance usage are billed per second in the same way:

List prices and Spot Market prices are still listed on a per-hour basis, but bills are calculated down to the second, as is Reserved Instance usage (you can launch, use, and terminate multiple instances within an hour and get the Reserved Instance Benefit for all of the instances).
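The billing rule quoted above (per-second charges with a one-minute minimum) can be sketched as a small calculation and compared with the previous model, where every started hour was billed in full. The hourly rate below is a hypothetical number, not an actual AWS price.

```python
import math


def per_second_cost(hourly_rate: float, seconds: int) -> float:
    """Per-second billing with a 60-second minimum charge per instance."""
    billed_seconds = max(60, seconds)
    return hourly_rate * billed_seconds / 3600


def per_hour_cost(hourly_rate: float, seconds: int) -> float:
    """Previous model: every started hour is billed in full."""
    return hourly_rate * math.ceil(seconds / 3600)


RATE = 0.10        # hypothetical on-demand price, USD per hour
TEN_MINUTES = 600  # e.g. a short-lived instance that runs for ten minutes

old = per_hour_cost(RATE, TEN_MINUTES)    # billed as one full hour
new = per_second_cost(RATE, TEN_MINUTES)  # billed as one sixth of an hour

print(f"hourly billing: {old:.4f} USD, per-second billing: {new:.4f} USD")
```

For an instance that lives only ten minutes, the old model charges a full hour while the new one charges a sixth of that, which is why short-lived instances become much cheaper.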

The impact of this change is huge. A few scenarios come to mind right away...

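The first concerns teams that practice "release early, release often". Under hourly billing, if every deploy builds a new AMI (the cleanest approach) and then launches new machines to replace the old ones, the cost goes up considerably. Such teams therefore tended to prefer replacing the contents of the existing directory and restarting the service...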

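Now that billing is per second, you can simply use Blue-Green Deployment (AWS CodeDeploy added support for it earlier this year: "AWS CodeDeploy Supports Blue-Green Deployment"). If you are not familiar with Blue-Green Deployment, a plainer way to put it is "build the new one first, then tear down the old one"...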

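For the same reason, Auto Scaling policies change a bit as well. Previously, once a machine was launched you wanted it to run for the full hour, so the scale-down rules were written fairly loosely. Now they can be planned afresh...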

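Another effect is that the incentive to use containers shrinks considerably. Many people use containers by launching a large machine and carving it up among different services to raise resource utilization. Now that you pay only for what you use, there is much less need for that...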

There is a downside too, of course. Previously, when AWS reclaimed a Spot Instance, the final partial hour was not charged. With per-second billing, it now is...

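Finally, the 10/2 effective date is rather odd. Surely the reason is not that the finance department refused to work overtime on Sunday 10/1, so it had to become 10/2... XDDD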

Amazon EMR Now Supports Instance Fleets

So Amazon EMR can now use Instance Fleets when bidding on Spot Instances: "New – Amazon EMR Instance Fleets".

Running a larger number of smaller machines usually adds some overhead, but it makes it easier to keep the price stable...

Auto Scaling Can Now Pull In EC2 Spot Instances

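Update: auto scaling has supported spot instances since 2012: "EC2 Spot Instance Updates – Auto Scaling and CloudFormation Integration, New Sample App". I had always been under the impression that auto scaling did not support them... What is new this time is support for EC2 Spot Fleets (i.e. a capacity-based model, where two c4.4xlarge and one c4.8xlarge are treated as the same capacity when bidding).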

AWS Auto Scaling has announced support for Spot Instances: "New – Auto Scaling for EC2 Spot Fleets".

Although the post mostly uses worker-type applications as examples, it does still mention web services:

Web Service – Scale web services based on measured response time and average requests per second.

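My guess is that AWS still does not really recommend this use, which is why the whole post focuses on worker-type workloads. Web services face users directly, and when Auto Scaling relies on spot capacity the instances are not guaranteed to launch, so things could blow up instead XD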

EC2 Spot Blocks: Fixed-Duration Spot Instances

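EC2 designed Spot Instances so that some applications can rent machines at a lower price. The downside is that they can be interrupted at any time, so the program has to record its progress periodically.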
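Recording progress periodically usually means writing a checkpoint and resuming from it after an interruption. Below is a minimal sketch of that pattern; the file name, the checkpoint interval and the `stop_after` parameter (which only simulates an interruption) are hypothetical, and on a real Spot Instance the checkpoint would need to live on storage that survives the instance.

```python
import json
import os
import tempfile
from typing import List, Optional


def load_checkpoint(path: str) -> int:
    """Return the index of the next item to process, or 0 if no checkpoint exists."""
    if not os.path.exists(path):
        return 0
    with open(path) as f:
        return json.load(f)["next_index"]


def save_checkpoint(path: str, next_index: int) -> None:
    """Write the checkpoint via a temp file so an interruption cannot leave it half-written."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"next_index": next_index}, f)
    os.replace(tmp, path)


def run(items: List[int], path: str, every: int = 2,
        stop_after: Optional[int] = None) -> List[int]:
    """Process items starting from the last checkpoint; stop_after simulates an interruption."""
    processed: List[int] = []
    start = load_checkpoint(path)
    for i in range(start, len(items)):
        if stop_after is not None and len(processed) >= stop_after:
            return processed          # interrupted: no final checkpoint is written
        processed.append(items[i])    # stands in for the real work
        if (i + 1) % every == 0:
            save_checkpoint(path, i + 1)
    save_checkpoint(path, len(items))
    return processed


ckpt = os.path.join(tempfile.mkdtemp(), "progress.json")
items = list(range(7))
first = run(items, ckpt, stop_after=5)  # interrupted after five items
second = run(items, ckpt)               # resumes from the last saved checkpoint
```

In this run the first pass handles items 0 to 4 but the last checkpoint was written after item 3, so the second pass starts again at item 4. Work done since the last checkpoint is repeated, which is the trade-off behind choosing the checkpoint interval.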

Not every application can do that, however, so workloads that cannot be interrupted have generally still run on On-Demand Instances. EC2 now offers Spot Blocks to address this: "New – EC2 Spot Blocks for Defined-Duration Workloads".

You can set a run time of 1 to 6 hours, and the instance ends automatically when the time is up:

In order to make EC2 an even better fit for this type of defined-duration workload, you can now launch Spot instances that will run continuously for a finite duration (1 to 6 hours).

Spot Instances and Spot Blocks are priced separately. Spot Blocks are typically 30% to 45% cheaper than On-Demand, with an additional 5% off during non-peak hours:

Pricing is based on the requested duration and the available capacity, and is typically 30% to 45% less than On-Demand, with an additional 5% off during non-peak hours for the region. Spot blocks and Spot instances are priced separately; you can view the current Spot pricing to learn more.

EC2 Spot Instances Can Now Be Bid On by Capacity

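A few days ago, the AWS post "New – Resource-Oriented Bidding for EC2 Spot Instances" mentioned that EC2 Spot Instances can now be bid on by capacity. In other words, you bid on a total amount of resources rather than on one specific instance type.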

Taking the example from the post, if you bid for 488 units of capacity, you might end up with:

  • 2 x r3.8xlarge
  • 4 x r3.4xlarge
  • 8 x r3.2xlarge
  • 16 x r3.xlarge
  • 32 x r3.large

A mixed combination is also possible:

  • 1 x r3.8xlarge and 2 x r3.4xlarge
  • 2 x r3.4xlarge and 8 x r3.xlarge
  • 8 x r3.xlarge and 16 x r3.large

When the price of one spot instance type suddenly jumps, you can switch to other instance types and keep running, which makes things more flexible...
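The combinations listed above can be checked with a bit of arithmetic. This sketch assumes the 488 units are GiB of memory, which is the interpretation that makes every listed combination add up; the per-instance weights are the r3 family memory sizes as I recall them, so treat them as assumptions.

```python
from typing import Dict, List

# Assumed weights: memory per instance in GiB for the r3 family.
R3_MEMORY_GIB: Dict[str, float] = {
    "r3.large": 15.25,
    "r3.xlarge": 30.5,
    "r3.2xlarge": 61.0,
    "r3.4xlarge": 122.0,
    "r3.8xlarge": 244.0,
}

TARGET_CAPACITY = 488  # units of capacity requested in the example


def total_capacity(combo: Dict[str, int]) -> float:
    """Sum the weighted capacity of a mix of instance types."""
    return sum(R3_MEMORY_GIB[name] * count for name, count in combo.items())


# The uniform and mixed combinations listed in the text.
COMBOS: List[Dict[str, int]] = [
    {"r3.8xlarge": 2},
    {"r3.4xlarge": 4},
    {"r3.2xlarge": 8},
    {"r3.xlarge": 16},
    {"r3.large": 32},
    {"r3.8xlarge": 1, "r3.4xlarge": 2},
    {"r3.4xlarge": 2, "r3.xlarge": 8},
    {"r3.xlarge": 8, "r3.large": 16},
]

for combo in COMBOS:
    print(combo, total_capacity(combo))
```

Every combination sums to the same target under these weights, which is what lets the fleet swap one instance type for another when a single type gets expensive.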