Same-AZ VPC Peering traffic on AWS is now free

Traffic within a single AZ on AWS is free, but traffic crossing accounts used to be billed as inter-AZ traffic (at USD$0.01/GB). AWS has now announced that this charge is gone: 「Amazon VPC Announces Pricing Change for VPC Peering」.

Note that the AZ letters are not necessarily the same across accounts (us-east-1a in one account can map to a different physical AZ in another account), so you have to use the data AWS provides to confirm which underlying AZ you are actually in.

The change is retroactive to the beginning of this month:

Starting May 1st 2021, all data transfer over a VPC Peering connection that stays within an Availability Zone (AZ) is now free. All data transfer over a VPC Peering connection that crosses Availability Zones will continue to be charged at the standard in-region data transfer rates. You can use the Availability Zone-ID to uniquely and consistently identify an Availability Zone across different AWS accounts.
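
To check the mapping, DescribeAvailabilityZones returns both the account-local zone name and the stable zone ID; a minimal boto3 sketch:

```python
import boto3

# The ZoneId (e.g. use1-az1) identifies the same physical AZ in every
# account, while the ZoneName (e.g. us-east-1a) is shuffled per account.
ec2 = boto3.client("ec2", region_name="us-east-1")
for az in ec2.describe_availability_zones()["AvailabilityZones"]:
    print(az["ZoneName"], "->", az["ZoneId"])
```

Run the same call in both peered accounts and match on ZoneId to keep traffic inside one physical AZ.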

CloudFront splits up the Lambda@Edge product line and introduces CloudFront Functions

Amazon CloudFront has split its existing Lambda@Edge product line into finer tiers, adding CloudFront Functions: 「Introducing CloudFront Functions – Run Your Code at the Edge with Low Latency at Any Scale」.

From a product standpoint it has more restrictions than Lambda@Edge, but it is much cheaper.

Looking at pricing first, CloudFront Functions only charges per request:

Invocation pricing is $0.10 per 1 million invocations ($0.0000001 per request).

Lambda@Edge, on the other hand, has two charges, and the request charge alone is six times higher:

Request pricing is $0.60 per 1 million requests ($0.0000006 per request).

Duration is calculated from the time your code begins executing until it returns or otherwise terminates. You are charged $0.00005001 for every GB-second used.
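
To make the difference concrete, here is a back-of-the-envelope comparison for 10 million requests; the Lambda@Edge duration term assumes a hypothetical 128MB function running 50ms per request, which is not a quoted figure:

```python
requests = 10_000_000

# CloudFront Functions: request charge only.
cf_functions = requests * 0.0000001            # $1.00

# Lambda@Edge: request charge plus GB-second duration charge.
# 128MB for 50ms per request is an assumed workload, not a quoted number.
lae_requests = requests * 0.0000006            # $6.00
gb_seconds = requests * (128 / 1024) * 0.050   # 62,500 GB-s
lae_duration = gb_seconds * 0.00005001         # ~$3.13

print(cf_functions, lae_requests + lae_duration)  # 1.0 vs ~9.13
```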

Of course, the low price of CloudFront Functions comes with quite a few restrictions; the most telling ones are the maximum execution time of just 1ms and the 2MB memory limit.

But that is already enough for lightweight work, mainly manipulating HTTP headers...

Another interesting detail in the comparison table is 「JavaScript (ECMAScript 5.1 compliant)」, which suggests it is not Node.js (V8 engine) but some other JS engine?

Latency tests between AWS regions

Saw 「AWS Latency Monitoring」 on the Hacker News front page; it looks like it permanently runs machines in every region, continuously testing and collecting data, so you can just pull the numbers up and look...

It has the usual p50 and p99 figures, which are quite useful when planning an architecture. From 「mda590/cloudping.co」 you can see it tests from Lambda against DynamoDB endpoints.
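
Just to illustrate the idea (a guess at the approach, not cloudping's actual code), timing TCP connections to a regional DynamoDB endpoint and computing percentiles might look like this:

```python
import socket
import statistics
import time

# Hypothetical sketch: measure TCP connect time to a regional DynamoDB
# endpoint as a stand-in for inter-region latency.
def connect_ms(region, samples=20):
    host = "dynamodb.%s.amazonaws.com" % region
    times = []
    for _ in range(samples):
        start = time.perf_counter()
        sock = socket.create_connection((host, 443), timeout=5)
        times.append((time.perf_counter() - start) * 1000)
        sock.close()
    return times

times = sorted(connect_ms("ap-southeast-1"))
print("p50:", statistics.median(times))
print("p99:", times[int(len(times) * 0.99)])
```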

There doesn't seem to be any packet loss rate information, which is also pretty important...

How Amazon S3 was made strongly consistent behind the scenes

Seeing the Hacker News discussion 「Diving Deep on S3 Consistency (allthingsdistributed.com)」 reminded me that I should write this up. The original post, 「Diving Deep on S3 Consistency」, has Amazon CTO Werner Vogels spending some length describing how Amazon S3 went from eventually consistent to strongly consistent; I also wrote a post back when Amazon S3 announced it: 「Amazon S3 現在變成 Strong Read-After-Write Consistency 啦...」.

The reason Amazon S3 was eventually consistent is the cache design of its metadata subsystem:

Per-object metadata is stored within a discrete S3 subsystem. This system is on the data path for GET, PUT, and DELETE requests, and is responsible for handling LIST and HEAD requests. At the core of this system is a persistence tier that stores metadata. Our persistence tier uses a caching technology that is designed to be highly resilient. S3 requests should still succeed even if infrastructure supporting the cache becomes impaired. This meant that, on rare occasions, writes might flow through one part of cache infrastructure while reads end up querying another. This was the primary source of S3’s eventual consistency.

The most direct way to get rid of eventual consistency would be to remove the cache, but the performance hit would be too large, so the design had to keep the cache; the answer was to use another mechanism to make sure the data in the cache is correct:

One early consideration for delivering strong consistency was to bypass our caching infrastructure and send requests directly to the persistence layer. But this wouldn’t meet our bar for no tradeoffs on performance. We needed to keep the cache. To keep values properly synchronized across cores, CPUs implement cache coherence protocols. And that’s what we needed here: a cache coherence protocol for our metadata caches that allowed strong consistency for all requests.

The next step was designing a chain of logic to ensure serializability of the operations on each S3 object:

We had introduced new replication logic into our persistence tier that acts as a building block for our at-least-once event notification delivery system and our Replication Time Control feature. This new replication logic allows us to reason about the “order of operations” per-object in S3. This is the core piece of our cache coherency protocol.
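
The post doesn't publish the protocol itself, but as a toy illustration only (not S3's actual implementation), per-object "order of operations" can be modeled as a monotonically increasing sequence number per key that lets a cache detect and refuse stale entries:

```python
# Toy model, not S3's protocol: each write to a key carries an increasing
# sequence number, and the cache refuses to serve a value older than the
# newest sequence it has been told about via replication.
class CoherentCache:
    def __init__(self):
        self.entries = {}      # key -> (seq, value)
        self.newest_seq = {}   # key -> newest sequence witnessed

    def witness(self, key, seq):
        # The replication stream announces that a write with `seq` exists.
        self.newest_seq[key] = max(self.newest_seq.get(key, 0), seq)

    def fill(self, key, seq, value):
        # Cache fill from the persistence tier; keep only the newest.
        if seq >= self.entries.get(key, (-1, None))[0]:
            self.entries[key] = (seq, value)

    def read(self, key):
        seq, value = self.entries.get(key, (-1, None))
        if seq < self.newest_seq.get(key, 0):
            return None  # stale: caller must go to the persistence tier
        return value
```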

After that they also had to make this cache coherence mechanism highly available, and finally verify the correctness of the implementation, which took more effort than implementing the protocol itself:

These verification techniques were a lot of work. They were more work, in fact, than the actual implementation itself. But we put this rigor into the design and implementation of S3’s strong consistency because that is what our customers need.

Amazon S3 was one of AWS's flagship launches. The paper behind it at the time, 「Amazon's Dynamo」, deeply influenced the whole industry afterwards (even though the paper uses Amazon's shopping cart as its example), and this write-up is effectively an update to the techniques in the original paper, showing everyone that what used to be eventually consistent can be pushed up to strongly consistent.
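
The end result is simple to state in practice; a minimal boto3 sketch (the bucket name is a placeholder): a GET issued right after a successful PUT is now guaranteed to return the new object:

```python
import boto3

s3 = boto3.client("s3")
bucket = "my-example-bucket"  # placeholder

# With strong read-after-write consistency, this GET always sees the
# object just written; there is no read-your-writes race anymore.
s3.put_object(Bucket=bucket, Key="test-key", Body=b"v2")
body = s3.get_object(Bucket=bucket, Key="test-key")["Body"].read()
assert body == b"v2"
```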

The seven-day CPU credit retention limit on Amazon EC2 t3/t3a/t4g

Saw a friend on Twitter mention that t3-family machines keep their CPU credits for seven days:

As mentioned in 「CPU credits and baseline utilization for burstable performance instances」, t3/t3a/t4g are all designed so you can bank up to 24 hours' worth of CPU credits.

The seven days in question comes from this part:

CPU credits on a running instance do not expire.

For T2, the CPU credit balance does not persist between instance stops and starts. If you stop a T2 instance, the instance loses all its accrued credits.

For T3 and T4g, the CPU credit balance persists for seven days after an instance stops and the credits are lost thereafter. If you start the instance within seven days, no credits are lost.

CPU credits on a running machine don't expire, they just cap out at a maximum (listed in a table in the same document); for t2 machines the CPU credits disappear immediately once you stop the instance, while t3/t3a/t4g keep them for seven days after a stop.

I hadn't noticed this point in the documentation before.

Also, back when I was testing a self-hosted Sentry I tried switching t3a.medium -> r5a.large -> t3a.medium, and the original CPU credits were still usable afterwards, so it looks like CPU credits don't disappear just because the family type changes (though I'm not sure whether that's undefined behavior...).
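
One way to observe this kind of behavior yourself is to watch the CPUCreditBalance metric in CloudWatch before and after a stop or an instance type change; a minimal boto3 sketch (the instance ID is a placeholder):

```python
import boto3
from datetime import datetime, timedelta

cloudwatch = boto3.client("cloudwatch")

# Pull the CPU credit balance of one instance over the last hour.
resp = cloudwatch.get_metric_statistics(
    Namespace="AWS/EC2",
    MetricName="CPUCreditBalance",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
    StartTime=datetime.utcnow() - timedelta(hours=1),
    EndTime=datetime.utcnow(),
    Period=300,
    Statistics=["Average"],
)
for point in sorted(resp["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Average"])
```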

Amazon EC2 Auto Scaling supports Warm Pools

A new feature for EC2: 「Amazon EC2 Auto Scaling introduces Warm Pools to accelerate scale out while saving money」.

The whole point is this: the approach pre-initializes the machines and then parks them in the stopped state:

Additionally, Warm Pools offer a way to save compute costs by placing pre-initialized instances in a stopped state.

In theory it can be as fast as 30 seconds:

Now, these applications can start pre-initialized, stopped instances to serve traffic in as low as 30 seconds.
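
Setting it up on an existing Auto Scaling group is a single API call; a boto3 sketch (the group name and sizes are placeholders):

```python
import boto3

autoscaling = boto3.client("autoscaling")

# Keep up to 5 pre-initialized instances parked in the stopped state,
# ready to be started during scale-out.
autoscaling.put_warm_pool(
    AutoScalingGroupName="my-asg",   # placeholder
    PoolState="Stopped",             # pre-initialized, no compute charges
    MinSize=2,
    MaxGroupPreparedCapacity=5,
)
```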

That said, even a stopped machine still has to check for newer versions of your code when it starts... From what I can tell, the gain is mostly in skipping the EBS preparation time?

Amazon EC2 now supports copying AMIs (images) directly across regions

Amazon EC2 AMIs can now be copied across regions: 「Amazon EC2 now allows you to copy Amazon Machine Images across AWS GovCloud, AWS China and other AWS Regions」.

As the announcement mentions, before this feature existed, getting an identical image meant rebuilding the AMI from scratch in each region:

Previously, to copy AMIs across these AWS regions, you had to rebuild the AMI in each of them. These partitions enabled data isolation but often made this copy process complex, time-consuming and expensive.

There are a few restrictions: the image has to be 1TB or smaller, and it has to be staged in S3, but these should be fine:

This feature provides a packaged format that allows AMIs of size 1TB or less to be stored in AWS Simple Storage Service (S3) and later moved to any other region.

Also, for now it can only be driven from the CLI, or by calling the API directly with the SDK; the web console doesn't seem to offer it yet:

This functionality is available through the AWS Command Line Interface (AWS CLI) and the AWS Software Development Kit (AWS SDK). To learn more about copying AMIs across these partitions, please refer to the documentation.
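
Under the hood the flow is a store task into S3 followed by a restore task on the other side; a boto3 sketch, with the AMI ID and bucket names as placeholders:

```python
import boto3

# Source region/partition: package the AMI into an S3 bucket.
src = boto3.client("ec2", region_name="us-east-1")
src.create_store_image_task(
    ImageId="ami-0123456789abcdef0",   # placeholder
    Bucket="my-ami-export-bucket",     # placeholder; same region as the AMI
)

# After copying the S3 object into a bucket in the target region or
# partition, restore it into a new AMI there.
dst = boto3.client("ec2", region_name="us-gov-west-1")
dst.create_restore_image_task(
    Bucket="my-ami-import-bucket",          # placeholder
    ObjectKey="ami-0123456789abcdef0.bin",  # key of the stored object
    Name="restored-ami",
)
```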

AWS launches Amazon Route 53 Resolver DNS Firewall

A long-standing hole finally has a decent fix: AWS has launched Amazon Route 53 Resolver DNS Firewall: 「Introducing Amazon Route 53 Resolver DNS Firewall」.

Route 53 Resolver is AWS's official DNS resolver, usually sitting at x.x.x.2 (for /24 or larger subnets) unless you configure otherwise. Until now there was no way to deal with the data leak problem, i.e. exfiltrating sensitive data from a private network via DNS.

The previous workaround was to block traffic to the Route 53 Resolver with security groups (or with the VPC firewall) and then run two DNS resolvers of your own to do the filtering. Now Route 53 Resolver supports DNS Firewall, with both allowlist and blocklist functionality, which finally solves this rather cleanly:

Route 53 Resolver DNS Firewall lets you create “blocklists” for domains you don’t want your VPC resources to communicate with via DNS. You can also take a stricter, “walled-garden” approach by creating “allowlists” that permit outbound DNS queries only to domains you specify. You can also create alerts for when outbound DNS queries match certain firewall rules, allowing you to test your rules before deploying for production traffic.
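
In API terms you create a domain list, wrap it in a rule group, and associate the group with a VPC; a boto3 sketch where the names, domain, and VPC ID are all placeholders:

```python
import boto3

r53r = boto3.client("route53resolver")

# 1. A blocklist of domains we never want resolved from the VPC.
domain_list = r53r.create_firewall_domain_list(
    CreatorRequestId="blocklist-1", Name="my-blocklist"
)["FirewallDomainList"]
r53r.update_firewall_domains(
    FirewallDomainListId=domain_list["Id"],
    Operation="ADD",
    Domains=["bad.example.com."],
)

# 2. A rule group with one BLOCK rule pointing at that list.
rule_group = r53r.create_firewall_rule_group(
    CreatorRequestId="rulegroup-1", Name="my-rules"
)["FirewallRuleGroup"]
r53r.create_firewall_rule(
    CreatorRequestId="rule-1",
    FirewallRuleGroupId=rule_group["Id"],
    FirewallDomainListId=domain_list["Id"],
    Priority=100,
    Action="BLOCK",
    BlockResponse="NODATA",
    Name="block-bad-domains",
)

# 3. Attach the rule group to a VPC.
r53r.associate_firewall_rule_group(
    CreatorRequestId="assoc-1",
    FirewallRuleGroupId=rule_group["Id"],
    VpcId="vpc-0123456789abcdef0",  # placeholder
    Priority=101,
    Name="my-assoc",
)
```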

In addition, this DNS Firewall release ships with two AWS-managed domain lists you can use, covering malware and botnets:

Route 53 Resolver DNS Firewall offers two managed domain lists—malware domains and botnet command and control domains—enabling you to get started quickly with managed protections against common threats.

That saves a lot of hassle...

AWS launches CloudWatch Metric Streams

AWS has launched CloudWatch Metric Streams, which pushes CloudWatch Metric data into Kinesis Data Firehose: 「CloudWatch Metric Streams – Send AWS Metrics to Partners and to Your Apps in Real Time」.

One selling point is much better real-time behavior than pulling through the API:

In order to make it easier for AWS Partners and others to gain access to CloudWatch metrics faster and at scale, we are launching CloudWatch Metric Streams. Instead of polling (which can result in 5 to 10 minutes of latency), metrics are delivered to a Kinesis Data Firehose stream.

The format can be either JSON or Open Telemetry:

When you set up a stream you choose between the binary Open Telemetry 0.7 format, and the human-readable JSON format.
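
Once the Firehose delivery stream and IAM role exist, creating the stream is one call; a boto3 sketch with placeholder ARNs:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Stream all metrics into an existing Kinesis Data Firehose delivery stream.
cloudwatch.put_metric_stream(
    Name="my-metric-stream",
    FirehoseArn="arn:aws:firehose:us-east-1:123456789012:deliverystream/my-stream",  # placeholder
    RoleArn="arn:aws:iam::123456789012:role/my-metric-stream-role",                  # placeholder
    OutputFormat="json",  # or "opentelemetry0.7" for the binary format
)
```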

The other selling point is the price, $0.003 per 1,000 metric updates:

Pricing – You pay $0.003 for every 1000 metric updates, and for any charges associated with the Kinesis Data Firehose. To learn more, check out the pricing page.

Doing the math on the Kinesis Data Firehose side as well: it bills by data volume, with a minimum billing unit of 5KB per record (a single metric update shouldn't reach that), at $0.029/GB (us-east-1) or $0.037/GB (ap-southeast-1). Even at the 5KB minimum, 1,000 updates come to about 5MB, or roughly $0.00015, so next to the $0.003 Metric Streams charge it's just pocket change...

Previously, pulling the data out yourself meant API calls at USD$0.01 per 1,000 requests, so this approach is considerably cheaper; still, at high volume it adds up (and quite a few metrics update once a minute).

If the goal is just backup or analysis, the old approach of pulling via the API might still be the better fit? Pulling once an hour should be more than enough for backup and analysis, and the alarm mechanism stays on CloudWatch either way.

The positioning of this product looks like an attempt to build up the ecosystem:

We designed this feature with the goal of making it easier & more efficient for AWS Partners including Datadog, Dynatrace, New Relic, Splunk, and Sumo Logic to get access to metrics so that the partners can build even better tools.

EC2 finally supports access through a serial console...

Previously, when an Amazon EC2 machine was broken enough that it wouldn't boot, you could only "look at" the console output, then detach the root volume, mount it on another machine to fix it, and attach it back (and if the fix didn't work, repeat...). Now you can finally operate through the EC2 Serial Console: 「Troubleshoot Boot and Networking Issues with New EC2 Serial Console」.

There are some restrictions though. First, the machine has to be based on the AWS Nitro System; you can check which families are Nitro at 「Amazon EC2 Instance Types」. The newer families mostly are (t3/t3a/t4g are Nitro, but t2 is not):

EC2 Serial Console access is available for EC2 instances based on the AWS Nitro System. It supports all major Linux distributions, FreeBSD, NetBSD, Microsoft Windows, and VMWare.
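
As I understand the flow on Linux (a sketch based on the documentation, with a placeholder instance ID): push a one-time SSH public key through EC2 Instance Connect, then ssh to the regional serial console endpoint:

```python
import boto3

# Push a short-lived public key for serial console access (valid ~60s).
eic = boto3.client("ec2-instance-connect", region_name="ap-southeast-1")
eic.send_serial_console_ssh_public_key(
    InstanceId="i-0123456789abcdef0",  # placeholder
    SerialPort=0,
    SSHPublicKey=open("/home/user/.ssh/id_rsa.pub").read(),
)

# Then, from a shell (endpoint format per the docs):
#   ssh i-0123456789abcdef0.port0@serial-console.ec2.ap-southeast-1.aws
```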

Also, it's currently only supported in a few major regions, but the main region we use at work (Singapore) is on the list, so I can give it a try the next time something breaks:

  • US East (N. Virginia), US West (Oregon), US East (Ohio)
  • Europe (Ireland), Europe (Frankfurt)
  • Asia Pacific (Tokyo), Asia Pacific (Sydney), Asia Pacific (Singapore)

Saw some people on Twitter reacting with things like "finally..."