Amazon S3 改變 403 的收費方式

這算是一連串的故事,首先是四月底的時候「How an empty S3 bucket can make your AWS bill explode」這篇,提到了他一個晚上收到了 US$1,300 的帳單,因為有人 (沒有權限的人) 對他的 S3 bucket 狂打了 100M requests (一億筆),雖然都是 403 的 access denied,但還是得付 request 與頻寬的費用。

對於想要搞的人來說,us-east-1 的 Amazon S3 費用是 $0.005/1K requests (PUT, COPY, POST, LIST requests),換算大一點的單位是 $5/1M requests,拿個 ab 之類的工具超級簡單就可以打出破千 reqs/sec,如果是 k6 之類的工具,其實一台電腦就蠻容易打爆?

作者聯絡 AWS 客服後,客服回答你需要付這筆費用 (「這不是 bug,是 feature」):

Yes, S3 charges for unauthorized requests (4xx) as well[1]. That’s expected behavior.

然後這件事情就在社群傳開了,傳到 Jeff Barr 後直接公開提到他認為客戶不應該付 unauthorized request 的 cost (應該是先跟內部其他高層討論過了),等於是宣佈了會改掉:

不過這件事情之前應該就有人提過了,結果 Colin Percival 直接戳,他在 2006 年 Amazon S3 剛出來的時候就提過了:

Anyway,兩個禮拜過去後,剛剛看到宣佈收費方式修改:「Amazon S3 will no longer charge for several HTTP error codes」。

針對從不屬於自己帳號所產生的 403 不收費 (包括 request 與頻寬費用):

With this change, bucket owners will never incur request or bandwidth charges for requests that return an HTTP 403 (Access Denied) error response if initiated from outside their individual AWS account or AWS Organization.

然後多了一頁「Billing for Amazon S3 error responses」專門說明這件事情,這邊列的比較完整,除了 403 以外也包含了其他的 HTTP response code 是不收費的:

The current page shows a full list of HTTP 3XX and 4XX status codes that won't be billed.

補了一個 18 年的洞...

Route 53 Resolver DNS Firewall 的 chain 處理

在「Stop the CNAME chain struggle: Simplified management with Route 53 Resolver DNS Firewall」這邊看到的新功能。

說實話... 我早就忘記 Route 53 Resolver DNS Firewall 這個產品了,我查資料才發現我在 2021 年的時候寫過:「AWS 推出 Amazon Route 53 Resolver DNS Firewall」。

這個產品的用途是避免透過 DNS 將敏感資訊打出去,不過先前的產品的條件很死,遇到 CNAME 或是 DNAME 的情況,你必須事先把可能後續的 record 也放進白名單才行,所以如果遇到類似於 X (Twitter) 用的 pbs.twimg.com 的情況就很麻煩了:

;; ANSWER SECTION:
pbs.twimg.com.          300     IN      CNAME   cs196.wac.edgecastcdn.net.
cs196.wac.edgecastcdn.net. 3409 IN      CNAME   cs2-wac.apr-8315.edgecastdns.net.
cs2-wac.apr-8315.edgecastdns.net. 110 IN CNAME  cs2-wac-us.8315.ecdns.net.
cs2-wac-us.8315.ecdns.net. 110  IN      CNAME   cs45.wac.edgecastcdn.net.
cs45.wac.edgecastcdn.net. 725   IN      A       117.18.237.70

理想上是你放行 pbs.twimg.com 就好,但因為 CNAME 的關係,你可能會需要多放行 *.edgecastcdn.net 以及 *.ecdns.net

可是這是第三方的服務,你無法控制對方怎麼切換 (沒有 API contract 的概念),像是有時候他會跳到 Fastly

;; ANSWER SECTION:
pbs.twimg.com.          290     IN      CNAME   dualstack.twimg.twitter.map.fastly.net.
dualstack.twimg.twitter.map.fastly.net. 290 IN A 151.101.40.159

如果之後又跑出 Akamai 或是 CloudFront 的話就沒完沒了。

另外一種常見的情況是第三方的 API endpoint,對方有可能有多個不同的點做 DR 切換,有可能 CNAME 到 AWSELB 或是 GCPCloud Load balancing 上。

所以為了「保險」,這個方式通常都是開整個 CDN 的服務,但這麼一來攻擊者可以透過租用這些服務 (像是 *.cloudfront.net),搭配一些其他比較鬆的 rule 鑽出來。

這次的這個功能有點 stateful firewall 的概念,第一個啟動的 record 是被放行的,那 CNAME 或是 DNAME 延伸出來的 record 也跟著放行,這樣算是補強了這個問題...

OpenTofu 釋出第一個 GA 版本 1.7.0

從去年八九月從 Terraform fork 出來並且改名後 (「OpenTF 宣佈從 Terraform 最後一個 Open Source 版本 fork 出來」、「OpenTF (Terraform 的 fork) 改名為 OpenTofu」),總算是推出第一個 GA 版本了:「OpenTofu 1.7.0 is out with State Encryption, Dynamic Provider-Defined Functions, and more」。

除了把 trademark 相關的東西換掉以外,還實作了一些功能,這樣化半年弄出個起頭,接下來看後續的能量?好像還行...

AWS KMS 降價

看到 AWS KMS 的公告:「AWS KMS announces more flexible automatic key rotation」。

第一個是增加了 rotation 的彈性,可以設定日期,不過 7 years (2560 days) 是怎麼算出來的?(某種風險公式?):

You can now customize the frequency of rotation period between 90 days to 7 years (2560 days) as well as invoke key rotation on demand for customer managed KMS keys.

另外一個是降價,看起來是 rotate 第三次以後就不收費?對於有在 rotation 的單位來說是個降價... (如果沒在 rotate 的就 XDDD)

We’re also introducing new pricing for KMS automatic key rotation. Previously, each rotation of a KMS key added $1/month per rotation to a KMS customer managed key. Now, for KMS keys that you rotate automatically or on demand, the first and second rotation of the key adds $1/month (prorated hourly) in cost, but this price increase is capped at the second rotation, and all rotations after your second rotation are not billed. For customers that have keys with 3 or more rotations, all of these keys will see a price reduction to $3/month (prorated) starting the first week of May 2024.

這也算是鼓勵拉高 rotation 的次數吧?

IBM 買 HashiCorp

居然是 IBMHashiCorp 啊:「IBM to buy HashiCorp in $6.4 billion deal to expand cloud software」。

Red Hat 先前在 RHEL 上面的事情讓 open source community 頗感冒,加上 HashiCorp 先前切到 BSL (先前有提到的:「HashiCorp 將放棄 Open Source License,改採用 BSL 1.1」),看起來差不多有個概念了?

另外 IBM 這邊官方新聞稿在這:「IBM to Acquire HashiCorp, Inc. Creating a Comprehensive End-to-End Hybrid Cloud Platform」。

Cloudflare Workers 支援 Python (是 open beta)

Cloudflare 宣佈 Cloudflare Workers 支援 Python:「Bringing Python to Workers using Pyodide and WebAssembly」。

不過比較特別的是,並不是原生支援 Python 環境,而是透過轉譯成 WebAssembly 丟進 V8 engine 執行,就如同文章標題提到的。

另外是套件的部分,照這個文字的說明,應該不是所有的套件都可以丟進去用 (can import a subnet of popular Python packages),支援的套件看起來是預先 compile 好:

All bindings, including bindings to Vectorize, Workers AI, R2, Durable Objects, and more are supported on day one. Python Workers can import a subset of popular Python packages including FastAPI, Langchain, Numpy and more. There are no extra build steps or external toolchains.

看起來是打算全部都用 javascript 當作基礎?

AWS Lambda 的 cache 架構

Lobsters 上看到的老文章:「[Cache Architecture for] Container Loading in AWS Lambda」,原文從 url 看起來是去年五月發表的資訊了:「Container Loading in AWS Lambda」。

主要是在講 container 怎麼 load 才會儘快執行,首先是提到了大家常用的 layer cache,在 AWS Lambda 上則是改用了 block level cache:

Most of the existing systems do this at the layer or file level, but we chose to do it at the block level.

然後每一塊 512KB:

We unpack a snapshot (deterministically, which turns out to be tricky) into a single flat filesystem, then break that filesystem up into 512KiB chunks.

接著是提到 lazy load 的方式:「Slacker: Fast Distribution with Lazy Docker Containers」:

Our analysis shows that pulling packages accounts for 76% of container start time, but only 6.4% of that data is read.

Slacker speeds up the median container development cycle by 20x and deployment cycle by 5x.

而這個技巧也被用在 AWS Lambda 上,而且是透過 FUSE 實作:

In Lambda, we did this by taking advantage of the layer of abstraction that Firecracker provides us. Linux has a useful feature called FUSE provides an interface that allows writing filesystems in userspace (instead of kernel space, which is harder to work in).

另外一個 AWS Lambda 有實作的是 tiered caching,分成三層,包括了 worker 的 local cache (L1)、同一個 AZ 上的 cache (L2) 以及 S3 上的資料 (L3):

Despite our local on-worker (L1) cache being several orders of magnitude smaller than the AZ-level cache (L2) and that being much smaller than the full data set in S3 (L3), we still get 67% of chunks from the local cache, 32% from the AZ level, and less than 0.1% from S3.

也因為 L3 cache 是 S3 的關係,他們在 L1 與 L2 上就不用擔心 durability 的問題 (反正不見了就往後面找):

The whole set of chunks are stored in S3, meaning the cache doesn’t need to provide durability, just low latency.

但還是用了 Erasure code,儘量維持每個 cache tier 在自己 tier 裡面就可以找到資料的機率,這樣可以盡量降低 peak latency (於是造成 99.9%/99.95%/99.99% 的 SLO 不好看?):

Think about what happens in a classic consistent hashed cache with 20 nodes when a node failure happens. Five percent of the data is lost. The hit rate drops to a maximum of 95%, which is a more than 5x increase in misses given that our normal hit rate is over 99%. At large scale machines fail all the time, and we don’t want big changes in behavior when that happens.

So we use a technique called erasure coding to completely avoid the impact. In erasure coding, we break each chunk up into M parts in a way that it can be recreated from any k. As long as M - k >= 1 we can survive the failure of any node with zero hit rate impact (because the other k nodes will pick up the slack).

大概是本來比較簡單的三層架構在 benchmark 後發現無法達成對應的 SLO,所以就「補上」erasure code 拉高 SLO,從這邊就可以感覺到老闆的要求對於架構設計上的影響...

話說難得看到一些細節被丟出來...

Cloudflare Zaraz 的新價錢

Cloudflare Zaraz 有點像是 GA4Mixpanel 或是 Amplitude 這種產品:

Cloudflare Zaraz is a solution that developers and marketers use to load third-party tools like Google Analytics 4, Facebook CAPI, TikTok, and others.

剛剛看到 Cloudflare Zaraz 的宣佈的新價錢:「Zaraz launches new pricing」,翻資料的時候發現去年七月的時候也有宣佈過價錢,但當時看起來反彈頗大而暫緩:「Cloudflare Zaraz steps up: general availability and new pricing」。

先看一下去年七月當時宣佈的價錢,這邊用了 Zaraz Loads 這個特別的 term:

If you exceed the free Zaraz Loads allocations, you'll be charged $0.50 for every additional 1,000 Zaraz Loads, but the service will continue to function.

看起來 Zaraz Load 包括了 script loading 以及 pageview event 都各算一次:

A Zaraz Load is counted each time a web page loads the Zaraz script within it, and/or the Pageview trigger is being activated.

這樣的話 1M event 就要 US$500,隔壁棚 Mixpanel 可以在「Pricing - Mixpanel | Product Analytics」這邊試拉,同樣 1M event 只要 $140,難怪當時被幹剿到不行...

這次公佈的價錢則是又用了新的 term,叫做 Zaraz Events,這個看起來就是一般理解的 event:

One of the biggest changes we made was changing the metric we used for pricing Zaraz. One Zaraz Event is an event you’re sending to Zaraz, whether that’s a pageview, a zaraz.track event, or similar.

而價錢直接下殺到 1M event 收 US$5,跟之前的差異巨大:

With the new Zaraz pricing model, every Cloudflare account gets 1,000,000 Zaraz Events per month for free. If your account needs more than that, every additional 1,000,000 Zaraz Events are only $5 USD, with volume discounting available for Enterprise accounts.

只是不知道 Zaraz 可以做到什麼程度,「理論上」會比較陽春,但不知道夠不夠用?

Scaleway 的 RISC-V 伺服器

看到「Scaleway launches RISC-V servers (scaleway.com)」這篇,Scaleway 推出了 RISC-V 的伺服器:「Elastic Metal RV1」。

先看對消費者比較有感覺的部分,未稅 €15.99/mo 大約是 US$17.34/mo,有 16GB RAM 這點算是蠻有競爭力的,目前常見的 VPS 大約是 1:5 左右 (1GB RAM 大約要 $4/mo),這邊直接接近到 1:1,光是這點在吃 memory 比較重的環境下就蠻吃香的。

另外從 Scaleway 的角度來看,有蠻多特別的特性,像是超省電與超高密度:

EM-RV1 servers are extremely energy-efficient, consuming between 0.96W and 1.9W per 1.8GHz core.

Incredibly dense, a 52U rack holds up to 672 EM-RV1s!

所以一台機器的 4 core 跑滿大約是 7.6W,看功耗與手機用的 ARM CPU 有點像,只是不知道 CPU 效能到底在哪個區間,等後續看看好了?