一個害我嗆到的故事... (Netlify 帳單的故事?)

故事本身其實還蠻普通的,只是我的閱讀順序害我嗆到...

首先是在 Hacker Newsbest 頁上看到「Netlify just sent me a $104k bill for a simple static site (reddit.com)」這篇,點進去以後是 Reddit 的「Netlify just sent me a $104K bill for a simple static site」這篇,看了一下作者的敘述,是個用 Netlify 的服務,上面有個 3.44MB 的音檔被針對攻擊,造成 190TB 的流量,以及 $104K 的帳單 (十萬多美金),之後 Netlify 的客服同意這是 DDoS 攻擊,給他 95% discount,也就是還是要付 $5K 左右...

Reddit 下面最高分的回應是:

[–]thankyoufatmember 2262 points 14 hours ago
Don't pay, post the story to Hackernews!

Okay,我想說我就是從 Hacker News 上看到點過來的... 回去看一下好了,結果在 Hacker News 的留言最上方是:

bobfunk 10 hours ago | next [–]

Netlify CEO here.

Our support team has reached out to the user from the thread to let them know they're not getting charged for this.

It's currently our policy to not shut down free sites during traffic spikes that doesn't match attack patterns, but instead forgiving any bills from legitimate mistakes after the fact.

Apologies that this didn't come through in the initial support reply.

然後我剛好在喝茶,就嗆到了...

人家常說 Ptt 的電蝦板 (PC_Shopping) 是全台灣最大的客服中心,遇到各種不公不義的問題貼上去就會解決了... 這點倒是頗像的。

Cloudflare Workers 推出新的計費模式:以 CPU time 收費

CloudflareWorkers 這個產品 (serverless 產品線) 推出了以 CPU time 收費的模式:「New Workers pricing — never pay to wait on I/O again」。

在這之前大家都是以 wall time 在計費,但這對於會卡在 I/O 很久的應用來說很不利,這次 Cloudflare 提出方案改用 CPU time 來計費,的確有吸引到我的目光...

這是舊的:

這是新的:

就這兩張比較起來有個不是很確定的部分,現在不看 memory 用量收費了?Pricing 這頁裡面的資量已經把 「Standard」更新上去了,但好像還是沒提到 memory?

不過不是馬上生效,而是這個月的月底 2023/10/31 可以選擇切到新的方案:

Starting October 31, 2023, you will have the option to opt in individual Workers and Pages Functions projects on your account to new pricing, and newly created projects will default to new pricing. You’ll be able to estimate how much new pricing will cost in the Cloudflare dashboard. For the majority of current applications, new pricing is the same or less expensive than the previous Bundled and Unbound pricing plans.

另外明年 2024/03/01 會全部強制切到新的方案:

If you’re on our Workers Paid plan, you will have until March 1, 2024 to switch to the new pricing on your own, after which all of your projects will be automatically migrated to new pricing. If you’re an Enterprise customer, any contract renewals after March 1, 2024, will use the new pricing.

用 CPU time 的確是好不少,但不知道這個 billing 的方式沒有其他地雷...

Amazon Prime Video 捨棄 AWS Step Functions 回頭用 EC2 與 ECS 省錢的文章

昨天在 Hacker News 上熱烈討論的文章,是一篇三月就放出來,但昨天被丟上來意外的熱烈討論,在講 Amazon Prime Video 的團隊改寫程式,把 AWS Step Functions 拔掉,並且回頭用 EC2ECS 而省下大量 AWS 費用的文章討論:「Scaling up the Prime Video audio/video monitoring service and reducing costs (primevideotech.com)」,原文在「Scaling up the Prime Video audio/video monitoring service and reducing costs by 90%」,Internet Archive 的備份Archive Today 的備份

先看文章的部分,裡面提到了他們用 AWS Step Functions,但意外的貴:

The initial version of our service consisted of distributed components that were orchestrated by AWS Step Functions. The two most expensive operations in terms of cost were the orchestration workflow and when data passed between distributed components.

然後改寫程式把所有東西都放在單一 process 裡面跑就好,用標準的 EC2 或是 ECS 就可以 scale 很好,而且也省錢:

To address this, we moved all components into a single process to keep the data transfer within the process memory, which also simplified the orchestration logic. Because we compiled all the operations into a single process, we could rely on scalable Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Elastic Container Service (Amazon ECS) instances for the deployment.

可以看出起因是一開始設計時的 overdesign,把可以簡單處理的東西拆開,另外加上雲端在這塊收費特別貴而導致成本爆增... 這件事情偶而會發生,尤其是比較新的東西會沒注意到成本,通常在上線發現不太對的時候就會安排 refactor 掉。

但如果是 Amazon 自家集團的其他團隊出來抱怨,就有很棒的 PR 效果了,所以 Hacker News 上就看到有人在猜可能過不久後文章就會不見 XD (但文章紅了以後應該就不會不見 XD):

My word. I'm sort of gob smacked this article exists.

I know there are nuances in the article, but my first impression was it's saying "we went back to basics and stopped using needless expensive AWS stuff that caused us to completely over architect our application and the results were much better". Which is good lesson, and a good story, but there's a kind of irony it's come from an internal Amazon team. As another poster commented, I wouldn't be surprised if it's taken down at some point.

很政治不正確的文章 XD

以之前的經驗來說,AWS 上類似的東西還包括了 NAT Gateway,這東西只適合在有強資安需求 (像是法規要求),而且需要連外的流量很少的時候適合。

NAT Gateway 在新加坡 ap-southeast-1 要 $0.059/hr (美金,所以大約是 $42.48/mo),以及 US$0.045/GB 的處理費用,所以假設你每天只有 100GB (平均 10Mbps),就等於是 3TB/mo,要 $135/mo。這樣整包就 $172.48/mo 了。

如果讓 EC2 機器直接連去 internet 抓資料的話,這些費用就是 $0,你只要付無論是有 NAT Gateway 或是沒有 NAT Gateway 的 outbound traffic 費用部分 (大多是各種 TCP/TLS/HTTP header)。

比較省成本的解法是用 security group 對 outbound traffic 開放特定的流量來解。

另外一種方式還是 NAT,但是是自己架設 HA 的 NAT service,像是 2015 年的文章「The Right Way to set up NAT in EC2」提到的方法。

這個方法以現在的機種來說,兩台 t4g.nano 的機器加上 EBS 不到 $10/mo,唯一要注意的應該是網路頻寬雖然可以 burst 到 5Gbps,但他的網路頻寬是 credit 機制,當 credit 用完的時候 t4g.nano 記得是剩下 100Mbps 左右?不過真的有這個量的時候機器也可以往上開大一點...

另外還有很多「好用」的雲端服務,但看到帳單後就變得「不好用」的雲端服務... 在用之前先算一下成本就會發現了。

把 AWS 的 Billing 資料接進 Grafana 上...

Twitter 上看到 Grafana 的帳號提到了一篇把 AWS Billing 資料接進 Grafana 上的文章:

這個想法還頗不賴的,有些東西剛好可以交叉比對拉出來...

Mastercard 對實體物品提供免費試用後的訂閱條款

Mastercard 規定在免費試用後 (實體物品),需要另外再讓使用者再同意一次才能開始收訂閱費用:「Free Trials Without The Hassle」。

The rule change will require merchants to gain cardholder approval at the conclusion of the trial before they start billing. To help cardholders with that decision, merchants will be required to send the cardholder – either by email or text – the transaction amount, payment date, merchant name along with explicit instructions on how to cancel a trial.

新聞一開始出來時其實讓蠻多人關注的,因為一堆網路服務都是靠這招... 所以 Mastercard 在文章後更新說明,目前只有實體物品套用這個規則:

*This blog was updated on January 17, 2019 to clarify that the rule change is applicable to physical products such as skincare, healthcare items etc.

AWS 讓你可以禁止 RI 跨帳號計算了...

現在 AWS 讓你可以設定,是否允許 Reserved Instance (RI) 跨帳號使用:「Customize your organization’s AWS credit and Reserved Instance (RI) discount sharing using new billing preferences」。

以往是優先用在自己帳號,但如果有剩的話可以挪去其他帳號用。這樣雖然比較省錢,但有時候會造成帳務的「困擾」:

Historically, AWS has maximized customer savings by applying credits and RI discounts first to the account that owned the credit or RI lease and then distributing the remainder, if any, to qualifying usage incurred by accounts in the same organization. While this approach had the potential of lowering the overall bill, customers were unable to control if, and how, discounts were applied across organizational lines.

現在則是可以關掉:

To provide greater flexibility, customers can now disable AWS credit sharing across all accounts in their organization. This ensures that only the account that owns a credit, or has previously redeemed a credit, receives the associated benefit.

也可以分開設定:

You can also designate a set of accounts for which RI discount sharing is disabled, while continuing to share RI discounts among the rest of the accounts in your organization.

這樣雖然會比較貴,但這其實反應到某些組織文化上的問題啦...

Amazon EC2 的 CRI 支援一年版本了...

Amazon EC2 的 CRI (Convertible Reserved Instance) 支援一年的合約了:「EC2 Convertible Reserved Instance Update – New 1-Year CRI, Merges & Splits」,這樣彈性再多了一些:

Today we are introducing Convertible RIs with a 1-year term, complementing the existing 3-year term.

不過 CRI 主要是用在需要換 family type 的情境下,如果是已知 family type (像是一般性的 worker 會選 C4 或是剛推出的 C5) 那麼就直接選擇 Regional RI 就好...

基本上就是讓財務操作上多個選擇 :o

EC2 與 EBS 十月開始以秒計費

雖然只是 Amazon EC2Amazon EBS 計價模式的改變,但這次 AWS 的改變對於許多開發流程有很大的影響 (重點在 EC2 的部份):「New – Per-Second Billing for EC2 Instances and EBS Volumes」。

10/2 開始改變 (而不是 10/1),低消一分鐘,Windows 機種以及需要額外收費的 Linux 機種不在範圍內:

This change is effective in all AWS Regions and will be effective October 2, for all Linux instances that are newly launched or already running. Per-second billing is not currently applicable to instances running Microsoft Windows or Linux distributions that have a separate hourly charge. There is a 1 minute minimum charge per-instance.

然後 Spot 與買 RI 後也是一樣以秒計價:

List prices and Spot Market prices are still listed on a per-hour basis, but bills are calculated down to the second, as is Reserved Instance usage (you can launch, use, and terminate multiple instances within an hour and get the Reserved Instance Benefit for all of the instances).

這次改變的影響很巨大。馬上可以想到幾個情境...

第一個是對於實踐 Release early, release often 的團隊來說,如果設計成每 deploy 一次就建一個新的 AMI (最乾淨的作法),再開新機器換掉的話,成本就會增加不少。所以對於這樣的團隊,就會偏好朝著替換現有目錄內的東西後重啟...

現在改成以秒計費後,直接透過 Blue-Green Deployment 就可以了 (AWS CodeDeploy 年初也支援了:「AWS CodeDeploy 支援 BlueGreenDeployment」):(如果不熟悉 Blue-Green Deployment 的話,更白話的說法就是「先建後拆」...)

同樣的理由,對於 Auto Scaling 的 policy 也有些改變。之前機器開起來都會想讓他跑一個小時,所以 scale down 的部份都會寫的比較鬆一點。現在就可以重新規劃了...

另外一個影響是對使用 container 的誘因少了不少。很多人用 container 的用法是開大台機器再裡面拆給不同服務用,讓資源利用率變高,現在變成用多少算多少後就不太需要這樣了...

當然也還是有缺點。以前 Spot Instance 如果被 AWS 收回時,最後的那個小時是不計費的。現在因為以秒計費,變成要收費了...

最後是 10/2 生效這件事情頗怪,該不會是財務部門不願意配合 10/1 星期天加班生效,所以只好變成 10/2 生效這種理由吧... XDDD

Netflix 把金流相關的系統轉移到 AWS 上跑 MySQL 的故事...

這次要提的是「Netflix Billing Migration to AWS」、「Netflix Billing Migration to AWS - Part II」與「Netflix Billing Migration to AWS - Part III」這三篇。

Netflix 先前的金流相關系統跑的是 Oracle 的資料庫:

然後換成 MySQL

系統上是採用 DRBD,然後底層是 5 個 4TB 的 EBS 組成的 RAID 0,跑 LVM

High performance with respect to reads and writes was achieved by using RAID0 with EBS provisioned IOPS volumes. To get more throughput per volume, 5 volumes of 4TB each were used, instead of 1 big volume. This was to facilitate faster snapshots and restores.

LVM to manage two Logical Volume’s (DB and DRBD Metadata) within single Volume Group.

可以看到裡面用的都是很經過時間考驗的技術,像是 DRBD、標準的 Replication 架構...