amazon – Page 5 – Gea-Suan Lin's BLOG

AWS's new m7a claims 50% more performance than m6a?

In "Introducing Amazon EC2 M7a instances (Preview)", AWS claims that m7a is up to 50% faster than m6a:

These instances deliver up to 50% greater performance on average compared to M6a instances.

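It's still in preview and you have to apply for access, so there's no telling yet how it really performs. I couldn't find pricing either... but if the price doesn't go up too much, a rough calculation suggests it might catch up with the ARM-based m7g?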

It also makes me look forward to seeing whether a t4a shows up?

Many MTurk workers are using LLMs to solve text tasks

Just came across this on Hacker News: "33-46% of workers on MTurk used LLMs in a text production task (arxiv.org)". The paper is "Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks", and that title is very deliberate... XD

The Hacker News title mostly comes from this part of the paper's abstract:

We reran an abstract summarization task from the literature on Amazon Mechanical Turk and, through a combination of keystroke detection and synthetic text classification, estimate that 33-46% of crowd workers used LLMs when completing the task.

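Thinking about it, this seems pretty normal? If there's an easy way to make the money, people will take it... But it also means developers can think about the quality of what gets offloaded to LLMs, and, when outside human ("worker") intelligence is needed, whether to pair it with an LLM and offload only some simple processing to humans?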

By the way, I hadn't heard about MTurk in a long time; according to Wikipedia, the service has been around since 2005.

AWS Aurora Xanadu?

In the thread "Why PostgreSQL High Availability Matters and How to Achieve It (yugabyte.com)" I saw that AWS is also working on a service similar to GCP's Spanner, under the project name Aurora Xanadu: "36328981".

franckpachot

Google has Spanner. AWS is working on something similar (project Aurora Xanadu). And both have YugabyteDB in their marketplace. Those are Distributed SQL (Global ACID), not Citus. For DataWarehouse which doesn't need ACID, there are other services.

Saving this link for later, to see whether they end up shipping it under this name.

AWS put Falcon 40B into SageMaker

AWS announced that Falcon 40B is now available in SageMaker JumpStart: "Falcon 40B foundation model from TII available on SageMaker JumpStart". That's AWS for you...

By the way, Falcon support in llama.cpp seems a bit stuck and will probably take a while longer. Some people can already run 7B, but everyone still wants to try 40B...: "Falcon LLM Support #1602".

Comparing GCP Disks with AWS EBS...

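While upgrading a bastion host on GCP this afternoon, I noticed the machine was using a Standard Persistent Disk (Standard PD), which is HDD-backed and painfully slow. After digging a bit, I found that AWS and GCP differ quite a lot here, so here are some notes...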

For pricing, I'm using the Tokyo region (ap-northeast-1) for AWS and the Taiwan region (asia-east1) for GCP. All per-GB prices below are per month.

SSD first:

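The most commonly used AWS type, gp3, is $0.096/GB. Regardless of volume size it provides 3000 IOPS and 125MB/sec throughput, and you can pay for additional IOPS and throughput. Because the baseline doesn't depend on size, it works well as a boot disk.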

The older gp2 is $0.12/GB and provides 3 IOPS/GB with a floor of 100 IOPS, so it is still acceptable as a boot disk and won't be too slow.
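To make the gp2 and gp3 difference concrete, here is a minimal Python sketch of the baseline IOPS rules described above. It only models the numbers quoted in this post (3 IOPS/GB with a 100 IOPS floor for gp2, a flat 3000 IOPS for gp3); gp2 burst credits and upper limits are deliberately left out, and the function names are just for illustration.

```python
# Baseline IOPS as quoted in this post; burst credits and upper caps are ignored.
GP3_BASELINE_IOPS = 3000   # flat, independent of volume size
GP2_IOPS_PER_GB = 3
GP2_MIN_IOPS = 100


def gp2_baseline_iops(size_gb: int) -> int:
    """gp2 scales with size but never drops below the 100 IOPS floor."""
    return max(GP2_MIN_IOPS, GP2_IOPS_PER_GB * size_gb)


def gp3_baseline_iops(size_gb: int) -> int:
    """gp3 gives the same baseline regardless of size."""
    return GP3_BASELINE_IOPS


if __name__ == "__main__":
    for size_gb in (10, 100, 1000):
        print(size_gb, gp2_baseline_iops(size_gb), gp3_baseline_iops(size_gb))
```

Under these rules a gp2 volume needs 1000GB before its baseline matches what gp3 gives a 10GB boot disk.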

On GCP, a Balanced Persistent Disk (Balanced PD) is $0.1/GB and provides 6 read IOPS/GB + 6 write IOPS/GB + 0.28MB/sec/GB throughput; for a 10GB disk that is 60 read IOPS + 60 write IOPS + 2.8MB/sec throughput.

An SSD Persistent Disk (SSD PD) is $0.17/GB and provides 30 read IOPS/GB + 30 write IOPS/GB + 0.48MB/sec/GB throughput; for a 10GB disk that is 300 read IOPS + 300 write IOPS + 4.8MB/sec throughput.

Then HDD:

On AWS the type is called standard, at $0.08/GB, plus another $0.08 per 1 million I/O requests. That's fine for a boot disk, but not so cheap if an application is hammering it with I/O.

On GCP it's the Standard Persistent Disk (Standard PD), at $0.04/GB, providing 0.75 read IOPS/GB + 1.5 write IOPS/GB + 0.12MB/sec/GB throughput; for a 10GB disk that is 7.5 read IOPS + 15 write IOPS + 1.2MB/sec throughput.

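So if you don't care much about performance (but not to the point of not caring at all?) and want the best price/performance, standard on AWS isn't a good deal, since for slightly more money you get gp3 and a huge performance boost. On GCP, though, you would be tempted to use Standard PD, since the unit price is quite a bit lower...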
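The per-GB figures for the three GCP disk types can be turned into per-disk numbers with a few lines of Python. This is only a sketch built from the prices and rates quoted in this post (asia-east1 at the time of writing); the `PdType` class and `disk_profile` function are names made up for illustration, and real disks also have per-instance limits that are not modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PdType:
    name: str
    price_per_gb: float          # USD per GB per month
    read_iops_per_gb: float
    write_iops_per_gb: float
    throughput_mb_per_gb: float  # MB/sec per GB


# Figures as quoted in this post.
PD_TYPES = [
    PdType("Standard PD", 0.04, 0.75, 1.5, 0.12),
    PdType("Balanced PD", 0.10, 6, 6, 0.28),
    PdType("SSD PD", 0.17, 30, 30, 0.48),
]


def disk_profile(pd: PdType, size_gb: int) -> dict:
    """Scale the per-GB price and performance figures to a disk of size_gb."""
    return {
        "name": pd.name,
        "monthly_usd": round(pd.price_per_gb * size_gb, 2),
        "read_iops": round(pd.read_iops_per_gb * size_gb, 2),
        "write_iops": round(pd.write_iops_per_gb * size_gb, 2),
        "throughput_mb_s": round(pd.throughput_mb_per_gb * size_gb, 2),
    }


if __name__ == "__main__":
    for pd in PD_TYPES:
        print(disk_profile(pd, 10))
```

Everything scales linearly with size here, which is why a small Standard PD boot disk ends up with single-digit read IOPS.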

The upward trend in EC2 Spot Instance prices

The article "Farewell to the Era of Cheap EC2 Spot Instances" discusses how Amazon EC2 spot instance prices have been trending upward recently, like this chart of t4g.nano in us-east-1, which looks like it was pulled from the web console:

Quite a few regions show something similar, especially the most commonly used us-east-1 and us-west-2:

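Last month a friend on Plurk mentioned the same thing: it's getting harder and harder to find cheap spot instances in us-east-1. At the time I wondered whether some big event was going on, but after the article came out it turned out everyone was running into it.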

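There's also a discussion on Hacker News: "Farewell to the Era of Cheap EC2 Spot Instances (pauley.me)". It mentions some tools that allow more flexible adjustments and use more logic to improve costs, such as the AutoSpotting - Community Edition project, which uses Lambda to do the adjusting for you: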

The entire logic described above is implemented in a set of Lambda functions deployed using CloudFormation or Terraform stacks that can be installed and configured in just a few minutes.

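Looking back at the current situation (and guessing at AWS's strategy): if the normal spot price stays at 60-70% of the list price, that effectively pushes you to plan around Savings Plans and the like, and lets spot instances slowly fade out.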

That said, I wonder whether someone will sue over the "90% saving" advertising...

Another interesting comment related to Prime Video

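In the Hacker News thread "Even Amazon can't make sense of serverless or microservices (world.hey.com)", DHH took the opportunity to poke at AWS's cost problem. What caught my attention wasn't DHH's article, though, but Hacker News comment 35823366, where someone claiming to have designed the original serverless architecture explains that they did load test it back then and it looked OK: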

I actually designed the original serverless system (a few years back when I was still at Prime Video), and yes we did and it did sort of look like it could work until it didn't. Obviously wasn't the right solution for the scale we had in mind (or rather the type of problem we were working on) but it's sad to see the mistake of one team be used to justify shitting on serverless as a general solution.

But this was posted from a brand-new account. Searching for the id leads to the GitHub account TimboKZ, and from there to Timur Kuzhagaliyev's LinkedIn profile, which mentions working as an SDE at Amazon from 2019 to 2020:

Worked on various computer vision projects as a part of Video Quality Analysis team at Prime Video.

It all lines up, but I still can't be completely sure; for now I'm mostly reading it as gossip...

The Amazon.com CTO comments on that Prime Video article

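Earlier, in "The article on Amazon Prime Video dropping AWS Step Functions and going back to EC2 and ECS to save money", I mentioned the article written by the Prime Video team, which felt a bit like negative PR.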

Surprisingly, Amazon.com's CTO Werner Vogels also wrote a post about it: "Monoliths are not dinosaurs".

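As the title shows, he is discussing monoliths versus other architectures rather than AWS costs (which is what the original Prime Video article advertised in its title). He also touches on the management side, namely the article's subtitle and its two bolded passages: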

Building evolvable software systems is a strategy, not a religion. And revisiting your architectures with an open mind is a must.

If you hire the best engineers, you should trust them to make the best decisions.

there is not one architectural pattern to rule them all.

As for whether this counts as pouring oil on the fire... that's in the eye of the beholder.

The article on Amazon Prime Video dropping AWS Step Functions and going back to EC2 and ECS to save money

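This article was hotly discussed on Hacker News yesterday. It was published back in March but drew unexpectedly heated discussion once it was submitted. It describes how the Amazon Prime Video team rewrote their software to remove AWS Step Functions and go back to EC2 and ECS, saving a large amount of AWS cost. The discussion is at "Scaling up the Prime Video audio/video monitoring service and reducing costs (primevideotech.com)", and the original article is "Scaling up the Prime Video audio/video monitoring service and reducing costs by 90%" (Internet Archive backup, Archive Today backup).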

Starting with the article itself: it says they used AWS Step Functions, which turned out to be unexpectedly expensive:

The initial version of our service consisted of distributed components that were orchestrated by AWS Step Functions. The two most expensive operations in terms of cost were the orchestration workflow and when data passed between distributed components.

They then rewrote things to run everything in a single process, which scales well on plain EC2 or ECS and also saves money:

To address this, we moved all components into a single process to keep the data transfer within the process memory, which also simplified the orchestration logic. Because we compiled all the operations into a single process, we could rely on scalable Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Elastic Container Service (Amazon ECS) instances for the deployment.

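The root cause appears to be overdesign at the start, splitting up something that could have been handled simply, combined with the cloud charging a premium for this kind of thing, which made costs balloon... This happens now and then, especially with newer services where cost gets overlooked; usually a refactor gets scheduled once it's live and something looks off.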

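But when the complaint comes from another team inside Amazon's own group, the PR effect is fantastic, so some people on Hacker News guessed the article might disappear before long XD (though now that it has gone viral it probably won't XD):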

My word. I'm sort of gob smacked this article exists.

I know there are nuances in the article, but my first impression was it's saying "we went back to basics and stopped using needless expensive AWS stuff that caused us to completely over architect our application and the results were much better". Which is good lesson, and a good story, but there's a kind of irony it's come from an internal Amazon team. As another poster commented, I wouldn't be surprised if it's taken down at some point.

A very politically incorrect article XD

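In my experience, similar things on AWS include NAT Gateway, which only makes sense when you have strong security requirements (such as regulatory ones) and very little outbound traffic.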

NAT Gateway in Singapore (ap-southeast-1) costs $0.059/hr (USD, so about $42.48/mo), plus a US$0.045/GB processing fee. So even if you only move 100GB per day (about 10Mbps on average), that is 3TB/mo, or $135/mo. The total comes to $177.48/mo.
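The NAT Gateway arithmetic can be checked with a short Python sketch. It uses only the two ap-southeast-1 prices quoted in this post and assumes a 30-day month; the function names are made up for illustration.

```python
# Prices as quoted in this post for ap-southeast-1 (USD).
HOURLY_USD = 0.059
PROCESSING_USD_PER_GB = 0.045


def nat_gateway_monthly_cost(gb_per_day: float, days: int = 30) -> dict:
    """Split the monthly NAT Gateway bill into hourly and per-GB parts."""
    hourly = HOURLY_USD * 24 * days
    processing = PROCESSING_USD_PER_GB * gb_per_day * days
    return {
        "hourly_usd": round(hourly, 2),
        "processing_usd": round(processing, 2),
        "total_usd": round(hourly + processing, 2),
    }


def average_mbps(gb_per_day: float) -> float:
    """Average bandwidth for a daily volume, using 1GB = 8000 megabits."""
    return gb_per_day * 8000 / 86400


if __name__ == "__main__":
    print(nat_gateway_monthly_cost(100))
    print(round(average_mbps(100), 2))
```

For 100GB per day this works out to $42.48 in hourly charges plus $135.00 in processing fees, $177.48 in total, at an average of roughly 9.3Mbps.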

If EC2 instances connect to the internet directly to fetch data, these charges are $0. You only pay for outbound traffic, which you pay for with or without a NAT Gateway (mostly TCP/TLS/HTTP headers of various kinds).

A cheaper approach is to use security groups to allow only specific outbound traffic.

Another option is still NAT, but with a self-hosted HA NAT service, such as the method described in the 2015 article "The Right Way to set up NAT in EC2".

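With current instance types, two t4g.nano instances plus EBS come to less than $10/mo. The one thing to watch is network bandwidth: it can burst to 5Gbps, but it runs on a credit mechanism, and if I remember correctly t4g.nano drops to around 100Mbps once the credits run out? Then again, if you really have that much traffic you can move up to a larger instance...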

There are plenty of other "handy" cloud services that stop being "handy" once you see the bill... Work out the cost before using them and you will notice.

Testing how quickly GitHub and AWS detect a leaked token

A few days ago I saw "What happens when you leak AWS credentials and how AWS minimizes the damage (xebia.com)" on Hacker News; the original article is "What happens when you leak AWS credentials and how AWS minimizes the damage". Both the article and the Hacker News comments are quite interesting, so I'll cover them separately.

First the original article, which recorded a timeline:

12:33:12 – Pushed the credentials to GitHub
12:34:19 – The AWSCompromisedKeyQuarantineV2 policy is attached to the IAM user test-user by AWS
12:34:32 – Various List and Describe calls are made using the leaked credentials
12:35:08 – Received an email from AWS with the subject ‘ACTION REQUIRED: Your AWS Access Key is Exposed for AWS Account 12345678’

So after the push to GitHub, AWS automatically attached AWSCompromisedKeyQuarantineV2 one minute and seven seconds later to limit the damage, and then sent an email notification.
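The delays in that timeline are easy to work out by hand, but here is a small Python sketch that computes the seconds elapsed since the push for each event. The timestamps are copied from the quoted timeline; the event labels are shortened and the function name is made up for illustration.

```python
from datetime import datetime

# Timestamps from the quoted timeline; labels are shortened.
TIMELINE = [
    ("12:33:12", "Pushed the credentials to GitHub"),
    ("12:34:19", "AWSCompromisedKeyQuarantineV2 policy attached"),
    ("12:34:32", "List and Describe calls made with the leaked credentials"),
    ("12:35:08", "Email received from AWS"),
]


def seconds_since_push(timeline):
    """Return (seconds after the first event, label) for every event."""
    start = datetime.strptime(timeline[0][0], "%H:%M:%S")
    result = []
    for stamp, label in timeline:
        delta = datetime.strptime(stamp, "%H:%M:%S") - start
        result.append((int(delta.total_seconds()), label))
    return result


if __name__ == "__main__":
    for seconds, label in seconds_since_push(TIMELINE):
        print(f"+{seconds:>3}s  {label}")
```

The quarantine policy lands 67 seconds after the push, 13 seconds before the first calls made with the leaked credentials show up.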

For this feature, see GitHub's documentation: "About secret scanning - GitHub Docs".

Also on Hacker News, someone is using secret scanning directly as an API (lol), pushing AWS secrets found on PyPI to GitHub to trigger the downstream mechanism:

I set up a project[1] to automatically leak AWS secrets published to the Python package index, which then triggers the secret scanning process and quarantines the keys[2]

1. https://github.com/pypi-data/pypi-aws-secrets

2. https://github.com/pypi-data/pypi-aws-secrets/blob/main/keys...
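For a rough idea of what scanning for leaked AWS keys can look like, here is a minimal Python sketch that searches text for strings shaped like long-term AWS access key IDs. The `AKIA` prefix followed by 16 uppercase letters or digits is a commonly used heuristic and is an assumption here. It is not how GitHub's secret scanning or the pypi-aws-secrets project is actually implemented, and the function name is made up for illustration.

```python
import re

# Heuristic only: long-term access key IDs are commonly matched as
# "AKIA" followed by 16 uppercase letters or digits (an assumption here).
ACCESS_KEY_ID_RE = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def find_access_key_ids(text: str) -> list:
    """Return every substring that looks like an AWS access key ID."""
    return ACCESS_KEY_ID_RE.findall(text)


if __name__ == "__main__":
    # AKIAIOSFODNN7EXAMPLE is the placeholder key ID used in AWS documentation.
    sample = "aws_access_key_id = AKIAIOSFODNN7EXAMPLE"
    print(find_access_key_ids(sample))
```

A match only says that a string has the right shape. Deciding whether the key is live, and quarantining it, is the part that AWS handles once the partner notification arrives.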