Cavium (被 Marvell 併購) 在 Snowden leak 中被列為 SIGINT "enabled" vendor

標題可能會有點難懂,比較簡單的意思就是在 Snowden 當年 (2013) 洩漏的資料裡面發現了不太妙的東西,發現 Cavium (現在的 Marvell) 的 CPU 有可能被埋入後門,而他們家的產品被一堆廠商提供的「資安產品」使用。

出自 X (Twitter) 上面提到的:

這段出可以從 2022 年的「Communication in a world of pervasive surveillance」這份文件裡面找到,就在他寫的 page 71 (PDF 的 page 90) 的 note 21:

While working on documents in the Snowden archive the thesis author learned that an American fabless semiconductor CPU vendor named Cavium is listed as a successful SIGINT "enabled" CPU vendor. By chance this was the same CPU present in the thesis author’s Internet router (UniFi USG3). The entire Snowden archive should be open for academic researchers to better understand more of the history of such behavior.

Ubiquiti 直接中槍...

而另一方面,在 Hacker News 上的討論「Snowden leak: Cavium networking hardware may contain NSA backdoor (twitter.com/matthew_d_green)」就讓人頭更痛了,像是當初 Cavium 就有發過新聞稿提到他們是 AWS CloudHSM 的供應商:「Cavium's LiquidSecurity® HSM Enables Hybrid Cloud Users to Synchronize Keys Between AWS CloudHSM and Private Clouds」。

而使用者也確認有從 log 裡面看到看到 Cavium 的記錄:

Ayup. We use AWS CloudHSM to hold our private signing keys for deploying field upgrades to our hardware. And when we break the CI scripts I see Cavium in the AWS logs.

Now I gotta take this to our security team and figure out what to do.

居然是 CloudHSM 這種在架構上幾乎是放在 root of trust 上的東西...

IPv6 Excuse Bingo (IPv6 理由伯賓果?)

在「AWS IPv4 Estate Now Worth $4.5B (toonk.io)」這邊的討論意外的看到「IPv6 Excuse Bingo」這個網站...

這個 bingo 是動態的,每次 reload 都會有不同的版本出來,理由超多...

不過實際用 IPv6 network 後會發現各種鳥問題真的多,之前 (到現在) 最經典的就是 HECogent 的 IPv6 network 因為錢的問題談不攏而不通的問題,在維基百科上面就有提到從 2009 年開始就不通了:

There is a long-running dispute between the provider Cogent Communications and Hurricane Electric. Cogent has been refusing to peer settlement-free with Hurricane Electric since 2009.

所以夠大的服務如果要弄 IPv6 都得注意到這點,像是 CDN 或是 GeoIP-based load balancer 就不能把 Cogent 的用戶導到 HE 的位置上面,反之亦然。

不過話說 AWS 手上有 128M 個 IPv4 address,整個 IPv4 address 的空間也才 4.29B,也就是說光 AWS 手上就有大約 3% 的 IPv4 address 空間,如果扣掉不可用的區段的話就更高了...

EC2-Classic 完全退役

Amazon 家的老大 (CTO & VP) Werner Vogels 貼了關於 EC2-Classic 完全退役的文章:「Farewell EC2-Classic, it’s been swell」。

2021 年的時候 AWS 的 Jeff Barr 宣布了 EC2-Classic 的退役計畫:「EC2-Classic Networking is Retiring – Here’s How to Prepare」,我當時也整理了「AWS 宣佈 EC2-Classic 退役的計畫」這篇。

當時的時間表是期望在 2022/08/15 全部退役:

On August 15, 2022 we expect all migrations to be complete, with no remaining EC2-Classic resources present in any AWS account.

但後來還是晚了整整一年,到 2023/08/15 (剛好晚了一年) 才全部退役:

On August 15, 2023, we shut down the last instance of Classic.

而公告上面的更新則是在 2023/08/23 更新:

Update (August 23, 2023) – The retirement announced in this blog post is now complete. There are no more EC2 instances running with EC2-Classic networking.

因為真的太久沒用了,看了 Werner Vogels 的描述才能回想起來當時的架構,似乎是有一大鍋這件事情,靠 security group 拆開大家:

When we launched EC2 in 2006, it was one giant network of 10.2.0.0/8. All instances ran on a single, flat network shared with other customers. It exposed a handful of features, like security groups and Public IP addresses that were assigned when an instance was spun up.

順便提一下 Werner Vogels 文章的開頭提到了 AWS 很少將服務退役,即使是 2007 年推出的 Amazon SimpleDB 也還是繼續在跑,即使現在主推的是 DynamoDB

Retiring services isn’t something we do at AWS. It’s quite rare. Companies rely on our offerings – their businesses literally live on these services – and it’s something that we take seriously. For example SimpleDB is still around, even though DynamoDB is the “NoSQL” DB of choice for our customers.

用「List of AWS Services Available by Region」這頁查了一下,SimpleDB 的區域意外的還不少,在 us-east-1us-west-1us-west-2ap-southeast-1ap-southeast-2ap-northeast-1eu-west-1 以及 sa-east-1

不過話說這開頭是不是在偷臭隔壁棚 XDDD

Amazon SES 寄到 Gmail 受到阻擋的情況

我自己沒遇過,但是 Hacker News 上看到有人有遇到,所以記錄起來:「Tell HN: Gmail rate limiting emails from AWS SES」。

Amazon SES 預設是共用 IP pool,所以遇到這種情況不算太意外,但應該是暫時性的,不過發問的作者有提到後來的解法是花 US$25/mo 使用 Dedicated IP 解決 IP reputation 的問題 (在 id=37177533 這邊):

Thanks you all for comments. I have made a decision to subscribed to dedicated IPs (credits: @slau).

The differentiating factor between our current AWS SES plan and the competitors (mentioned in the comments) is having a dedicated IP. With our current volume, none of the competitors are anyway near AWS SES costs. So, moving to a dedicated IPs thats cost 25$ extra not only solves our issue, but also no change in code/infrastructure.

記得以前另外一個教訓是,寄信還是儘量用 IPv4 address 去寄,因為 IPv6 address 的 reputation 得養頗久... 不過這個也是很久前的事情了。

AWS 弄出了 AWS Dedicated Local Zones,很像 AWS Outposts...

AWS 推出了 AWS Dedicated Local Zones:「Announcing AWS Dedicated Local Zones」。

先講 AWS Outposts,他就是提供 AWS 自己的硬體,放到用戶的機房裡面,所以依照需求有不同大小的機器,甚至是整個機櫃:

AWS Outposts is a family of fully managed solutions delivering AWS infrastructure and services to virtually any on-premises or edge location for a truly consistent hybrid experience. Outposts solutions allow you to extend and run native AWS services on premises, and is available in a variety of form factors, from 1U and 2U Outposts servers to 42U Outposts racks, and multiple rack deployments.

在「What is AWS Outposts?」這邊有詳細列出有哪些服務可以跑在上面,可以看到主要就是基礎服務,以及一些吃 local 特性的服務。

另外在「How AWS Outposts works」這邊可以看出架構上會在同一個 VPC 裡面,但是不屬於同一個 AZ 下:

而這次推出的 Dedicated Local Zones 還是有些地方沒看懂跟 AWS Outposts 差在哪裡,看起來很像是重新包裝而已...

首先是首頁提到的,這邊有提到 AWS Nitro System,所以猜測這是 AWS 的硬體,而不是自己的硬體:

Build with AWS managed secure cloud infrastructure

Benefit from the same AWS security standards that apply to AWS Regions and AWS Local Zones and are delivered with the security of the AWS Nitro System to help ensure confidentiality and integrity of customer data.

另外在公告裡面提到的服務,跟 Outposts 有些差異:

AWS services, such as Amazon EC2, Amazon Virtual Private Cloud (Amazon VPC), Amazon Elastic Block Store (Amazon EBS), Elastic Load Balancing (ELB), Amazon Elastic Container Service (Amazon ECS), Amazon Elastic Kubernetes Service (Amazon EKS), and AWS Direct Connect are available in Dedicated Local Zones.

另外在「AWS Dedicated Local Zones FAQs」這邊則試著說明兩者差異,但就這些句子看起來,只是不同面向的東西:

Q: How are AWS Dedicated Local Zones different from AWS Outposts?

AWS Outposts is designed for workloads that need to remain on-premises due to latency requirements, where customers want those workloads to run seamlessly with their other workloads in AWS. AWS Outposts racks are fully managed and configurable compute and storage racks built with AWS-designed hardware that allow customers to run compute and storage on-premises, while seamlessly connecting to AWS’s broad array of services in the cloud.

AWS Dedicated Local Zones are designed to eliminate the operational overhead of managing on-premises infrastructure at scale. Some customers have long-term, complex cloud migration projects and need infrastructure that seamlessly scales to support their large-scale demand. Some of these customers represent the interests of a customer community and also need multi-tenancy features to efficiently coordinate across their stakeholders. Dedicated Local Zones enable these customers to reduce the administrative burden of managing their own infrastructure on-premises with scalable, resilient, and multitenant cloud infrastructure that is fully AWS-managed and built exclusively for their use.

另外回到首頁看使用單位,目前是 GovTech Singapore,看起來就是重新包裝?

另外一個猜測是在客戶的機器上面裝 AWS Nitro System,然後裝 AWS 的軟體?這就有點怪了,而且這樣相容性之類的問題也頗麻煩,也許要指定配合的機種?

等有機會遇到的時候再跟 AWS 的人問問看好了,目前也還用不到...

Amazon EBS 十五週年,以及一些數據

AWS 的 SVP James Hamilton 寫了一篇「Amazon Elastic Block Store at 15 Years」在講 Amazon EBS 的十五週年,裡面提到了一些數字。

目前的每天的 IOPS 是 100 trillion,如果攤平的話大約是 11.57 billion IOPS/sec,如果很單純以目前高階 NVMe 卡大約是 1M IOPS/sec 這個數量級來算的話,在沒有任何 redundancy 架構,需要的量大約是萬張?以 AWS 的量感覺好像是個合理的數字... 考慮到 IOPS 主力應該是 SSD 或 NVMe 類的應用,加上 redundancy 以及保留 burst 空間的架構,最少有個十萬張... 應該不算有問題。

I asked the EBS team to quantify customer usage in 2023, the 15th year of EBS. Focusing first on daily usage, EBS delivers more than 100 trillion input/output operations per day.

另外一個是傳輸量,每天有 13EB,攤平大約是 150.46TB/sec,如果用上面提到的十萬顆來攤的話大約是需要 1.5GB/sec 的速度,拿數量級來算應該是差不多。

Perhaps even more staggering is the fact that EBS transfers more than 13 exabytes of data for customers every day.

另外一個是百萬客戶 (也許是帳號) 每天會開出三億個 EBS storage,我猜這跟機器的起起落落有關,現在 EC2 開機主要都是要掛 EBS 的 boot disk 了:

Continuing to focus on daily usage, millions of customers use EBS daily, and these millions of customers create more than 390 million EBS storage volumes each day.

的確如同 James Hamilton 說的,EBS 現在已經變成一個蠻重要的基礎建設了,很多 AWS 上的服務都是架在他上面,像是 RDS 利用了 EBS 的 block replication 組出了 readonly repica,而非走傳統的 replication 路子。

Amazon EC2 推出 m7a 系列的機種

上一篇完全讀錯段落了,重寫...

Amazon EC2 推出了新的 m7a 的機種:「New – Amazon EC2 M7a General Purpose Instances Powered by 4th Gen AMD EPYC Processors」。

號稱與 m6a 相比有 50% 效能上的提升:

Today, we’re announcing the general availability of new, general purpose Amazon EC2 M7a instances, powered by the 4th Gen AMD EPYC (Genoa) processors with a maximum frequency of 3.7 GHz, which offer up to 50 percent higher performance compared to M6a instances.

不過查了一下價錢,us-east-1m6a.large 是 $0.0864/hr,m7a.large 則是 $0.11592/hr (都是 2 vCPU + 8GB RAM),漲了 34% 左右,如果計算 price performance 的話大約是 10%~15%?的確是不高所以不提 price performance,不過這次 m7a 提供了更小台的 m7a.medium (1 vCPU + 4GB RAM) 來補這塊 (m6a 最小的是 m6a.large),$0.05796/hr。

這樣看起來新的機種對於需要單核效能的應用應該會不錯?

再來是可以租到的區域,目前看起來只有歐美的傳統大區有,亞洲區還要再等等:

Amazon EC2 M7a instances are now available today in AWS Regions: US East (Ohio), US East (N. Virginia), US West (Oregon), and EU (Ireland).

Mountpoint for Amazon S3 正式推出 (GA) 了...

三月的時候提到 AWS 搞出了自己的 Amazon S3FUSE 實作:「AWS 官方推出了自己的 Amazon S3 FUSE 套件」,現在 GA 了:「Mountpoint for Amazon S3 – Generally Available and Ready for Production Workloads」。

看起來 s3fs-fuse 還是一直有在更新,然後翻了翻好像沒看到兩者的比較... (可能是之前 Mountpoint for Amazon S3 在 alpha 版的關係?)

記得這個功能拿來塞些東西還蠻好用的...

Amazon EC2 推出新世代 Intel CPU,以及奇怪的 Flex 產品

Jeff Barr 公告了 Amazon EC2 推出了新的 Intel CPU 產品線:「New Seventh-Generation General Purpose Amazon EC2 Instances (M7i-Flex and M7i)」。

先講 m7i,這個比較好理解,就是 Intel 新的 CPU,然後很隱晦的只宣稱了「比其他雲端廠商的 Intel CPU 快 15%」:

Today we are launching Amazon Elastic Compute Cloud (Amazon EC2) M7i-Flex and M7i instances powered by custom 4th generation Intel Xeon Scalable processors available only on AWS, that offer the best performance among comparable Intel processors in the cloud – up to 15% faster than Intel processors utilized by other cloud providers.

而與前一代的 Intel 機種相比 (應該是 m6i?) 則是高了 15% 的 CP 值:

The M7i instances are available in nine sizes (with two size of bare metal instances in the works), and offer 15% better price/performance than the previous generation of Intel-powered instances.

另外這次最有趣 (但未必好用) 的是推出了 m7i-flex,宣稱再「很多情境下」比 m6i 的 CP 值高了 19%:

M7i-Flex instances are available in the five most common sizes, and are designed to give you up to 19% better price/performance than M6i instances for many workloads.

這邊有提到 m7i-flex 是把 m7i 砍了 5% 價錢:

The M7i-Flex instances are a lower-cost variant of the M7i instances, with 5% better price/performance and 5% lower prices. They are great for applications that don’t fully utilize all compute resources. The M7i-Flex instances deliver a baseline of 40% CPU performance, and can scale up to full CPU performance 95% of the time.

但在「General purpose instances」這邊比較清楚,但可以跑超過 40% CPU 的時間限制在 24 小時內的 95%:

For times when workloads need more performance, M7i-flex instances provide the ability to exceed baseline CPU and deliver up to 100 percent CPU for 95 percent of the time over a 24-hour window.

這個設計頗微妙的,旁邊 ARM-based 的 t4g 就先不提了,至少不是 drop-in replacement...

m7i.large (2 vCPU + 8GB RAM) 在 us-east-1 的價錢是 $0.1008/hr,你可以完全沒有阻礙的用他 100% 的 CPU。

m7i-flex.large 是 $0.09576/hr,剛好省 5%,代價是有 5% 的時間必須壓在 40% 以下。

而拿 t3 系列的機器來比較,t3.large 也是有 2 vCPU + 8GB RAM,但他內建的 baseline CPU 是 30%,價錢則是 $0.0832/hr。

從這邊可以看出來,大多數的小應用還是會往 t3 甚至是 t3a 丟。

只有 24 小時的平均 loading 超過 40%,但又不是 24 小時都超過 40% 的應用,也就是現在應該是在 m6am6i 上面跑的,才有可能會評估 m7i-flex

更不用說 AWS 上 m6a 的收費比 Intel 的 m6i 少了 10% 啊,這個產品定位在細算後很微妙:應該是有他可以出現的地方,但怎麼算都不多... 有點像是 AWS 跟 Intel 交代的產品?

但順著邏輯,這個方法其實是 billing-based 的方案,跟技術沒有太多關係,如果 Intel 可以做,那麼 AMD 與 ARM 應該也遲早會出現?

看起來像是 t 系列產品的延伸,但好像可以再等等,看會不會在 AMD 與 ARM 的產品線上推出類似的東西?

Amazon SQS 提高 FIFO throughput 限制

在「Amazon SQS announces increased throughput quota for FIFO High Throughput mode」這邊看到 AWS 提高了 Amazon SQS 中 FIFO throughput 的限制,這本來是個常常有的公告,但讓我意外的是不同區域拉高的數量是不同的:

Amazon Simple Queue Service (SQS) announces an increased quota for a high throughput mode for FIFO queues, allowing you to process up to 9,000 transactions per second, per API action in US East (Ohio), US East (N. Virginia), US West (Oregon), Europe (Ireland), Europe (Frankfurt) regions. For Asia Pacific (Mumbai), Asia Pacific (Singapore), Asia Pacific (Sydney), and Asia Pacific (Tokyo) regions, the throughput quota has been increased to 4,500 transactions per second, per API action. For all other regions where SQS is generally available today, the quota for high throughput mode quota has been increased to 2,400 transactions per second.

第一梯隊的 (像是 us-east-1us-west-2eu-west-1) 都是 9000 tps,而第二梯隊是 4500 tps,沒列在上面的區域是 2400 tps。

另外一個比較特別的是 Frankfurt 區居然在第一梯隊...