Amazon EFS 效能提昇的一些討論

上一篇「Amazon EFS 的效能提昇」提到 Amazon EFS 的效能提昇,在 Hacker News 上看到 Amazon EFS 團隊的 PMT (Product-Manager-Technical) 出來回一些東西:「Amazon Elastic File System Update – Sub-Millisecond Read Latency (amazon.com)」,搜尋 geertj 應該就可以看到他回的東西了...

像是即使是 Jeff Barr 發表這篇文章,也還是經過 legal team 的同意才能發表:

(PMT on the EFS team).

Yes, the wordings are carefully formulated as they have to be signed off by the AWS legal team for obvious reasons. With that said, this update was driven by profiling real applications and addressing the most common operations, so the benefits are real. For example, a simple WordPress "hello world" is now about 2x as fast as before.

另外這次的效能提昇是透過 cache 層達成的:

I'm the PMT for this project in the EFS team. The "flip the switch" part was indeed one of the harder parts to get right. Happy to share some limited details. The performance improvement builds on a distributed consistent cache. You can enable such a cache in multiple steps. First you deploy the software across the entire stack that supports the caching protocol but it's disabled by configuration. Then you turn it for the multiple components that are involved in the right order. Another thing that was hard to get right was to ensure that there are no performance regressions due to the consistency protocol.

然後在每個 AZ 都有 cache:

The caches are local to each AZ so you get the low latency in each AZ, the other details are different. Unfortunately I can't share additional details at this moment, but we are looking to do a technical update on EFS at some point soon, maybe at a similar venue!

另外看起來主要就是 metadata cache 的幫助:

NFS workloads are typically metadata heavy and highly correlated in time, so you can achieve very high hit rates. I can't share any specific numbers unfortunately.

還是有很多細節數字不能透漏,但知道是透過 cache 達成的就已經可以大致上想像後面是怎麼弄出來的了...

Amazon EFS 的效能提昇

AWS 宣佈他們將 Amazon EFS 的 latency 大幅降低以提昇效能:「Amazon Elastic File System Update – Sub-Millisecond Read Latency」。

Linux 上一般是用 NFS 掛 EFS,個位數的 ms 的確對於效能影響超大,現在宣稱讀取的部份降到 0.6ms,應該會有蠻明顯的感覺:

Up until today, EFS latency for read operations (both data and metadata) was typically in the low single-digit milliseconds. Effective today, new and existing EFS file systems now provide average latency as low as 600 microseconds for the majority of read operations on data and metadata.

然後不另外收費:

This performance boost applies to One Zone and Standard General Purpose EFS file systems. New or old, you will still get the same availability, durability, scalability, and strong read-after-write consistency that you have come to expect from EFS, at no additional cost and with no configuration changes.

另外就是過去幾個禮拜他們把現有的 EFS 都轉移過去了:

We “flipped the switch” and enabled this performance boost for all existing EFS General Purpose mode file systems over the course of the last few weeks, so you may already have noticed the improvement. Of course, any new file systems that you create will also benefit.

不過 EFS 另外一個問題就是貴炸,用錢換方便...

Amazon EFS 提供 Replication 功能

Jeff Barr 在官方 blog 上宣佈 Amazon EFS 提供 replication 功能:「New – Replication for Amazon Elastic File System (EFS)」。

可以看到跨區的設定畫面:

在建起來以後會是 read-only filesystem:

另外有提供 fail-over 機制,當 fail-over 過去後會從 read-only 變成 read-write。

不過要注意,架構上屬於 eventually consistent,預期是一分鐘內會更新。這點算是可以預期的,不然 latency 會太高:

All replication traffic stays on the AWS global backbone, and most changes are replicated within a minute, with an overall Recovery Point Objective (RPO) of 15 minutes for most file systems.

然後 replication 不會計算到 I/O 的 credit 與 throughput,算是比較特別的一點:

Replication does not consume any burst credits and it does not count against the provisioned throughput of the file system.

replication 這個服務本身不另外收費,只收取 EFS 使用的空間以及 replication 產生的頻寬費用:

You pay the usual storage fees for the original and replica file systems and any applicable cross-region or intra-region data transfer charges.

在 ZFS 上跑 PostgreSQL 的調校

在「Everything I've seen on optimizing Postgres on ZFS」這邊看到如果要在 ZFS 上面跑 PostgreSQL 時的調校方式,看起來作者有一直在更新這篇,所以需要的時候可以跑去看...

主要的族群是要搞 self-hosted PostgreSQL 的人,相較於 ext4 或是 XFS,底層如果使用 ZFS 可以做許多事情,像是 compression 與 snapshot,這對於很多 DBA 相關的操作會方便不少,但也因為 ZFS 的關係,兩邊 (& PostgreSQL) 需要一起調整以確保效能...

不過短期應該還是用 RDS 就是了...

MySQL 跑在 ZFS 與 ext4 的效能差異

Percona 的「MySQL/ZFS Performance Update」這篇又對 ZFS 做了一次測試,算是用比較新的軟體跑出來的結果,不過要注意這邊的 ZFS 版本仍然不是目前最新版:

ZFS 0.8.6-1 is not bleeding edge, there have been more than 1700 commits since and after 0.8.6, the ZFS release number jumped to 2.0. The big addition included in the 2.0 release is native encryption.

機器是在雲端上 (Azure 上),不熟悉 Azure 的機種,但看記憶體與 CPU 的量好像不是用頂規的機器:

benchmark host
Standard D2ds_v4 instance
2 vCpu, 8GB of Ram and 75 GB of temporary storage
Debian Buster

Database host
Standard E4-2ds-v4 instance
2 vCpu, 32GB of Ram and 150GB of temporary storage
256GB SSD Premium (SSD Premium LRS P15 – 1100 IOPS (3500 burst), 125 MB/s)
Debian Buster
Percona server 8.0.22-13

跑出來的結果看起來不差:

看了一下測試用的設定,似乎只測了 compression 的部份,沒測 snapshot 以及其他功能會對效能有什麼影響,但至少基本盤應該是還不錯?

Amazon EFS 推出 One Zone 版本

Amazon EFS 提供 One Zone 的版本,用較低的可靠度提供更低的價錢:「New – Lower Cost Storage Classes for Amazon Elastic File System」。

價錢大約是 53 折,不過要注意不在同一個 AZ 時使用會有頻寬費用:

Standard data transfer fees apply for inter-AZ or inter-region access to file systems.

目前想的到的是 /net/tmp 這類的用途,資料掉了也就算了,考慮到可靠度,其他的用途好像暫時想不到...

用 flock 防止指令同時間重複執行

flock(1) 是個好用的工具,可以避免指令同時重複執行。基本的用法是:

flock /tmp/example.lock /usr/bin/example

從這邊的 /tmp/example.lock 可以看出是透過檔案系統層的鎖實做的,也因為這樣的設計,所以用在 NFS 上時,可以規劃出跨機器的鎖定機制。(不過要注意舊版的 NFSv2/NFSv3 是可以刻意不跑 lock daemon 的,這樣就爛掉了 XD)

配合 crontab 執行時,-n 也是常用的選項,用在「這次沒跑也沒關係」的東西上,通常用在是計算出來後丟到 cache 內的程式,跑比較久的時候就跳過這次...

AWS 宣佈提昇 Amazon EFS 的最低效率

AWS 宣佈提昇 Amazon EFS 的最低效率:「Amazon Elastic File System increases file system minimum throughput」。

第一段裡的幾個數字差不多就是重點了:

Amazon Elastic File System (Amazon EFS) file systems using the default bursting throughput mode now have a minimum throughput of 1 MiB/s. All EFS bursting mode file systems (regardless of size) can drive 100 MiB/s of throughput, and file systems with more than 1TiB of Standard class storage can drive 100 MiB/s per TB when burst credits are available. This change increases the minimum throughput from 50KiB/s per GiB of Standard class storage to a fixed minimum of 1 MiB/s for file systems with less than 20 GiB of Standard class storage, when burst credits are exhausted.

本來最低保證效率是每 GB 提供 50KB/sec,也就是要使用到 20GB 才會提供 1MB/sec,現在對於不到 20GB 的使用者,直接拉高到固定 1MB/sec。

這對於剛開始用的使用者會方便一些,不過 EFS 主要還是方便在不同機器上共享,效率上還是本機掛 EBS 好很多 (因為 OS 可以 cache)。

先前在 AWS 上把 /home 丟到 EFS 上面,結果因為 i/o 都需要透過網路的關係,編 pyenv 超慢,後來找一天把東西都丟回 EBS 上,速度快多了...

Lambda 可以掛 EFS 了

AWS Lambda 可以掛 Amazon EFS 了:「New – A Shared File System for Your Lambda Functions」。

這有點像是一開始 Amazon EC2 只能把資料存到 Amazon S3 上,後來支援 EBS 的感覺:這使得很多程式可以直接用內建的 library 操作檔案系統,而不需要掛 AWS 專用的 library 操作 Amazon S3。

有了一個 filesystem 後馬上就可以想到很多惡搞的方法,像是用 lambda 搞 serverless PHP 之類的,之後應該會看到很有「創意」的玩法...

Linux Kernel 5.7 釋出...

在「The New Features Of The Linux 5.7 Kernel: Tiger Lake Graphics Stable, New exFAT, Zstd F2FS, Performance」這邊有列出重點來。

其中把過熱保護機制也一起考慮進來,這樣可以避免過熱被強制降速而反而變非常慢:

Thermal pressure tracking for systems that are thermally overloaded for better task placement on CPU cores running hot.

另外一個是把 exFAT 驅動換成由 Samsung 維護的版本,照其他文章的說明,這個版本比較穩定...

The new exFAT file-system driver that replaces the exFAT driver in the staging area that had been around for a few releases. This new exFAT driver is in much better shape and actively maintained by Samsung.

主要還是過熱保護那段還蠻值得期待,不然就是要硬上水冷壓,避免遇到溫度牆...