Some Discussion of the Amazon EFS Performance Improvements

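My previous post, "Amazon EFS Performance Improvements," covered the recent performance gains in Amazon EFS. On Hacker News I then saw the PMT (Product Manager - Technical) on the Amazon EFS team answering questions in the thread "Amazon Elastic File System Update – Sub-Millisecond Read Latency (amazon.com)". Searching the thread for geertj should turn up his replies.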

For example, even a post published by Jeff Barr still has to be signed off by the legal team before it goes out:

(PMT on the EFS team).

Yes, the wordings are carefully formulated as they have to be signed off by the AWS legal team for obvious reasons. With that said, this update was driven by profiling real applications and addressing the most common operations, so the benefits are real. For example, a simple WordPress "hello world" is now about 2x as fast as before.

Also, this round of performance improvements was achieved with a caching layer:

I'm the PMT for this project in the EFS team. The "flip the switch" part was indeed one of the harder parts to get right. Happy to share some limited details. The performance improvement builds on a distributed consistent cache. You can enable such a cache in multiple steps. First you deploy the software across the entire stack that supports the caching protocol but it's disabled by configuration. Then you turn it for the multiple components that are involved in the right order. Another thing that was hard to get right was to ensure that there are no performance regressions due to the consistency protocol.
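The two-phase rollout described in that reply (ship the code everywhere with the feature disabled, then enable it component by component in a fixed order) can be sketched roughly as follows. This is only an illustration of the general pattern, not how EFS does it: the `Component` class, the component names, and the enable order are all hypothetical, since the reply does not say which components are involved or in what order they were switched on.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """Hypothetical service component taking part in the cache protocol."""
    name: str
    supports_cache_protocol: bool = False  # new software has been deployed
    cache_enabled: bool = False            # configuration flag, off by default

    def deploy(self) -> None:
        # Phase 1: ship code that understands the protocol; the flag stays off.
        self.supports_cache_protocol = True

    def enable_cache(self) -> None:
        # Phase 2: flip the flag, only allowed once the new code is in place.
        if not self.supports_cache_protocol:
            raise RuntimeError(f"{self.name}: cache protocol not deployed yet")
        self.cache_enabled = True


def staged_rollout(components: dict, enable_order: list) -> list:
    """Deploy everywhere first, then enable the cache in the given order."""
    for component in components.values():
        component.deploy()
    # Nothing is enabled until every component understands the protocol.
    assert not any(c.cache_enabled for c in components.values())

    enabled = []
    for name in enable_order:
        components[name].enable_cache()
        enabled.append(name)
    return enabled


# Hypothetical component names and order, purely for illustration.
fleet = {name: Component(name) for name in ("storage", "cache", "frontend")}
rollout_order = staged_rollout(fleet, ["storage", "cache", "frontend"])
```

Keeping deployment and enablement separate means the flag flip is a configuration change that can be done (or undone) per component, without shipping new code at the riskiest moment.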

And each AZ has its own cache:

The caches are local to each AZ so you get the low latency in each AZ, the other details are different. Unfortunately I can't share additional details at this moment, but we are looking to do a technical update on EFS at some point soon, maybe at a similar venue!

It also looks like most of the benefit comes from caching metadata:

NFS workloads are typically metadata heavy and highly correlated in time, so you can achieve very high hit rates. I can't share any specific numbers unfortunately.
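To get a feel for why accesses that are "highly correlated in time" lead to high hit rates, here is a small toy simulation. It is a sketch under loud assumptions: a plain LRU cache stands in for whatever EFS actually uses, and the workload model (`correlated_accesses`, with a made-up `repeat_prob` and `window`) is invented for illustration. No real EFS numbers are implied.

```python
import random
from collections import OrderedDict, deque


def lru_hit_rate(accesses, capacity):
    """Replay a sequence of keys through an LRU cache and return the hit rate."""
    cache = OrderedDict()
    hits = 0
    total = 0
    for key in accesses:
        total += 1
        if key in cache:
            hits += 1
            cache.move_to_end(key)         # mark as most recently used
        else:
            cache[key] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used key
    return hits / total if total else 0.0


def correlated_accesses(n, key_space, repeat_prob, window, rng):
    """Hypothetical workload: usually re-touch one of the last `window` keys."""
    recent = deque(maxlen=window)
    for _ in range(n):
        if recent and rng.random() < repeat_prob:
            key = recent[rng.randrange(len(recent))]
        else:
            key = rng.randrange(key_space)
        recent.append(key)
        yield key


rng = random.Random(0)
correlated_rate = lru_hit_rate(
    correlated_accesses(20_000, 100_000, repeat_prob=0.9, window=50, rng=rng),
    capacity=100,
)
uniform_rate = lru_hit_rate(
    (rng.randrange(100_000) for _ in range(20_000)),
    capacity=100,
)
print(f"correlated: {correlated_rate:.3f}, uniform: {uniform_rate:.3f}")
```

With the same small cache, the time-correlated stream hits far more often than the uniformly random one, which is the intuition behind the quote: a modest cache goes a long way when recently touched metadata is likely to be touched again soon.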

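Plenty of the detailed numbers still can't be disclosed, but just knowing that the gains come from a cache is enough to roughly picture how it works behind the scenes.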