AWS Lambda's cache architecture

An older article I saw on Lobsters: 「[Cache Architecture for] Container Loading in AWS Lambda」; judging from the URL, the original, 「Container Loading in AWS Lambda」, was published back in May of last year.

It mainly covers how to load containers so they can start executing as quickly as possible. It first mentions the layer cache everyone commonly uses; AWS Lambda instead switched to a block-level cache:

Most of the existing systems do this at the layer or file level, but we chose to do it at the block level.

Each chunk is 512KiB:

We unpack a snapshot (deterministically, which turns out to be tricky) into a single flat filesystem, then break that filesystem up into 512KiB chunks.
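
To make the idea concrete, here is a minimal sketch (my own, not code from the paper) of breaking a flat filesystem image into 512KiB chunks; content-addressing each chunk by hash is an assumption on my part, since it is what would let identical chunks be shared across images:

```python
import hashlib

CHUNK_SIZE = 512 * 1024  # 512 KiB, per the quote above

def chunk_image(path):
    """Split a flat filesystem image into fixed-size chunks,
    keyed by a content hash (the hashing is my assumption)."""
    chunks = []
    with open(path, "rb") as f:
        while True:
            data = f.read(CHUNK_SIZE)
            if not data:
                break
            chunks.append((hashlib.sha256(data).hexdigest(), data))
    return chunks
```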

It then brings up the lazy-load approach from 「Slacker: Fast Distribution with Lazy Docker Containers」:

Our analysis shows that pulling packages accounts for 76% of container start time, but only 6.4% of that data is read.

Slacker speeds up the median container development cycle by 20x and deployment cycle by 5x.

This technique is also used in AWS Lambda, implemented via FUSE:

In Lambda, we did this by taking advantage of the layer of abstraction that Firecracker provides us. Linux has a useful feature called FUSE provides an interface that allows writing filesystems in userspace (instead of kernel space, which is harder to work in).
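
A rough sketch of what lazy loading over FUSE can look like, using the fusepy bindings; the LazyImageFS class and fetch_chunk() are hypothetical stand-ins (a mountable filesystem also needs getattr, readdir, and so on), but it shows why only the bytes actually read ever get pulled:

```python
from fuse import Operations  # fusepy, one of several Python FUSE bindings

CHUNK = 512 * 1024

class LazyImageFS(Operations):
    """Serve a flat filesystem image, fetching chunks only on first read."""

    def __init__(self, fetch_chunk):
        self.fetch_chunk = fetch_chunk  # hypothetical: asks the chunk store
        self.cache = {}

    def read(self, path, size, offset, fh):
        out = b""
        while size > 0:
            idx = offset // CHUNK
            if idx not in self.cache:   # untouched chunks are never fetched
                self.cache[idx] = self.fetch_chunk(idx)
            start = offset % CHUNK
            take = min(size, CHUNK - start)
            out += self.cache[idx][start:start + take]
            offset, size = offset + take, size - take
        return out
```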

Another thing AWS Lambda implements is tiered caching, with three tiers: a local cache on the worker (L1), a cache within the same AZ (L2), and the data on S3 (L3):

Despite our local on-worker (L1) cache being several orders of magnitude smaller than the AZ-level cache (L2) and that being much smaller than the full data set in S3 (L3), we still get 67% of chunks from the local cache, 32% from the AZ level, and less than 0.1% from S3.
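
A sketch of the fall-through lookup across the three tiers; the names and the promote-on-hit policy are my assumptions, the quote only gives the hit rates:

```python
def get_chunk(key, l1, l2, s3):
    """Look a chunk up tier by tier; l1/l2 are dict-like caches,
    s3 is the authoritative store."""
    if key in l1:
        return l1[key]        # ~67% of chunks in the quoted numbers
    if key in l2:
        l1[key] = l2[key]     # promote on the way back (an assumption)
        return l1[key]        # ~32%
    data = s3[key]            # <0.1%, but always there
    l2[key] = data
    l1[key] = data
    return data
```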

And because the L3 cache is S3 itself, they don't need to worry about durability at L1 and L2 (if something is missing, just fall through to the next tier):

The whole set of chunks are stored in S3, meaning the cache doesn’t need to provide durability, just low latency.

They still use erasure coding, though, to keep the probability high that each cache tier can serve data from within its own tier, which keeps peak latency down (presumably because misses would make the 99.9%/99.95%/99.99% SLOs look bad?):

Think about what happens in a classic consistent hashed cache with 20 nodes when a node failure happens. Five percent of the data is lost. The hit rate drops to a maximum of 95%, which is a more than 5x increase in misses given that our normal hit rate is over 99%. At large scale machines fail all the time, and we don’t want big changes in behavior when that happens.

So we use a technique called erasure coding to completely avoid the impact. In erasure coding, we break each chunk up into M parts in a way that it can be recreated from any k. As long as M - k >= 1 we can survive the failure of any node with zero hit rate impact (because the other k nodes will pick up the slack).
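
To make the M/k arithmetic concrete, here is a toy code with k data parts plus a single XOR parity part, so M = k + 1 and any one lost part can be rebuilt from the remaining k; Lambda's actual scheme is surely more general than this:

```python
from functools import reduce

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encode(chunk: bytes, k: int) -> list:
    """Split a chunk into k equal data parts plus one parity part (M = k + 1)."""
    part_len = -(-len(chunk) // k)  # ceiling division
    parts = [chunk[i * part_len:(i + 1) * part_len].ljust(part_len, b"\0")
             for i in range(k)]
    parts.append(reduce(xor, parts))  # the parity part
    return parts

def decode(parts: list, k: int) -> bytes:
    """Rebuild the chunk from any k of the M parts (at most one may be None).
    The caller trims the zero padding using the stored chunk length."""
    if None in parts:
        missing = parts.index(None)
        parts[missing] = reduce(xor, [p for p in parts if p is not None])
    return b"".join(parts[:k])
```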

Presumably the originally simpler three-tier architecture turned out, after benchmarking, not to hit the corresponding SLOs, so erasure coding was "patched in" to bring the SLOs up; you can really feel the influence of the boss's requirements on architecture design here...

That said, it's rare to see this many details being thrown out in public...

Redis changes its license and is no longer open source software

Redis announced it is dropping the open source license: 「Redis Adopts Dual Source-Available Licensing」; the corresponding git commit can be seen at 「Change license from BSD-3 to dual RSALv2+SSPLv1 (#13157)」.

Starting with Redis 7.4, Redis will be dual-licensed under the Redis Source Available License (RSALv2) and Server Side Public License (SSPLv1).

It's one of the hotter news items today, but the change was within expectations: Redis already moved many of the proprietary components they developed themselves to SSPL back in 2018, so the main project changing now isn't too surprising. What remains to be seen is whether the community's forks will gather more momentum, or the Redis company side will... and especially with Redis having implemented so many data structures already, is there still an opportunity for the company to cash in?

The more interesting player is actually Microsoft... a day or two earlier, Microsoft released Garnet, a Redis-compatible implementation:

Garnet is a remote cache-store from Microsoft Research that offers strong performance (throughput and latency), scalability, storage, recovery, cluster sharding, key migration, and replication features. Garnet can work with existing Redis clients.
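
Since the selling point is working with existing Redis clients, a quick sanity check could look like this with redis-py; the host and port here are assumptions on my part, so point them at wherever your Garnet instance actually listens:

```python
import redis

# Hypothetical endpoint: a local Garnet instance on the standard Redis port.
r = redis.Redis(host="localhost", port=6379, decode_responses=True)

r.set("greeting", "hello from a stock Redis client talking to Garnet")
print(r.get("greeting"))
print(r.ping())  # wire-protocol compatibility means PING just works
```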

Is it a coincidence? The timing really is quite delicate...

SourceHut's post-mortem after the DDoS

SourceHut published a report after the DDoS: 「SourceHut network outage post-mortem」.

The attack was at layer 3, directly saturating the upstream bandwidth:

At around 06:00 UTC on January 10th, a layer 3 distributed denial-of-service (DDoS) attack began to target SourceHut’s PHL infrastructure.

The upstream, Cogent, chose to null route them:

In response to the attack, Cogent announced null routes for our downstream AS, causing our PHL network to become unreachable both for SourceHut staff and the general public.

Along the way they talked to Cloudflare and looked at other options, but by their account the cost was beyond their means:

We initially researched a number of solutions, and spoke to Cloudflare in particular due to their ability to provide a rapid response to ongoing incidents. However, given our complex requirements, Cloudflare quoted us a figure which was not attainable within our financial means as a small company. Other options we researched (though we did not seek additional quotes) had similar economical constraints.

The eventual solution was to put a proxy server at OVH (paired with OVH's DDoS protection service) and direct traffic to a non-public subnet:

However, we found that OVH’s anti-DDoS protections were likely suitable: they are effective, and their cost is amortized across all OVH users, and therefore of marginal cost to us. To this end the network solution we deployed involved setting up an OVH box to NAT traffic through OVH’s DDoS-resistant network and direct it to our (secret) production subnet in AMS; this met our needs for end-to-end encryption as well as service over arbitrary TCP protocols.

GitHub also has a record of being attacked from before it was acquired by Microsoft (in 2018); back in 2015 Google released some data, and I wrote a post about it at the time: 「Google 對 GitHub 先前遭受 GFW 的 DDoS 攻擊的分析」 (Google's analysis of the GFW DDoS attack GitHub had suffered), though that wave was L7.

Also, in 2016 GitHub put together a piece on ways to block TCP SYN floods, which looks closer to this attack: 「GitHub 對抗 TCP SYN Flood 的方式:synsanity」 (GitHub's way of fighting TCP SYN floods: synsanity).

Babelfish: letting PostgreSQL speak Microsoft SQL Server's protocol

Saw the post 「Goodbye Microsoft SQL Server, Hello Babelfish」: AWS's Aurora (PostgreSQL) has launched Babelfish, which can speak the Microsoft SQL Server protocol:

Today, we are making Babelfish for Aurora PostgreSQL available. Babelfish allows Amazon Aurora PostgreSQL-Compatible Edition to understand the SQL Server wire protocol.
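
In other words an ordinary SQL Server client should just work against the PostgreSQL-based cluster; a minimal sketch with pymssql, where the endpoint, credentials, and TDS port are all placeholders for a Babelfish-enabled cluster of your own:

```python
import pymssql  # a plain SQL Server client library; no PostgreSQL driver involved

# Hypothetical endpoint and credentials; check your cluster's configured TDS port.
conn = pymssql.connect(
    server="mycluster.cluster-xxxx.us-east-1.rds.amazonaws.com",
    port=1433,
    user="babelfish_admin",
    password="...",
    database="mydb",
)
cur = conn.cursor()
cur.execute("SELECT @@VERSION")  # T-SQL, answered by PostgreSQL underneath
print(cur.fetchone()[0])
```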

A bit of digging shows it was announced at the end of last year: 「Want more PostgreSQL? You just might like Babelfish」. I didn't notice it back then, probably because it was still in preview:

We are open sourcing Babelfish in 2021. Until then, you can use Babelfish on Amazon Aurora in a preview to see how it works and to get a sense for whether this is the right approach for you.

No idea how well it works in practice, but it feels worth watching; I'm not using anything Microsoft SQL Server related at the moment, but it's something to consider if I run into it later...

Besides using it on AWS, you can also pull the patches from GitHub and apply them yourself: 「babelfish-for-postgresql」.

Come to think of it, AWS uses PostgreSQL in quite a lot of places; earlier on people were also guessing that DocumentDB has PostgreSQL underneath (see 「大家在猜 Amazon DocumentDB 的底層是不是 PostgreSQL...」, i.e. "People are guessing whether Amazon DocumentDB is PostgreSQL underneath..."). I wonder whether they'll end up picking a fight with Oracle's lawyers someday...

Texturing a 3D scene with CSS

Saw a demo showing off what CSS can handle inside the browser; it already looks capable enough to handle quite a bit of the texturing and lighting work: 「CSS FPS」.

This is demo of a CSS powered 3D environment. Geometry is created with HTML elements and CSS transforms. Textures and lightmaps are composed by layering multiple background-images and colour is applied using CSS blend-modes.

A game would probably need many more kinds of effects, though, and for now those would still have to be generated with JavaScript... (if it's going to run in the browser)

AWS Lambda supports Ruby 2.5 and other languages

AWS announced Lambda support for Ruby 2.5: 「AWS Lambda Supports Ruby」.

They also announced that any programming language can be made to run on Lambda through Layers (the Ruby support above looks like it was built this way): 「New for AWS Lambda – Use Any Programming Language and Share Common Components」.

The Runtime API is the future of how we’ll support new languages in Lambda. For example, this is how we built support for the Ruby language.
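
The Runtime API itself is a small HTTP API: the runtime's bootstrap long-polls for the next invocation, runs the handler, and posts the result back. A minimal sketch of that loop (error handling and the error endpoints omitted):

```python
import json
import os
import urllib.request

# The endpoint is handed to the bootstrap through this environment variable.
API = "http://" + os.environ["AWS_LAMBDA_RUNTIME_API"] + "/2018-06-01/runtime"

def handler(event):
    return {"echo": event}  # your actual function logic goes here

while True:
    # Long-poll for the next invocation.
    with urllib.request.urlopen(API + "/invocation/next") as resp:
        request_id = resp.headers["Lambda-Runtime-Aws-Request-Id"]
        event = json.load(resp)

    # Post the result back for this specific request id.
    body = json.dumps(handler(event)).encode()
    urllib.request.urlopen(
        urllib.request.Request(API + f"/invocation/{request_id}/response", data=body)
    )
```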

How Cloudflare supported TLS 1.2 and TLS 1.3 at the same time

Cloudflare was quite an early participant in TLS 1.3's development. Along the way, they wanted browsers that supported the TLS 1.3 drafts to actually start using them, without the frequently changing drafts affecting existing users, so they found a way to let the two coexist: 「Know your SCM_RIGHTS」.

That method is SCM_RIGHTS, which lets another process gain access to your file descriptors.

You can use UNIX-domain sockets to pass file descriptors between applications, and like everything else in UNIX connections are files.

So their approach was to first read the Client Hello data from the TLS connection, and if it signaled a desire to use TLS 1.3, hand the connection over via the aforementioned SCM_RIGHTS to a program written in Go:

We let OpenSSL read the “Client Hello” message from an established TCP connection. If the “Client Hello” indicated TLS version 1.3, we would use SCM_RIGHTS to send it to the Go process. The Go process would in turn try to parse the rest of the “Client Hello”, if it were successful it would proceed with TLS 1.3 connection, and upon failure it would give the file descriptor back to OpenSSL, to handle regularly.
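
Python 3.9+ wraps sendmsg()/recvmsg() with SCM_RIGHTS as socket.send_fds()/recv_fds(), so the handoff idea can be sketched like this; wants_tls13() is a stand-in for a real ClientHello parser, and none of this is Cloudflare's actual code:

```python
import socket

# The UNIX-domain channel between the OpenSSL-side and Go-side processes.
left, right = socket.socketpair()

def wants_tls13(client_hello: bytes) -> bool:
    # Stand-in: a real parser decodes the supported_versions extension,
    # where 0x0304 is the TLS 1.3 version value.
    return b"\x03\x04" in client_hello

def maybe_hand_off(conn: socket.socket) -> bool:
    hello = conn.recv(4096, socket.MSG_PEEK)  # look without consuming
    if wants_tls13(hello):
        socket.send_fds(left, [b"tls13"], [conn.fileno()])
        conn.close()  # the descriptor lives on in the other process
        return True
    return False      # fall back to the regular OpenSSL path

def receive_handed_off() -> socket.socket:
    msg, fds, _flags, _addr = socket.recv_fds(right, 1024, maxfds=1)
    return socket.socket(fileno=fds[0])  # the very same TCP connection
```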

With this, the original stack only needed a small code change: the then still rapidly changing TLS 1.3 draft handling ran in a separate process, so there was much less worry about the original stack breaking.

Using dive to inspect the contents of each layer in a Docker image

Saw the tool wagoodman/dive: given a Docker image and tag, it lets you browse the contents of every layer:

Packages are provided for the various platforms (deb, rpm, and the like). The keyboard shortcuts are listed near the bottom of the project page, and it's very convenient to use.

Mozilla and Google have both announced retirement plans for TLS 1.0 and TLS 1.1

Update: Apple has announced theirs too, with roughly the same timing: 「Deprecation of Legacy TLS 1.0 and 1.1 Versions」.

Mozilla announced 「Removing Old Versions of TLS」 and Google announced 「Modernizing Transport Security」, both laying out the retirement timeline for TLS 1.0 and TLS 1.1 in their own browsers.

Mozilla's plan is to remove them in March 2020:

In March of 2020, Firefox will disable support for TLS 1.0 and TLS 1.1.

Google's plan is to remove them in Chrome 81; in calendar terms that starts affecting the early release channels in January 2020, and should reach the release channel around the same time as Firefox:

In line with these industry standards, Google Chrome will deprecate TLS 1.0 and TLS 1.1 in Chrome 72. Sites using these versions will begin to see deprecation warnings in the DevTools console in that release. TLS 1.0 and 1.1 will be disabled altogether in Chrome 81. This will affect users on early release channels starting January 2020.

That's roughly a year and a half from now.

While this is about browser-side support, a server that wants to accept only TLS 1.2+ will have to think about what older clients can handle.

Desktop will be less affected (upgrading is easier and there are more alternatives), while mobile looks like it needs Android 4.4+ and iOS 7+, so it comes down to each site's or service's user base...
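
For the server side mentioned above, refusing TLS 1.0/1.1 is a small change in most stacks; a minimal sketch with Python's ssl module, where the certificate paths are placeholders:

```python
import ssl

# A server-side context that refuses anything older than TLS 1.2.
ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
ctx.minimum_version = ssl.TLSVersion.TLSv1_2
ctx.load_cert_chain("server.crt", "server.key")  # placeholder paths
```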

The difference between Transport-Layer Encryption and End-to-End Encryption

EFF's article 「Transport-Layer Encryption vs End-to-End Encryption - GIF」 explains the difference between Transport-Layer Encryption and End-to-End Encryption, closing with an explanatory GIF:

The example in the GIF is actually quite clear: with Transport-Layer Encryption the service provider can see the original content (Google, in the GIF's example), while with End-to-End Encryption it can't; only the two communicating parties can know the original content.
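
A toy model of that visibility difference, using Fernet from the cryptography package purely for illustration (real E2EE uses key agreement rather than pre-shared keys):

```python
from cryptography.fernet import Fernet

# Transport-layer: each hop is encrypted, but the provider in the
# middle holds the key and can read the plaintext.
client_to_provider = Fernet(Fernet.generate_key())
ciphertext = client_to_provider.encrypt(b"hi bob")
provider_sees = client_to_provider.decrypt(ciphertext)  # the provider reads it

# End-to-end: only the two endpoints share the key; the provider
# merely relays bytes it cannot decrypt.
alice_and_bob = Fernet(Fernet.generate_key())
relayed = alice_and_bob.encrypt(b"hi bob")  # opaque to the provider
assert alice_and_bob.decrypt(relayed) == b"hi bob"
```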

The article also mentions Tor Messenger, which can plug into existing messaging services and layer End-to-End Encryption on top.