SourceHut's post-mortem after the DDoS attack

SourceHut published a report after the DDoS attack: "SourceHut network outage post-mortem".

The attack was at layer 3, directly saturating their upstream bandwidth:

At around 06:00 UTC on January 10th, a layer 3 distributed denial-of-service (DDoS) attack began to target SourceHut’s PHL infrastructure.

Their upstream, Cogent, responded by null routing them:

In response to the attack, Cogent announced null routes for our downstream AS, causing our PHL network to become unreachable both for SourceHut staff and the general public.

Along the way they talked to Cloudflare and looked into other options, but by their account the cost was beyond their means:

We initially researched a number of solutions, and spoke to Cloudflare in particular due to their ability to provide a rapid response to ongoing incidents. However, given our complex requirements, Cloudflare quoted us a figure which was not attainable within our financial means as a small company. Other options we researched (though we did not seek additional quotes) had similar economical constraints.

The eventual fix was to put a proxy server at OVH (leveraging OVH's DDoS protection service) and route traffic to an unpublished subnet:

However, we found that OVH’s anti-DDoS protections were likely suitable: they are effective, and their cost is amortized across all OVH users, and therefore of marginal cost to us. To this end the network solution we deployed involved setting up an OVH box to NAT traffic through OVH’s DDoS-resistant network and direct it to our (secret) production subnet in AMS; this met our needs for end-to-end encryption as well as service over arbitrary TCP protocols.
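As a rough sketch of that architecture (not SourceHut's actual configuration, which did NAT at the network layer): a dumb TCP relay on the DDoS-protected front box that copies bytes to a backend on the hidden subnet. Because it never terminates TLS, encryption stays end-to-end, and it works for arbitrary TCP protocols. The addresses below are made up:

```python
# Minimal TCP relay sketch: listen on the protected front box and pipe
# raw bytes to a backend on the unpublished subnet. Hypothetical addresses.
import asyncio

LISTEN_ADDR = ("0.0.0.0", 443)       # public, DDoS-protected side
BACKEND_ADDR = ("10.20.30.40", 443)  # made-up host on the secret subnet

async def pipe(reader, writer):
    # Copy one direction until EOF, then close the other side.
    try:
        while data := await reader.read(65536):
            writer.write(data)
            await writer.drain()
    finally:
        writer.close()

async def handle(client_reader, client_writer):
    backend_reader, backend_writer = await asyncio.open_connection(*BACKEND_ADDR)
    # Relay both directions concurrently; TLS passes through untouched.
    await asyncio.gather(
        pipe(client_reader, backend_writer),
        pipe(backend_reader, client_writer),
        return_exceptions=True,
    )

async def main():
    server = await asyncio.start_server(handle, *LISTEN_ADDR)
    async with server:
        await server.serve_forever()

if __name__ == "__main__":
    asyncio.run(main())
```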

GitHub also has a history of being attacked, back before Microsoft acquired it (in 2018); Google released some data on the 2015 attack, which I wrote up at the time: "Google's analysis of the GFW DDoS attack GitHub suffered", though that wave was at L7.

Also, in 2016 GitHub wrote up how they block TCP SYN floods, which looks closer to this attack: "GitHub's way of fighting TCP SYN floods: synsanity".

ClickHouse built a ZooKeeper drop-in replacement in C++: ClickHouse Keeper

Saw "ClickHouse Keeper: A ZooKeeper alternative written in C++ (clickhouse.com)" on Hacker News; the original article is "ClickHouse Keeper: A ZooKeeper alternative written in C++".

In the distributed coordination space, etcd is probably the better-known option these days, but Apache ZooKeeper is the veteran, and plenty of applications still use ZooKeeper as infrastructure.

So the etcd project also ships zetcd, which lets applications talk the ZooKeeper API while using etcd as the underlying store:

Serve the Apache Zookeeper API but back it with an etcd cluster

Now the ClickHouse folks have built ClickHouse Keeper in C++, positioned as a drop-in replacement, aiming to solve the memory consumption problems existing applications run into.

You can see this focus right in the opening description:

up to 46 times less memory than ZooKeeper for the same volume of data while maintaining performance close to ZooKeeper.

For new applications, though, few developers would pick ZooKeeper today: even a distributed lock has to be assembled by hand out of znode operations (the official site helpfully provides "ZooKeeper Recipes and Solutions", which describes the lock recipe), and that is exhausting compared to how convenient etcd is...
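To make the "stack it up yourself" point concrete, here is a rough sketch of the lock recipe from that page, written against the kazoo Python client (the path names are arbitrary, and connection-loss handling is omitted):

```python
# Sketch of the ZooKeeper lock recipe: ephemeral sequential znodes,
# where the lowest sequence number holds the lock. No error handling.
import threading
from kazoo.client import KazooClient

zk = KazooClient(hosts="127.0.0.1:2181")
zk.start()

LOCK_ROOT = "/locks/my-resource"   # arbitrary example path
zk.ensure_path(LOCK_ROOT)

# 1. Add ourselves to the queue with an ephemeral sequential znode.
my_path = zk.create(LOCK_ROOT + "/lock-", ephemeral=True, sequence=True)
my_name = my_path.rsplit("/", 1)[1]

while True:
    children = sorted(zk.get_children(LOCK_ROOT))
    if my_name == children[0]:
        break                      # 2. Lowest sequence: lock acquired.
    # 3. Otherwise watch the znode just ahead of ours and block.
    predecessor = children[children.index(my_name) - 1]
    released = threading.Event()
    if zk.exists(LOCK_ROOT + "/" + predecessor,
                 watch=lambda event: released.set()):
        released.wait()

# ... critical section ...

zk.delete(my_path)                 # releasing wakes the next waiter
zk.stop()
```

kazoo does ship this as a ready-made recipe (kazoo.recipe.lock.Lock), but the point stands: every application, or at least every client library, has to carry this machinery around, whereas etcd bakes leases and compare-and-swap into the API itself.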

AWS Aurora Xanadu?

In "Why PostgreSQL High Availability Matters and How to Achieve It (yugabyte.com)" I saw that AWS is also working on a service similar to GCP's Spanner, under the project name Aurora Xanadu: "36328981".

franckpachot

Google has Spanner. AWS is working on something similar (project Aurora Xanadu). And both have YugabyteDB in their marketplace. Those are Distributed SQL (Global ACID), not Citus. For DataWarehouse which doesn't need ACID, there are other services.

Bookmarking the link for now; let's see whether they end up shipping under that name.

Clocks in distributed systems

A few days ago I saw "Clocks and Causality – Ordering Events in Distributed Systems (2022) (exhypothesi.com)" on Hacker News, about clock design in distributed systems; the author also showed up in the thread to chat (under the account thoughtlede). The original article is "Clocks and Causality - Ordering Events in Distributed Systems".

The article mainly covers the Lamport timestamp, which needs O(1) space, and the vector clock, which needs O(n) space (n here being the number of nodes), along with the extended variants of each.
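As a refresher on the two basic forms (my own minimal sketch, not code from the article): the Lamport timestamp is a single counter that takes the max on every receive, so it can order events but cannot detect concurrency; the vector clock keeps one counter per node, which is exactly where the O(n) space goes:

```python
# Minimal sketches of the two basic logical clocks.

class LamportClock:
    """O(1) space: one integer per node."""
    def __init__(self):
        self.time = 0

    def tick(self):                  # local event or send
        self.time += 1
        return self.time

    def receive(self, remote_time):  # merge on message receipt
        self.time = max(self.time, remote_time) + 1
        return self.time


class VectorClock:
    """O(n) space: one counter per node; can detect concurrency."""
    def __init__(self, node_id, num_nodes):
        self.node_id = node_id
        self.clock = [0] * num_nodes

    def tick(self):
        self.clock[self.node_id] += 1
        return list(self.clock)

    def receive(self, remote_clock):
        self.clock = [max(a, b) for a, b in zip(self.clock, remote_clock)]
        self.clock[self.node_id] += 1
        return list(self.clock)


def happened_before(a, b):
    """a -> b iff a <= b componentwise and a != b; if neither dominates
    the other, the two events are concurrent."""
    return all(x <= y for x, y in zip(a, b)) and a != b
```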

The author compiled this material because, while studying CRDTs, he noticed the algorithms constantly need to handle the ordering of events in a distributed system, so he spent some time cataloguing the common approaches:

Author here. Pleasantly surprised to see the article here.

Some context behind the article. I studied CRDTs for a few months, and noticed that different CRDT designs use logical clocks in different and clever ways. And I haven't seen anyone narrate all those ways of use in one article. My attempt with this article was to dredge up those flavors of logical clocks into one article and give them names for future reference.

(To respond to a couple of other comments, I ignored atomic (and gps-based) clocks in this discussion, as indicated in my footnote 3).

I recall there is also Interval Tree Clocks worth looking at (explained more clearly at "Interval Tree Clocks"), an algorithm that improves on the others when nodes join and leave dynamically, though I'm not sure any well-known system uses it.

Most systems probably use vector clocks; after all, they were popularized by the 2007 paper "Dynamo: Amazon's Highly Available Key-value Store", and they are a reasonably good algorithm.

KataGo's distributed training effort has launched

KataGo is probably one of the strongest open source Go engines around. A distributed training effort that anyone can join has been in development since last year, and with the recent v1.8.0 release it has officially launched: "KataGo Distributed Training"; the author also wrote a short note at "KataGo distributed training is open!".

Basically just follow the instructions on the official site: you can download a precompiled binary or build it yourself. If you build it yourself, note that you can't build straight from the master branch (the client hash will be wrong); I'm currently running a build of the v1.8.0 tag.

The Reddit post "KataGo's new run is open for public contributions!" has an explanatory chart (note that the X axis is not linear); the run continues from the original g170 training, and the chart also marks the approximate strength of ELFv2 and Leela Zero:

It looks like people are gradually starting to contribute...

You can also see bots with the kata1 prefix running on CGOS, and it looks like new training results keep getting uploaded there.

An introduction to distributed systems

Saw a distributed systems overview on Hacker News Daily: "An introduction to distributed systems". The author (I believe he is at Stripe?) has done a lot of work driven by the need to study database correctness, including building a testing framework for it.

The article lays out all the relevant theory, making it a piece packed with keywords. The 12 to 16 hours mentioned here is presumably classroom time for people who already have the fundamentals; for those who don't touch this day to day and have to self-study, it will take much longer:

This outline accompanies a 12-16 hour overview class on distributed systems fundamentals.

You won't get through it quickly, but you can dig into one topic each weekend... And don't overthink it: much of the time brute-forcing it on a single machine (scaling up) buys you the time to come back, read this material, and design the system migration.

TiDB performance on a single server

TiDB is a database that supports distributed computation and aims to fully emulate the MySQL protocol. Percona tried benchmarking TiDB on a single server; the tests are simple, but the results are quite interesting: "A Quick Look into TiDB Performance on a Single Server".

What Percona observed is that TiDB supports multi-CPU execution for a single SQL query (MySQL only uses one CPU), so on higher-end machines some SQL queries are much faster. OLAP-style queries do well too, but common OLTP workloads are considerably slower:

Short version: TiDB supports parallel query execution for selects and can utilize many more CPU cores – MySQL is limited to a single CPU core for a single select query. For the higher-end hardware – ec2 instances in my case – TiDB can be 3-4 times faster for complex select queries (OLAP workload) which do not use, or benefit from, indexes. At the same time point selects and writes, especially inserts, can be 5x-10x slower. Again, please note that this test was on a single server, with a single TiKV process.

An interesting drop-in...

Cloudflare launches an IPFS Gateway

Cloudflare has launched its own IPFS gateway: "Cloudflare goes InterPlanetary - Introducing Cloudflare's IPFS Gateway".

IPFS leans toward static content (even though the project keeps insisting dynamic data can run on it too), which makes it a natural fit for a CDN architecture... Cloudflare stepping in should lower the barrier to IPFS a bit more.

Compared with the official ipfs.io gateway, Cloudflare's should be quite a bit faster... (after all, you connect to the nearest PoP)
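From a client's point of view the gateway is just HTTPS in front of content-addressed data. A minimal sketch, assuming the cloudflare-ipfs.com hostname from the announcement (the CID below is the well-known IPFS example directory):

```python
# Fetch a file from IPFS over Cloudflare's gateway: plain HTTPS, where the
# path is the content hash (CID). Swap in any CID you want to fetch.
import urllib.request

CID = "QmYwAPJzv5CZsnA625s3Xf2nemtYgPpHdWEz79ojWnPbdG"
url = f"https://cloudflare-ipfs.com/ipfs/{CID}/readme"

with urllib.request.urlopen(url) as resp:
    print(resp.read().decode("utf-8"))
```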

MySQL's XA transactions

Percona's "How to Deal with XA Transactions Recovery" covers recovery problems with MySQL's XA transactions.

XA transactions are mainly used for distributed transactions, and the official documentation cites the corresponding standard:

The MySQL XA implementation is based on the X/Open CAE document Distributed Transaction Processing: The XA Specification. This document is published by The Open Group and available at http://www.opengroup.org/public/pubs/catalog/c193.htm.
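For context, the two-phase flow looks like this with MySQL's XA statements, sketched here through PyMySQL (the connection parameters, table, and xid are made-up examples):

```python
# Sketch of MySQL's XA statements via PyMySQL. A real coordinator would
# run this against every participating server and only issue XA COMMIT
# once all of them have prepared successfully.
import pymysql

conn = pymysql.connect(host="127.0.0.1", user="app",
                       password="secret", database="shop")
xid = "'txn-42'"  # example transaction id

with conn.cursor() as cur:
    cur.execute(f"XA START {xid}")
    cur.execute("UPDATE accounts SET balance = balance - 100 WHERE id = 1")
    cur.execute(f"XA END {xid}")
    cur.execute(f"XA PREPARE {xid}")  # phase 1: survives a server restart
    # ... wait until every other participant has prepared ...
    cur.execute(f"XA COMMIT {xid}")   # phase 2: make it durable everywhere
```

After a crash, XA RECOVER lists transactions stuck in the prepared state; the 5.6 limitation below is exactly that a new session could not commit or roll those back.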

What I find more interesting, though, is that Percona flatly recommends not using XA on 5.6:

But there is a limitation in 5.6: you can only XA commit/rollback transactions that belong to your session. That means after a crash you are out of luck. To get rid of these you need to promote a slave or perform a logical dump and restore. The best plan is to avoid the use of XA transactions with 5.6.

But digging through the bug report, Oracle's official 5.7 still hasn't fixed it; it looks like Percona patched it themselves XDDD

Percona XtraDB Cluster explains why the recent release improves performance again

A recent Percona XtraDB Cluster release claims a big performance improvement, and this post tries to explain why: "How We Made Percona XtraDB Cluster Scale".

It walks through three scenarios, explaining the performance problems and their fixes, which together account for this round of gains:

With these three main optimizations, and some small tweaks, we have tuned Percona XtraDB Cluster to scale better and made it fast enough for the growing demands of your applications. All of this is available with the recently released Percona XtraDB Cluster 5.7.17-29.20. Give it a try and watch your application scale in a multi-master environment, making Percona XtraDB Cluster the best fit for your HA workloads.