GitHub 的 MySQL 架構與數字

前幾天 GitHub 有寫一篇文章提到他們的 MySQL 是怎麼 scale 的,另外裡面也有一些數字可以看:「Partitioning GitHub’s relational databases to handle scale」。

他們最主要的 database cluster 叫做 mysql1,裡面有提到 2019 年的時候這個 cluster 是 950K qps,其中 primary 有 50K qps:

In 2019, mysql1 answered 950,000 queries/s on average, 900,000 queries/s on replicas, and 50,000 queries/s on the primary.

在 2021 年的時候變成 1.125M qps,其中 75K qps 在 primary 上:

Today, in 2021, the same database tables are spread across several clusters. In two years, they saw continued growth, accelerating year-over-year. All hosts of these clusters combined answer 1,200,000 queries/s on average (1,125,000 queries/s on replicas, 75,000 queries/s on the primaries). At the same time, the average load on each host halved.

另外這幾年比較成熟的方案都拿出來用了,包括用 ProxySQL 降低連線數的壓力 (connection pool 的概念):

[W]e started using ProxySQL to reduce the number of connections opened against our primary MySQL instances.

ProxySQL is used for multiplexing client connections to MySQL primaries.

另外用 Vitess 協助 sharding 之間的轉移:

Vitess is a scaling layer on top of MySQL that helps with sharding needs. We use its vertical sharding feature to move sets of tables together in production without downtime.

這兩套應該是已經蠻成熟的了... 另外也可以發現老方法還是很好用,就算在 GitHub 這種量還是可以暴力解決很多事情。

Leave a Reply

Your email address will not be published. Required fields are marked *