Details of GitHub's MySQL 5.7 to 8.0 upgrade

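The GitHub Blog published a post on how GitHub upgraded from MySQL 5.7 to 8.0. It is fairly long, but it has a good number of figures and architecture details worth reading: "Upgrading GitHub.com to MySQL 8.0".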

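One side note before diving in: after reading the post, it is worth cross-checking GitHub's Incident History. There were several database-related incidents. They cannot be tied directly to this upgrade, but they are useful context to keep in mind...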

The main numbers and dates are these:

Our fleet consists of 1200+ hosts. It’s a combination of Azure Virtual Machines and bare metal hosts in our data center.

We store 300+ TB of data and serve 5.5 million queries per second across 50+ database clusters.

Preparation for the upgrade started in July 2022 and we had several milestones to reach even before upgrading a single production database.

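Also, although the post never says so explicitly, the descriptions in other paragraphs and the accompanying diagrams show that GitHub runs a single-primary (single-master) architecture rather than a multi-primary (multi-master) one: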

We opted not to do direct upgrades on the primary database host. Instead, we would promote a MySQL 8.0 replica to primary through a graceful failover performed with Orchestrator.

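The rest of the upgrade description is fairly long. The first phase, covering the read-only replicas, leaves a few things unexplained, but the approach is basically the same as what everyone else does.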

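The most obvious question is why the first step is an in-place upgrade rather than provisioning a fresh 8.0 host to observe (which would make the observed environment closer to the later steps), while the hosts added afterwards are provisioned. This is a minor point, though...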

What deserves a closer look is the primary (master) handling described in steps two and three.

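Step two changes the topology first. This is a rather unusual transitional architecture that only lasts a few hours: one of the 8.0 replicas is placed in the middle, a 5.7 replica is chained behind it, and the 5.7/8.0 read-only replicas are chained further downstream.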

Step three points the primary (master) at 8.0.

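It is reasonable to infer that this unusual architecture exists so that they can roll back fully to 5.7 quickly if something goes wrong, but it immediately raises the problem of replicating data written on 8.0 back to 5.7.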

By MySQL convention, replication to the next version usually works (for example 5.0 -> 5.1 or 5.1 -> 5.5; the case here is 5.7 -> 8.0), as noted in the official documentation, "Replication Compatibility Between MySQL Versions".

The reverse direction is not guaranteed, though. That is the first question the diagram raises, and the post mentions it as well:

MySQL supports replication from one release to the next higher release but does not explicitly support the reverse (MySQL Replication compatibility).

So all they could do was rehearse on staging, find what breaks, and fix it ahead of time:

When we tested promoting an 8.0 host to primary on our staging cluster, we saw replication break on all 5.7 replicas.
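To make the direction problem concrete, here is a minimal sketch of the step-two topology that flags any replication link running from a newer version to an older one. The host names are hypothetical, and where exactly the 8.0 read-only replicas attach is an assumption based on the description above, not something taken from GitHub's tooling.

```python
# Hypothetical host names; the shape follows the step-two description above.
VERSIONS = {
    "primary-57": (5, 7),
    "candidate-80": (8, 0),
    "relay-57": (5, 7),
    "replica-57": (5, 7),
    "replica-80": (8, 0),
}

# Source host -> hosts replicating directly from it (assumed layout).
STEP_TWO = {
    "primary-57": ["candidate-80"],
    "candidate-80": ["relay-57", "replica-80"],
    "relay-57": ["replica-57"],
}


def downgrade_edges(topology, versions):
    """Return (source, replica) pairs where the replica runs an older version."""
    return [
        (source, replica)
        for source, replicas in topology.items()
        for replica in replicas
        if versions[replica] < versions[source]
    ]


# The only link that goes from a newer to an older version is 8.0 -> 5.7.
print(downgrade_edges(STEP_TWO, VERSIONS))
```

In this layout the single flagged link is the one from the 8.0 candidate to the 5.7 relay, which is the link that has to keep working for a rollback to 5.7 to remain possible.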

Separately, near the beginning the post also mentions using CI to find problems in advance:

We added MySQL 8.0 to Continuous Integration (CI) for all applications using MySQL. We ran MySQL 5.7 and 8.0 side-by-side in CI to ensure that there wouldn’t be regressions during the prolonged upgrade process. We detected a variety of bugs and incompatibilities in CI, helping us remove any unsupported configurations or features and escape any new reserved keywords.
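As an illustration of the "escape any new reserved keywords" part, the sketch below backtick-quotes identifiers that collide with words that became reserved in MySQL 8.0. The keyword set is a small, partial subset chosen for illustration, and the helper names are hypothetical; this is not how GitHub's applications are known to handle it.

```python
# Partial, illustrative subset of words that became reserved in MySQL 8.0.
NEW_RESERVED_IN_80 = {"GROUPS", "LATERAL", "OVER", "RANK", "RECURSIVE", "WINDOW"}


def quote_identifier(name):
    """Backtick-quote an identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def escape_if_reserved(name):
    """Quote the identifier only when it collides with the subset above."""
    if name.upper() in NEW_RESERVED_IN_80:
        return quote_identifier(name)
    return name


columns = ["id", "rank", "groups"]
print(", ".join(escape_if_reserved(c) for c in columns))
```

Quoting every identifier unconditionally would also work on both 5.7 and 8.0 and avoids having to maintain a keyword list at all.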

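These methods are meant to contain and surface as many problems as possible up front. When a problem does show up in production, presumably they decide case by case whether to roll back to 5.7 and start the whole process over?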

So... treat it as an interesting case study.

Facebook upgrades its own MySQL to 8.0

Facebook Engineering published a write-up on upgrading MySQL to 8.0: "Migrating Facebook to MySQL 8.0".

The previous version was mainly 5.6, plus MyRocks:

Our last major version upgrade, to MySQL 5.6, took more than a year to roll out. When version 5.7 was released, we were still in the midst of developing our LSM-Tree storage engine, MyRocks, on version 5.6. Since upgrading to 5.7 while simultaneously building a new storage engine would have significantly slowed the progress on MyRocks, we opted to stay with 5.6 until MyRocks was complete. MySQL 8.0 was announced as we were finishing the rollout of MyRocks to our user database (UDB) service tier.

There is a facebook/mysql-8.0 repository on GitHub, but it looks like it has not been updated since 2017, so the 8.0 work probably has not been open sourced.

Just something to skim XD

Percona releases its counterpart to MySQL 8.0...

Percona released Percona Server for MySQL 8.0 just before the holidays: "Announcing General Availability of Percona Server for MySQL 8.0".

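Oracle's MySQL 8.0 adds quite a few features, such as NoSQL-leaning features and GIS-related ones, but what caught my attention is something on Percona's own side... TokuDB is being dropped in favor of MyRocks: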

Features Being Deprecated in Percona Server for MySQL 8.0
TokuDB Storage Engine: TokuDB will be supported throughout the Percona Server for MySQL 8.0 release series, but will not be available in the next major release. Percona encourages TokuDB users to explore the MyRocks Storage Engine which provides similar benefits for the majority of workloads and has better optimized support for modern hardware.