GitHub 的 MySQL 5.7 升級到 8.0 的細節

GitHub Blog 上面寫了一篇關於 GitHub 怎麼把 MySQL 5.7 升級到 8.0 的過程,有點長度但是裡面有蠻多數字與架構可以看:「Upgrading GitHub.com to MySQL 8.0」。

開頭先順便提一下,看這篇後可以交叉看 GitHub 的 Incident History,有幾次跟 database 有關的事件,雖然不能直接確認與這波升級有關,但心裡可以有個底...

數字與時間的部分主要是這些:

Our fleet consists of 1200+ hosts. It’s a combination of Azure Virtual Machines and bare metal hosts in our data center.

We store 300+ TB of data and serve 5.5 million queries per second across 50+ database clusters.

Preparation for the upgrade started in July 2022 and we had several milestones to reach even before upgrading a single production database.

另外雖然沒有明講,但從文章中其他段落的描述,以及相關的圖片,可以看出來 GitHub 是使用 single-primary (single-master) 的架構,這邊沒有用到 multi-primary (multi-master) 類的架構:

We opted not to do direct upgrades on the primary database host. Instead, we would promote a MySQL 8.0 replica to primary through a graceful failover performed with Orchestrator.

後續升級的部分有點長,第一波關於 read-only replica 的部分雖然有些地方沒講清楚,但基本上大家的作法都大同小異:

比較明顯有疑問的是,第一步為什麼不是直接生一台新的 8.0 觀察 (這樣觀察到的環境才會與後續過程接近),而是 in-place upgrade,而後續開的機器又是 provision。不過這個算是小問題...

比較值得研究的是在第二步與第三步的說明裡面提到的 primary (master) 這塊。

第二步是先改變 topology,這個架構算是蠻特別的的過渡架構,只會維持幾個小時;會把其中一台 8.0 replica 拉起來放在中間,然後再串一台 5.7 replica,接下去再串 5.7/8.0 的 read-only replicas:

第三步把 primary (master) 指到 8.0 上:

這個特別的架構可以推敲出來是想要能夠快速在有狀況時完全 rollback 回 5.7,不過可以馬上想到 8.0 的資料丟到 5.7 上的問題。

MySQL 的慣例是下一個版本的 replication 通常都會通 (像是 5.0 -> 5.1,或是 5.1 -> 5.5,而這邊的例子是 5.7 -> 8.0),這在官方的文件「Replication Compatibility Between MySQL Versions」有提過。

但反過來就不一定了,這也是看到圖時馬上會想到的問題,在文章裡面也有提到:

MySQL supports replication from one release to the next higher release but does not explicitly support the reverse (MySQL Replication compatibility).

所以他們只能在 staging 上演練看看,找出會炸掉的東西,然後得提前先修改完:

When we tested promoting an 8.0 host to primary on our staging cluster, we saw replication break on all 5.7 replicas.

另外一方面,在文章開頭的地方也有提到利用 CI 事先找出問題:

We added MySQL 8.0 to Continuous Integration (CI) for all applications using MySQL. We ran MySQL 5.7 and 8.0 side-by-side in CI to ensure that there wouldn’t be regressions during the prolonged upgrade process. We detected a variety of bugs and incompatibilities in CI, helping us remove any unsupported configurations or features and escape any new reserved keywords.

用這些方法儘量把問題圍堵找出來,而真的遇到在 production 上的問題時,應該是看情況來決定要不要 rollback 回 5.7 整包重來?

就... 看看當作一個有趣的 case study。

reddit 與 4chan 在新聞網路上的獨特性

在「Study finds fringe communities on Reddit and 4chan have high influence on flow of alternative news to Twitter」這邊看到的:

After analyzing millions of posts containing mainstream and alternative news shared on Twitter, Reddit and 4chan, Jeremy Blackburn, Ph.D., and collaborators found that alt-right communities within 4chan, an image-based discussion forum where users are anonymous, and Reddit, a social news aggregator where users vote up or down on posts, have a surprisingly large influence on Twitter.

依照對 reddit4chan 的描述,這兩個媒體對 Twitter 的影響,會讓我聯想到在台灣 Ptt 對各新聞媒體的影響:Ptt 是很多新聞的起點?

"Based on our findings, these smaller, fringe communities on Reddit and 4chan serve as an incubation chamber for a lot of information," said Blackburn, assistant professor of computer science in the UAB College of Arts and Sciences. "The content and talking points are refined until they finally break free and make it to larger, more mainstream communities."

真的研究應該可以看出 Ptt 的影響力?

超過一半的心理研究都無法重製結果

應該是前陣子被 Nuzzel 推薦 2015 八月的老新聞,這個研究在 Nature 上發表:「Over half of psychology studies fail reproducibility test」。

這個研究重複了 98 篇原創研究來重新實驗:

In the biggest project of its kind, Brian Nosek, a social psychologist and head of the Center for Open Science in Charlottesville, Virginia, and 269 co-authors repeated work reported in 98 original papers from three psychology journals, to see if they independently came up with the same results.

結果在 100 次的實驗 (其中有 2 個是重複的) 發現只有 39 個實驗結果被重製出來:

According to the replicators' qualitative assessments, as previously reported by Nature, only 39 of the 100 replication attempts were successful. (There were 100 completed replication attempts on the 98 papers, as in two cases replication efforts were duplicated by separate teams.)

這成功率其實頗低,如果實驗無法被重製,就不能算科學研究...