看起來是 2019 年年初的時候 MySQL 5.1 出問題,後續決定安排升級,在 2019 年年中把系統升級到 MySQL 5.7 (Percona Server 版本):
Our first major hurdle was to get current with our version of MySQL. In July, 2019 we completed the MySQL 5.1 to MySQL 5.7 (v5.7.19-17-log Percona Server to be precise) upgrade across all MySQL instances.
Not only was support for MySQL 5.1 at End-of-Life (more than 5 years ago) but our MySQL 5.1 instances on EC2/AWS had limited storage and we were scheduled to run out of space at the end of July. Our backs were up against the wall and we had to deliver!
另外在升級到 5.7 的時候,順便把本來是 INT 的 primary key 都換成 BIGINT:
As part of the cut-over to MySQL 5.7, we also took the opportunity to bake in a number of improvements. We converted all primary key columns from INT to BIGINT to prevent hitting MAX value.
In parallel with the MySQL 5.7 upgrade we also Upgraded Django to 1.6 due a behavioral change in MySQL 5.7 related to how transactions/commits were handled for SELECT statements. This behavior change was resulting in errors with older version of Python/Django running on MySQL 5.7
Eventbrite had traditionally used pt-online-schema-change (pt-osc) to ALTER MySQL tables in production. pt-osc uses MySQL triggers to move data from the original to the “duplicate” table which is a very expensive operation and can cause replication lag. Matter of fact, it had directly resulted in several outages in H1 of 2019 due to replication lag or breakage.
Next on the list was implementing improvements to MySQL high availability and automatic failover using Orchestrator. In February of 2020 we implemented a new HAProxy layer in front of all DB clusters and we released Orchestrator to production!
Orchestrator can successfully detect the primary failure and promote a new primary. The goal was to implement Orchestrator with HAProxy first and then eventually move to Orchestrator with ProxySQL.
然後最後題到了 Square 研發的 Shift,把 gh-ost 包裝起來變成有個 web UI 可以操作: