看到「Building and deploying MySQL Raft at Meta」這篇,在講 Meta (Facebook) 把 MySQL 的 replication 架構換成自己用 Raft 的系統。
舊的系統是走 MySQL 的 semisync replication:
Previously, our replication solution used the MySQL semisynchronous (semisync) replication protocol.
其中 semisync replication 是在 MySQL 5.5 加入的功能,在至少一個遠端收到 replication log 後才傳回成功 (可以設定數量):「Semisynchronous Replication」。
Semisynchronous replication falls between asynchronous and fully synchronous replication. The source waits until at least one replica has received and logged the events (the required number of replicas is configurable), and then commits the transaction.
然後舊的系統是透過一包 Python 軟體在管理這些機器的各種 failover 操作:
The control plane operations (e.g., promotions, failover, and membership change) would be the responsibility of a set of Python daemons (henceforth called automation).
這個方法常遇到的問題是切換 primary server (以前叫做 master server) 時有可能會因為 binlog position 接不起來而失敗。
所以後來 MySQL 導入了 GTID,可以緩解這個問題,但還是有可能會發生不同的 secondary server (以前叫做 slave server) 會有不一樣的資料。
而在 Meta 改出來的架構裡面,把 replication data 直接寫到一個用 Raft 同步的系統,同步到其他的 secondary server 上面:
In MySQL Raft:
- Primary writes to binlog via Raft, and Raft sends binlog to followers/replicas.
- Replicas/followers receive in binlog and apply the transactions to the engine. An apply log is created during apply.
- Binlog is the replicated log from the Raft point of view.
是個一般單位不太會遇到的架構,而且可以預期其他公司的人遇到類似問題應該也不會用這個方法解...