PostgreSQL 上對應 pt-online-schema-change 的工具 pg-osc

翻資料的時候翻到「pg-osc: Zero downtime schema changes in PostgreSQL」這篇文章,可以在 PostgreSQL 上做到類似 pt-online-schema-change 的事情,這邊先提一下 pt-online-schema-change 的說明:

ALTER tables without locking them.

不管是 MySQL 還是 PostgreSQL,都會遇到 ALTER TABLE 常常會 lock 的問題,這點主要的影響就是 db migration。

在 dev 環境的機器應該沒什麼問題,資料量都不大,應該是很快就可以跑完;但在 stage 環境時就會開始有狀況了 (假設是從 production 複製過來的資料,表格的大小可能偏大),但應該還是可以用 downtime 換,慢慢跑,花幾個小時把 db migration 跑完。

可是到了 production 環境時就不太能這樣搞了,這也是一般不太建議在 production 環境裡用現成的 db migration 工具,尤其當資料量偏大的時候。

解這個問題的方法就是透過繞路的方式,不要直接動原來的 table:基本的想法是開一個新的 table,然後一直從舊的 table 搬資料到新的 table 上 (包括應用程式下指令寫到舊的 table 上的資料),直到最後用一個短暫的 lock 機制來切換 table。

在 MySQL 的世界裡比較有名的是 Percona 的 pt-online-schema-change (trigger-based) 以及 GitHubgh-ost (replication-based),另外找資料的時候有發現 Facebook 也有丟 OnlineSchemaChange (trigger-based) 出來。

在 PostgreSQL 的世界裡似乎是 pg_repack 這個方案,用了 trigger-based 的方式處理,但之前沒有注意到,是翻 pg-osc 的時候被提到才知道有這個工具。

而這次提到的 pg-osc 則是 2022 年才出的軟體,也是 trigger-based 的方式:

pg-osc uses the concept of shadow tables to perform schema changes. At a high level, it creates a shadow table that looks structurally the same as the primary table, performs the schema change on the shadow table (avoiding any locks since nothing is using this table), copies contents from the primary table to the shadow table and swaps the table names in the end while preserving all changes to the primary table using triggers (via audit table).

另外從 PostgreSQL 的 wiki 上看到「Change management tools and techniques」這頁,裡面看到「Metagration: Logical PostgreSQL Migration」這個工具,看起來好像是 replication-based 的方案,不過還是有用到一些 trigger 做事。

這些方案都先記錄起來好了...

AWS MGN 推出將 CentOS 轉成 Rocky Linux 的服務

搜尋了一下以前的文章,這好像是第一次在 AWS News Blog 上第一次提到 Rocky Linux:「AWS MGN Update – Configure DR, Convert CentOS Linux to Rocky Linux, and Convert SUSE Linux Subscription」。

就如同標題所說的,AWS MGN (AWS Application Migration Service) 推出了將 CentOS 的機器轉成 Rocky Linux 的服務:

翻了一下,Rocky Linux 官方也有提供類似的東西,不過是 command line 的形式:「How to Migrate to Rocky Linux from CentOS Stream, CentOS, Alma Linux, RHEL, or Oracle Linux」,這次 AWS 這樣包起來是方便一些沒錯,可以看到是搭配 SSM Agent 實做出來的:

比較意外的是這次是 Jeff Barr自己出來公佈,通常這種功能都是其他人寫...

低 Downtime 將 4TB 的 PostgreSQL 9.6 資料庫升級到 13 的故事

前幾天在 Hacker News 首頁上看到的文章,講怎麼把一個 4TB 的 PostgreSQL 從 9.6 升級到 13 的故事:「How we upgraded our 4TB Postgres database (retool.com)」,原文在「How Retool upgraded our 4 TB main application PostgreSQL database」,翻了一下 LinkedIn,這篇文章的作者 Peter Johnston 在 Retool 掛的是 Security Software Engineer,另外他也有在 Hacker News 上的討論出現 (帳號是 mrbabbage),可以搜尋翻翻看他的回覆。

看完文章後發現方法的概念其實不難,主要是要找到對的工具來用。基本的想法是先生出一個 initial dump,然後架構 logical replication,接下來就是處理各種因為在 4TB data 這個 scale 下會遇到的問題。

主要用到的工具是 Citus Data 的 Warp:「Citus warp: Database migrations without the pain」,不過這個工具的限制是表格必須都有 single column primary key,所以他們為了這次轉移也有小改 database schema 配合 Warp 的要求:

We had to do a bit of finagling to coax Warp into processing our database. Warp expects all tables to have a single column primary key, so we had to convert compound primary keys into unique constraints and add scalar primary keys. Otherwise, Warp was very straightforward to use.

另外針對比較大的兩個 append-only 的表格 (分別是 2TB 與 x00 GB) 做處理,在 initial dump 的階段不對這兩個表格做 replication,而是透過自製的 Python script 搬移:

To handle the two massive tables we skipped in Warp, we wrote a Python script to bulk transfer data from the old database server to the new.

然後是 foreign key 相關的關閉與重啟,這算是 RDBMS 在大量資料的 dump & restore 的標準作業了:

As you can see from the runbook above, one of the steps we had to do was to turn off and then re-enable foreign key constraint checks.

最後執行下去,整個 downtime 只有十五分鐘:

We scheduled the maintenance window late on Saturday, October 23rd, at the lowest period of Retool cloud traffic. With the configuration described above, we were able to bring up a new database server at version 13 in around 15 minutes, subscribed to changes at our 9.6 primary with logical decoding.

另外也有提到有計畫要 sharding,之後 main database 就有機會被拆小:

We anticipate we’ll have sharded our database by the end of that support window, and be performing our next substantial version upgrades incrementally.

整個計畫的核心概念不難,主要是要怎麼順出來並且執行...

跨雲端的 Zero Downtime 轉移

看到「Ask HN: Have you ever switched cloud?」這個討論,在講雲端之間的搬遷,其中 vidarh 的回答可以翻一下...

首先是他提到原因的部份,基本上都是因為錢的關係,從雲搬到另外一個雲,然後再搬到 Dedicated Hosting 上:

Yes. I once did zero downtime migration first from AWS to Google, then from Google to Hetzner for a client. Mostly for cost reasons: they had a lot of free credits, and moved to Hetzner when they ran out.

Their savings from using the credits were at least 20x what the migrations cost.

然後他也直接把整理的資料丟出來,首先是在兩端上都先建立 load balancer 類的服務:

* Set up haproxy, nginx or similar as reverse proxy and carefully decide if you can handle retries on failed queries. If you want true zero-downtime migration there's a challenge here in making sure you have a setup that lets you add and remove backends transparently. There are many ways of doing this of various complexity. I've tended to favour using dynamic dns updates for this; in this specific instance we used Hashicorp's Consul to keep dns updated w/services. I've also used ngx_mruby for instances where I needed more complex backend selection (allows writing Ruby code to execute within nginx)

再來是打通內網,其實就是 site-to-site VPN:

* Set up a VPN (or more depending on your networking setup) between the locations so that the reverse proxy can reach backends in both/all locations, and so that the backends can reach databases both places.

然後建立資料庫的 replication server 以及相關的機制:

* Replicate the database to the new location.

* Ensure your app has a mechanism for determining which database to use as the master. Just as for the reverse proxy we used Consul to select. All backends would switch on promoting a replica to master.

* Ensure you have a fast method to promote a database replica to a master. You don't want to be in a situation of having to fiddle with this. We had fully automated scripts to do the failover.

然後是確認 application 端可以切換自如:

* Ensure your app gracefully handles database failure of whatever it thinks the current master is. This is the trickiest bit in some cases, as you either need to make sure updates are idempotent, or you need to make sure updates during the switchover either reliably fail or reliably succeed. In the case I mentioned we were able to safely retry requests, but in many cases it'll be safer to just punt on true zero downtime migration assuming your setup can handle promotion of the new master fast enough (in our case the promotion of the new Postgres master took literally a couple of seconds, during which any failing updates would just translate to some page loads being slow as they retried, but if we hadn't been able to retry it'd have meant a few seconds downtime).

然後確認新的雲端有足夠的 capacity 撐住流量後,就是要轉移了,首先是降低 DNS TTL:

Once you have the new environment running and capable of handling requests (but using the database in the old environment):

* Reduce DNS record TTL.

然後把舊的 load balancer 指到新的後端,這時候如果發現問題可以快速 rollback 回來:

* Ensure the new backends are added to the reverse proxy. You should start seeing requests flow through the new backends and can verify error rates aren't increasing. This should be quick to undo if you see errors.

接著把 DNS 指到新的 load balancer,理論上應該不會有太大問題:

* Update DNS to add the new environment reverse proxy. You should start seeing requests hit the new reverse proxy, and some of it should flow through the new backends. Wait to see if any issues.

接著把資料庫切到新的機房,有問題時可以趕快切回去再確認哪邊有狀況:

* Promote the replica in the new location to master and verify everything still works. Ensure whatever replication you need from the new master works. You should now see all database requests hitting the new master.

最後的階段就是拔掉舊的架構:

* Drain connections from the old backends (remove them from the pool, but leave them running until they're not handling any requests). You should now have all traffic past the reverse proxy going via the new environment.

* Update DNS to remove the old environment reverse proxy. Wait for all traffic to stop hitting the old reverse proxy.

* When you're confident everything is fine, you can disable the old environment and bring DNS TTL back up.

其實這個方法跟雲端沒什麼關係,以前搞機房搬遷的時候應該都會規劃過類似的方案,大方向也都類似 (把 stateful services 與 stateless services 拆開來分析),只是不像雲端的彈性租賃,硬體要準備比較多...

我記得當年 Instagram 搬進 Facebook 機房的時候也有類似的計畫,之前有提過:「Instagram 從 AWS 搬到 Facebook 機房」。

台灣最近的話,好像是 PChome 24h 有把機房搬到 GCP 上面?看看他們之後會不會到 GCP 的場子上發表他們搬遷的過程...

把 Snap 包裝成 Flatpak 格式的工具

前幾天看到「unsnap」這個工具,可以把 Snap 套件轉成 Flatpak 套件,不過裡面有提到目前軟體的成熟度還沒有很高:

Let's say it's "Pre-alpha", as in "It kinda works on my computer".

但看起來會是個可以玩看看的東西,目前 Flatpak 的市場份額的確是愈吃愈多...

Facebook 把自家的 MySQL 升級到 8.0

Facebook Engineering 發了一篇將 MySQL 升級到 8.0 的說明:「Migrating Facebook to MySQL 8.0」。

先前的版本主要是 5.6,加上 MyRocks

Our last major version upgrade, to MySQL 5.6, took more than a year to roll out. When version 5.7 was released, we were still in the midst of developing our LSM-Tree storage engine, MyRocks, on version 5.6. Since upgrading to 5.7 while simultaneously building a new storage engine would have significantly slowed the progress on MyRocks, we opted to stay with 5.6 until MyRocks was complete. MySQL 8.0 was announced as we were finishing the rollout of MyRocks to our user database (UDB) service tier.

GitHub 上是有 facebook/mysql-8.0,但看起來從 2017 後就沒更新了,所以應該是沒有 open source 出來。

看看就好 XD

所以雙方都公開承認 Microsoft 併購 GitHub 了...

MicrosoftGitHub 兩邊的新聞稿都出來了:「Microsoft to acquire GitHub for $7.5 billion」、「A bright future for GitHub」。

隔壁棚 GitLab 在前幾天有消息時就先恭賀了 (畢竟同個業界的,可以驗證消息的來源比我們多):

另外也馬上就提供 migration 促銷:

然後從 GitLab 的 GitHub Importer (Grafana) 上面也可以看到湧入大量的 GitHub 使用者 (這個站的流量太大,圖表有時候會出不來),可以看出不少人搬家... 不過我覺得這只是搬到另外一個坑啊。

我是比較正面看待這件事情... Microsoft 遲早會搞爛 GitHub,然後 Git 逐漸回歸分散式的本質,而不是現在 GitHub 這樣高度集中。

一路從 MySQL 5.5 升級到 MySQL 8.0 的故事...

在「Migrating to MySQL 8.0 without breaking old application」這邊看到這個有趣的故事 XD 這是作者的應用程式 DrupalMySQL 5.5 一路升級到 8.0 的過程記錄...

真正的問題發生在 5.7 到 8.0:

原因是 Drupal 用到關鍵字了:

In fact, this old Drupal, uses a table name that is now part of the reserved keywords. It’s always advised to verify what are the new keywords reserved for MySQL itself. New features can also mean new keywords sometimes.

修正後就好了:

話說依照「File:Drupal release timeline.png」這邊的資訊,Drupal 6.2 也十年左右了?應該是 PDO 剛開始要推廣的年代,不知道他跑哪個版本的 PHP...

另外 MySQL 的升級意外的順利?雖然是一步一步升,但沒遇到什麼大問題...

Microsoft 虛擬化的兩個消息:Azure 被打臉以及 AWS 推出轉移工具

首先是 VMware 發文打臉 Microsoft 說他們所宣稱的轉移工具 (從 VMware 轉到 Azure 上) 並沒有 VMware 原廠支援:「VMware – The Platform of Choice in the Cloud」。

然後 AWS 則是推出了從 Hyper-V 轉移到 AWS 的工具:「Migrate Hyper-V VMs to AWS with AWS Server Migration Service」,這邊倒是沒提到官方支援...

這臉不只是腫腫的而已了,有種連續技的感覺 XD

AWS 推出將 Classic Load Balancer 轉換成 Application Load Balancer 或 Network Load Balancer 的功能

AWS 應該是希望大家趕快把能換的 ELB 都換成新的 ALB 與 NLB,所以推出這個功能:「New One-step Migration Wizard to Migrate a Classic Load Balancer」。

Today, AWS announced the ability to migrate from a Classic Load Balancer to an Application Load Balancer or a Network Load Balancer in one step using a console-based migration wizard.

不過 ELB 有一些功能是目前 ALB 與 NLB 沒有的,像是一般性的 SSL offload (而非 HTTPS offload),以及舊客戶會有的 EC2-Classic:

不知道這兩個問題之後會推出什麼樣的方案解決...