Category Archives: Database

MySQL with 100k Connections on a Single Server

Also spotted in "Links: February 2019", which points to Percona's "MySQL Challenge: 100k Connections".

Percona's test keeps every connection busy rather than idle, so it feels a bit like simulating the moment when everything piles up? Only a few parameters stand out:

table_open_cache = 200000
back_log=3500
max_connections=110000
max_prepared_stmt_count=1000000

Raising max_connections is the obvious part. max_prepared_stmt_count is bumped as well because every connection is actually doing work, and back_log lets the kernel hold on to TCP connections it has not had time to accept yet.

With sysbench the server seems to hold up, roughly in line with theory: latency rises as the number of connections grows...
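
As a minimal sanity-check sketch (not from the article), these are the kinds of statements you could run against the tuned server to confirm the settings took effect and to watch the connection count during the benchmark; note that back_log is not a dynamic variable, so it has to be in my.cnf before startup:

-- confirm the tuned variables on the running server
SHOW GLOBAL VARIABLES
  WHERE Variable_name IN ('max_connections', 'back_log',
                          'table_open_cache', 'max_prepared_stmt_count');

-- how many client connections are currently open
SHOW GLOBAL STATUS LIKE 'Threads_connected';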

The PostgreSQL Fix for fsync()

I wrote about this earlier in "PostgreSQL's fsync() Headache...", where fsync() behaves differently from what the developers expected, but then I forgot to follow the progress...

I just saw Percona's "PostgreSQL fsync Failure Fixed – Minor Versions Released Feb 14, 2019" and realized the corresponding updates had already shipped on 2/14; the release notes say:

By default, panic instead of retrying after fsync() failure, to avoid possible data corruption (Craig Ringer, Thomas Munro)

Some popular operating systems discard kernel data buffers when unable to write them out, reporting this as fsync() failure. If we reissue the fsync() request it will succeed, but in fact the data has been lost, so continuing risks database corruption. By raising a panic condition instead, we can replay from WAL, which may contain the only remaining copy of the data in such a situation. While this is surely ugly and inefficient, there are few alternatives, and fortunately the case happens very rarely.

A new server parameter data_sync_retry has been added to control this; if you are certain that your kernel does not discard dirty data buffers in such scenarios, you can set data_sync_retry to on to restore the old behavior.

So the current workaround is: when fsync() fails, PostgreSQL panics outright to avoid data corruption and replays from the WAL, which also means any HA mechanism (if you designed one) may end up being triggered by this...

They also added data_sync_retry, which lets a PostgreSQL administrator forcibly turn the panic behavior off and have PostgreSQL retry fsync() instead; this should become useful once the kernels are fixed later on...
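
A minimal sketch of flipping that knob, assuming you have verified your kernel really does keep dirty buffers around after a write-back failure (data_sync_retry defaults to off and can only be set at server start, so a restart is required):

-- write the override into postgresql.auto.conf
ALTER SYSTEM SET data_sync_retry = on;

-- after restarting PostgreSQL, confirm the old retry behavior is back
SHOW data_sync_retry;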

Tools for Migrating from Microsoft SQL Server to PostgreSQL

In "How to Migrate from Microsoft SQL Server to PostgreSQL", the author's customer needed to move from Microsoft SQL Server to PostgreSQL (no reason is given).

The migration is done in two stages. The first stage is schema conversion, for which the author uses dalibo/sqlserver2pgsql, a tool written in Perl:

Migration tool to convert a Microsoft SQL Server Database into a PostgreSQL database, as automatically as possible http://dalibo.github.io/sqlserver2pgsql
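
To give a feel for what this stage has to translate, here is a purely hypothetical table (not taken from the article) showing the kind of type and identity-column differences such a conversion has to deal with:

-- SQL Server side (illustrative only)
CREATE TABLE customers (
    id         INT IDENTITY(1,1) PRIMARY KEY,
    name       NVARCHAR(200) NOT NULL,
    created_at DATETIME2 DEFAULT SYSUTCDATETIME()
);

-- a rough PostgreSQL equivalent
CREATE TABLE customers (
    id         SERIAL PRIMARY KEY,
    name       VARCHAR(200) NOT NULL,
    created_at TIMESTAMPTZ DEFAULT now()
);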

The second stage is data conversion, for which they picked the Community Edition of Pentaho Data Integration:

Pentaho offers various stable data-centric products. Pentaho Data Integration (PDI) is an ETL tool which provides great support for migrating data between different databases without manual intervention. The community edition of PDI is good enough to perform our task here. It needs to establish a connection to both the source and destination databases. Then it will do the rest of work on migrating data from SQL server to Postgres database by executing a PDI job.

So the answer is two tools chained together... The article says nothing about stored procedures and the like, so presumably the customer uses few of them, if any?

PostgreSQL's fsync() Headache...

A talk from FOSDEM 2019 on how PostgreSQL, while guaranteeing the Durability part of ACID, ran into fsync() behaving differently from what it assumed (mainly the behavior when fsync() fails): "PostgreSQL vs. fsync".

The Q&A-style interview "PostgreSQL vs. fsync. How is it possible that PostgreSQL used fsync incorrectly for 20 years, and what we'll do about it." gives a quick rundown of the short-term plan and the long-term thinking:

The short-term solution is ensuring that we detect fsync errors reliably at least on sufficiently recent kernels (since 4.13). On older kernels we can’t do much better, unfortunately.

The long-term solution is still being discussed in the community, but it’s hard to say how we could keep relying on buffered I/O in the future. So we may end up with direct I/O, but that’s a pretty significant change and is likely going to be a multi-year project.

MySQL, on the other hand, lives mostly in an O_DIRECT world, so it is affected far less...
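
For reference, the setting that comparison hinges on is InnoDB's flush method; here is a quick way to check it on a running server (it is read-only at runtime, so changing it means editing my.cnf and restarting). This is just a generic illustration, not something from the talk:

SHOW GLOBAL VARIABLES LIKE 'innodb_flush_method';

-- a common my.cnf setting on Linux:
-- [mysqld]
-- innodb_flush_method = O_DIRECT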

How Braintree (PayPal) Uses PostgreSQL

The hardest problems with an RDBMS all revolve around "how do we avoid interrupting the service" (many things are easy once you stop caring about uptime/downtime, from ALTER to failover to backup-and-restore plans). Braintree, part of PayPal, discusses how to change a PostgreSQL database schema without interrupting the service in "PostgreSQL at Scale: Database Schema Changes Without Downtime".

Most of the article is detail aimed at DBAs (e.g. how to avoid triggering large-scale locks that take the service down) rather than developer-facing material... but the opening part, which I think is the most important, is the part that needs developers on board:

For all code and database changes, we require that:

  • Live code and schemas be forward-compatible with updated code and schemas: this allows us to roll out deploys gradually across a fleet of application servers and database clusters.
  • New code and schemas be backward-compatible with live code and schemas: this allows us to roll back any change to the previous version in the event of unexpected errors.

To meet these two requirements, a schema change may have to be carried out in several stages instead of in one shot. It is also what keeps you out of the "close the site and pour the data back from backup" scenario...
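
As a minimal sketch of what "several stages" can look like, here is how adding a required column to a large PostgreSQL table might be staged; the table and column names are made up, and the exact sequence is my own illustration rather than something lifted from the Braintree article:

-- stage 1: add the column as nullable; this only needs a very brief lock
ALTER TABLE payments ADD COLUMN merchant_ref text;

-- stage 2: deploy code that writes merchant_ref on new rows
--          but still tolerates NULL when reading old rows

-- stage 3: backfill existing rows in small batches, not one giant transaction
UPDATE payments
   SET merchant_ref = 'unknown'
 WHERE id BETWEEN 1 AND 10000
   AND merchant_ref IS NULL;
-- ...repeat for the remaining id ranges...

-- stage 4: only after the backfill, enforce the constraint
-- (this still scans the table; the article covers how to do even this safely)
ALTER TABLE payments ALTER COLUMN merchant_ref SET NOT NULL;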

It is worth studying how to play these patterns: once you know how to design the steps for the common scenarios, you will be far more fluent when you actually run into them.

The MySQL Architecture at JPMorgan Chase's WePay

"Highly Available MySQL Clusters at WePay" describes WePay's MySQL design. I assumed at first it was a WeChat service, but after looking it up more carefully it turns out to be a JPMorgan Chase service...

The architecture runs on GCP. The original MySQL setup used MHA + HAProxy (a patched version that allows the pool to be changed dynamically), with Routes handling HAProxy failover.

The problems they ran into: a crash failover needed at least 30 minutes to switch over, plus the network partition issues that come with spanning zones on GCP...

The follow-up architecture got even more complicated, which makes you wonder whether it really solved the problem XDDD

They switched to the Orchestrator setup released by GitHub, routed traffic through two layers of HAProxy (one on the client side, the other being the load balancer from the original architecture), plus Consul to push updates into HAProxy?

Thinking through why it is designed this way (given the financial-industry background) is actually rather interesting...

All the Gotchas in DynamoDB Autoscaling...

AdRoll's field notes on the landmines in DynamoDB Autoscaling; some of the details you would never notice unless you actually dove in (the devil really is in the details): "Managing DynamoDB Autoscaling with Lambda and Cloudwatch".

The first problem mentioned is what autoscaling actually watches:

Ideally, the table should scale based on the number of requests that we are making, not the number of requests that are successful.

Another one is that autoscaling will not scale down when the table is not used at all, which looks like some kind of safety mechanism. But it means that tables normally used only for reads still need manual handling to scale write capacity back down after a batch job finishes:

Additionally, at the time of implementing this algorithm, the DynamoDB capacity could not be brought down automatically if the consumption was exactly zero, which can happen if you write to your table in batch instead of realtime, for example.

This meant that, when enabling autoscaling, tables that were read in realtime, but written to in batch, still needed manual intervention to bring the write capacity down after our jobs were done writing.

Yet another problem is that the number of scale-downs is limited:

Another interesting point that might bite users is that capacity decreases are an expensive operation for AWS, so they’re limited.

The number of decreases cited in the documentation can be achieved under very special conditions, since you need to have 4 decreases in the first hour of the day plus one for each of the remaining hours, for a total of 4 (first hour) + 23 (1 hourly) = 27.

The rest is them working out which algorithm allows finer-grained adjustments and rewriting it with Lambda... which in the end brought the cost down to roughly 30% of what it was:

Here is where we detected our costs for our batch tables dropping to around 30% of the initial cost.

AdRoll's scale is presumably not small, so cutting those tables down to around 30% of the original cost is worth spending quite a bit of effort on...

TiDB Performance on a Single Server

TiDB is a database built for distributed computation that aims to fully emulate the MySQL protocol. Percona tried benchmarking TiDB on a single server; the tests are simple, but the results are quite interesting: "A Quick Look into TiDB Performance on a Single Server".

What Percona observed is that TiDB can use multiple CPU cores for a single SQL query (MySQL only ever uses one), so on higher-end machines certain SQL queries are much faster. OLAP-style queries do well too, but the common OLTP workloads are quite a bit slower:

Short version: TiDB supports parallel query execution for selects and can utilize many more CPU cores – MySQL is limited to a single CPU core for a single select query. For the higher-end hardware – ec2 instances in my case – TiDB can be 3-4 times faster for complex select queries (OLAP workload) which do not use, or benefit from, indexes. At the same time point selects and writes, especially inserts, can be 5x-10x slower. Again, please note that this test was on a single server, with a single TiKV process.

An interesting drop-in...

Mark Callaghan Introduces LSM Trees in Five Minutes

Mark Callaghan, who implemented MyRocks, spent five minutes at CIDR 2019 introducing LSM trees: "Geek code for LSM trees". Poking around, I found that CIDR is held every two years, unlike the conferences I had come across before...

The slides are up as "Diversity of LSM tree shapes", with the key points laid out around the constraint of finishing within five minutes...