Percona XtraDB Cluster 裡各種與 LOCK 相關的指令會產生的效果

在「FLUSH and LOCK Handling in Percona XtraDB Cluster」這邊看到在 Percona XtraDB Cluster 內各種不同形式的 LOCK 指令會有不同的效果。有些跟一開始用的印象已經不太一樣了...

FLUSH TABLE WITH READ LOCKFLUSH TABLE <tablename> (WITH READ LOCK|FOR EXPORT) 都會直接讓整個 node 卡住,但 LOCK TABLE <tablename> READ/WRITE 就只會卡對應的表格,另外 GET_LOCK 本來應該是完全不支援,現在似乎變成 experimental 的功能了 (參考「PXC Strict Mode」這邊),這樣一來 MogileFS 的資料庫部分就可以在上面跑了嗎?(當初就是因為這個問題而另外弄一組 DRBD + HeartbeatMySQL 起來跑 XD)

之後看一下什麼時候加進去的...

一路從 MySQL 5.5 升級到 MySQL 8.0 的故事...

在「Migrating to MySQL 8.0 without breaking old application」這邊看到這個有趣的故事 XD 這是作者的應用程式 DrupalMySQL 5.5 一路升級到 8.0 的過程記錄...

真正的問題發生在 5.7 到 8.0:

原因是 Drupal 用到關鍵字了:

In fact, this old Drupal, uses a table name that is now part of the reserved keywords. It’s always advised to verify what are the new keywords reserved for MySQL itself. New features can also mean new keywords sometimes.

修正後就好了:

話說依照「File:Drupal release timeline.png」這邊的資訊,Drupal 6.2 也十年左右了?應該是 PDO 剛開始要推廣的年代,不知道他跑哪個版本的 PHP...

另外 MySQL 的升級意外的順利?雖然是一步一步升,但沒遇到什麼大問題...

Amazon RDS 宣佈支援 PostgreSQL 10

Amazon RDS 宣佈支援 PostgreSQL 10 了:「PostgreSQL 10 now Supported in Amazon RDS」。而且 AWS 這次推出的還包括了 10.1 的 patch:

As of version 10, PostgreSQL no longer uses three-part version numbers, and is shifting to two-part version numbers. This release includes all patches from the PostgreSQL 10.1 minor version.

10 的第一個版本是去年十月初 (在「PostgreSQL 10 Released」這邊可以看到),10.1 是去年十一月初 (在「PostgreSQL 10.1, 9.6.6, 9.5.10, 9.4.15, 9.3.20, and 9.2.24 released!」),現在二月底,所以延遲大約是三個多月的時間...

10.2 是二月初,不知道會多久...

用 Percona Monitoring and Management (PMM) 蒐集 PostgreSQL 的數據

難得在 Percona 的 blog 上看到專門談 PostgreSQL 的文章:「Collect PostgreSQL Metrics with Percona Monitoring and Management (PMM)」。

其實是透過 Prometheus 疊出來的:

Starting from PMM 1.4.0. it’s possible to add monitoring for any service supported by Prometheus.

在步驟也可以看到:

3. In the next dialog, choose Prometheus as a data source and continue.

這方法有點奇怪就是了,但反正會動比較重要?XD

AWS Tokyo 也有 Amazon Aurora (PostgreSQL) 可以用了

剛剛翻到 AWS 宣佈 Amazon Aurora (PostgreSQL) 在東京開放使用了:「Amazon Aurora with PostgreSQL Compatibility is Available in the Asia Pacific (Tokyo) Region」。

The PostgreSQL-compatible edition of Amazon Aurora is now available in 10 regions. With the addition of the AWS Asia Pacific (Tokyo) region, you have a new option for database placement, availability, and scalability.

不過 Region Table 裡面還沒更新,亞洲區裡面的東京還沒勾起來,應該過幾天就會更新了...

Amazon Aurora (MySQL) 推出相容於 MySQL 5.7 的版本

Amazon Aurora (MySQL) 推出相容於 MySQL 5.7 的版本了:「Amazon Aurora is Compatible with MySQL 5.7」。

不過網站上的介紹還沒更新:

Amazon Aurora is a relational database service that combines the speed and availability of high-end commercial databases with the simplicity and cost-effectiveness of open source databases. The MySQL-compatible edition of Aurora delivers up to 5X the throughput of standard MySQL running on the same hardware, and is designed to be compatible with MySQL 5.6, enabling existing MySQL applications and tools to run without requiring modification.

5.7 其中一個賣點在於支援 Spatial index (透過 R-tree),不過 AWS 的版本看起來是自己用 B-tree 加上 Z-order curve 實做:「Amazon Aurora under the hood: indexing geospatial data using Z-order curves」。

我覺得看看就好啦,拿 244GB RAM 的 r3.8xlarge 跑 1GB 的 data set,當大家是傻逼嗎...

Percona Server 引入 MyRocks

看到「MyRocks Engine: Things to Know Before You Start」這篇,才發現原來一月的時候 Percona Server 就已經將 MyRocks GA (General Availability) 了:「Percona Server for MySQL 5.7.20-19 Is Now Available」。

New Features:
Now MyRocks Storage Engine has General Availability status.

在二月這篇文章裡面有提到一些重點,像是安裝方式:

Now if you use Percona repositories, you can simply install MyRocks plugin and enable it with ps-admin --enable-rocksdb.

另外文章裡也提到了重要的差異 (在「What other differences should you be aware of?」這段),像是他並不是每個 table 都一個檔案,而是像早期 InnoDB 的作法,整個一包:

Let’s look at the directory layout. Right now, all tables and all databases are stored in a hidden .rocksdb directory inside mysqldir. The name and location can be changed, but still all tables from all databases are stored in just a series of .sst files. There is no per-table / per-database separation.

另外提到可以看到 Engine 的代碼是 ROCKSDB (從 ENGINE=ROCKSDB 那段看到的)。然後是 Isolation level 的支援度,只有 READ-COMMITTEDSERIALIZABLE,沒有 REPEATABLE-READ

Keep in mind that at this time MyRocks supports only READ-COMMITTED and SERIALIZABLE isolation levels. There is no REPEATABLE-READ isolation level and no gap locking like in InnoDB. In theory, RocksDB should support SNAPSHOT isolation level. However, there is no notion of SNAPSHOT isolation in MySQL so we have not implemented the special syntax to support this level. Please let us know if you would be interested in this.

然後 bulk load 在資料量超過記憶體大小時會有已知的 crash 問題:

For bulk loads, you may face problems trying to load large amounts of data into MyRocks (and unfortunately this might be the very first operation when you start playing with MyRocks as you try to LOAD DATA, INSERT INTO myrocks_table SELECT * FROM innodb_table or ALTER TABLE innodb_table ENGINE=ROCKSDB). If your table is big enough and you do not have enough memory, RocksDB crashes. As a workaround, you should set rocksdb_bulk_load=1 for the session where you load data.

然後目前沒有像 XtraBackup 的工具可以用,現階段如果要備份的話得透過傳統的方式來做 (mysqldump 或是 filesystem snapshot):

Right now there is no hot backup software like Percona XtraBackup to perform a hot backup of MyRocks tables (we are looking into this). At this time you can use mysqldump for logical backups, or use filesystem-level snapshots like LVM or ZFS.

想來找機會測試看看兩者差異...

Amazon Aurora (MySQL) 提供 Parallel Query 讓人申請使用

AWS 宣佈了 Amazon Aurora (MySQL) 支援 Parallel Query:「Amazon Aurora Parallel Query is Available for Preview」。

這邊提到的 Parallel Query 比較像是 Amazon Athena,直接把單一 Query 打散到多台機器上跑:

Amazon Aurora Parallel Query improves the performance of large analytic queries by pushing processing down to the Aurora storage layer, spreading processing across hundreds of nodes.

也就是說,這算是單一 SQL Query 平行運算的進階版本。

在這之前,AWS 都已經支援單一 Query 在單台機器上利用多 CPU 平行運算。其中 PostgreSQL 是 9.6+ 本身就有支援。Amazon Aurora (MySQL) 則是在 2016 時透過 Parallel Read Ahead 支援某些情境下的的單一 Query 多 CPU 運算了 (發現之前沒寫到...):「Amazon Aurora Update – Parallel Read Ahead, Faster Indexing, NUMA Awareness」。

這個功能目前是 Preview 階段,然後開在這些地區讓大家測試使用:

The preview is available for the MySQL-compatible edition of Amazon Aurora, and is currently available in the US East (N. Virginia), US East (Ohio), US West (Oregon), and Europe (Ireland) Regions. Sign up to get access.

這個功能提供了想要提昇效能,但懶得改架構的人可以用錢直接硬換出來...

EnterpriseDB 打算推出的 zheap,想要解 VACUUM 問題...

前天被問到「DO or UNDO - there is no VACUUM」這篇,回家後仔細看一看再翻了一些資料,看起來是要往 InnoDB 的解法靠...

PostgreSQL 與 InnoDB 都是透過 MVCC 的概念實做 transaction 之間的互動,但兩者實際的作法不太一樣。其中帶來一個明顯的差異就是 PostgreSQL 需要 VACUUM。這點在同一篇作者八年前 (2011) 的文章就有提過兩者的差異以及優缺點:「MySQL vs. PostgreSQL, Part 2: VACUUM vs. Purge」。

UPDATE 時,InnoDB 會把新資料寫到表格內,然後把可能會被 rollback 的舊資料放到表格外:

In InnoDB, only the most recent version of an updated row is retained in the table itself. Old versions of updated rows are moved to the rollback segment, while deleted row versions are left in place and marked for future cleanup. Thus, purge must get rid of any deleted rows from the table itself, and clear out any old versions of updated rows from the rollback segment.

而被 DELETE 清除的資料則是由 purge thread 處理:

All the information necessary to find the deleted records that might need to be purged is also written to the rollback segment, so it's quite easy to find the rows that need to be cleaned out; and the old versions of the updated records are all in the rollback segment itself, so those are easy to find, too.

所以可以在 InnoDB 看到 purge thread 相關的設定:「MySQL :: MySQL 5.7 Reference Manual :: 14.6.11 Configuring InnoDB Purge Scheduling」,負責處理這些東西。

而在 PostgreSQL 的作法則是反過來,舊的資料放在原來地方,新資料另外存:

PostgreSQL takes a completely different approach. There is no rollback tablespace, or anything similar. When a row is updated, the old version is left in place; the new version is simply written into the table along with it.

新舊資料的位置其實還好,主要是因為沒有類似的地方可以記錄哪些要清:

Lacking a centralized record of what must be purged, PostgreSQL's VACUUM has historically needed to scan the entire table to look for records that might require cleanup.

這也使得 PostgreSQL 裡需要 autovacuum 之類的程序去掃,或是手動跑 vacuum。而在去年 (2017) 的文章裡也有提到目前還是類似的情況:「MVCC and VACUUM」。

而在今年 (2018) 的文章裡,EnterpriseDB 就提出了 zheap 的想法,在 UPDATE 時寫到 table 裡,把可能被 rollback 的資料放到 undo log 裡。其實就是把 InnoDB 那套方法拿過來用,只是整篇都沒提到而已 XD:

That brings me to the design which EnterpriseDB is proposing. We are working to build a new table storage format for PostgreSQL, which we’re calling zheap. In a zheap, whenever possible, we handle an UPDATE by moving the old row version to an undo log, and putting the new row version in the place previously occupied by the old one. If the transaction aborts, we retrieve the old row version from undo and put it back in the original location; if a concurrent transaction needs to see the old row version, it can find it in undo. Of course, this doesn’t work when the block is full and the row is getting wider, and there are some other problem cases as well, but it covers many useful cases. In the typical case, therefore, even bulk updates do not force a zheap to grow. Instead, the undo grows. When a transaction commits, all row versions that will become dead are in the undo, not the zheap.

不過馬上就會想到問題,如果要改善問題,不是個找地方記錄哪些位置要回收就好了嗎?順便改變方法是為了避免 fragment 嗎?

等著看之後變成什麼樣子吧...

Percona 版本的 MySQL 對於 Meltdown/Spectre 漏洞修復造成的效能損失 (Intel 平台)

而且這還不是完全修復,只是大幅降低被攻擊的機率...

PerconaUbuntu 16.04 上測試 MeltdownSpectre 這兩個安全漏洞的修正對於效能的影響。在原文標題就講了結論,為了修正 Meltdown 與 Spectre 兩個安全漏洞,效能的損失很明顯:「20-30% Performance Hit from the Spectre Bug Fix on Ubuntu」。

這邊測的結果發現,在 CPU bound 時的損失大約是 20%~25% (甚至到 30%),而 I/O bound 會輕一些,大約是 15%~20%:

We can see that in CPU-bound workloads the overhead is 20-25%, reaching up to 30% in point select queries. In IO-bound (25G buffer pool) workloads, the observed overhead is 15-20%.

在 comment 的地方 Percona 的人被問到 AMD 平台上效能會損失多少的問題,但因為他們手上目前沒有 AMD 平台的新機器所以不知道會有多少:

I do not have modern AMD servers on my hands right now

理論上 AMD 平台不需要處理 Meltdown 問題,損失應該會少一些,但沒測過也不曉得會是什麼情況... (像是 Spectre 的修正損失會不會比 Intel 還重,這之類的...)

另外補上早些時候的文章,當時 Ubuntu 上的 kernel 只有對 Meltdown 攻擊的修正,當時 Percona 的人也測了一次:「Does the Meltdown Fix Affect Performance for MySQL on Bare Metal?」,看起來對 Meltdown 攻擊的修正對效能的影響不太大,不過文裡有測試到 syscall 的效率的確如同預期掉很多。