Home » Computer » Software » Archive by category "Database"

KPTI (Meltdown Mitigation) 對 MyISAM 的痛點

MariaDB 的「MyISAM and KPTI – Performance Implications From The Meltdown Fix」這篇看到頗驚人的數字,這篇提到了他們收到回報 (回報的 ticket 可以參考「[MDEV-15072] Massive performance impact after PTI fix - JIRA」),說 KPTI (Meltdown Mitigation) 對 MyISAM 效能影響巨大:

Recently we had a report from a user who had seen a stunning 90% performance regression after upgrading his server to a Linux kernel with KPTI (kernel page-table isolation – a remedy for the Meltdown vulnerability).

他們發現 90% 是因為 VMware 舊版本無法使用 CPU feature 加速,在新版應該可以改善不少。但即使如此,文章內還是在實體機器上看到了 40% 的效能損失:

A big deal of those 90% was caused by running in an old version of VMware which doesn’t pass the PCID and INVPCID capabilities of the CPU to the guest. But I could reproduce a regression around 40% even on bare metal.

然後後面就在推銷 MariaDB 的 Aria Storage Engine 了,不是那麼重要... 不過知道 MyISAM 在 KPTI 下這麼傷還蠻重要的,因為接下來五年應該都還是愈的到 KPTI,應該還是有人在用 MyISAM...

用 Percona Monitoring and Management (PMM) 蒐集 PostgreSQL 的數據

難得在 Percona 的 blog 上看到專門談 PostgreSQL 的文章:「Collect PostgreSQL Metrics with Percona Monitoring and Management (PMM)」。

其實是透過 Prometheus 疊出來的:

Starting from PMM 1.4.0. it’s possible to add monitoring for any service supported by Prometheus.


3. In the next dialog, choose Prometheus as a data source and continue.


DynamoDB 可以透過 KMS 加密了...

AWSDynamoDB 可以透過 KMS 加密了:「New – Encryption at Rest for DynamoDB」。

You simply enable encryption when you create a new table and DynamoDB takes care of the rest. Your data (tables, local secondary indexes, and global secondary indexes) will be encrypted using AES-256 and a service-default AWS Key Management Service (KMS) key.

看起來不是自己的 KMS key,而是 service 本身提供的,這樣看起來是在 i/o level 加密,所以還不是 searchable encryption 的能力...

AWS Tokyo 也有 Amazon Aurora (PostgreSQL) 可以用了

剛剛翻到 AWS 宣佈 Amazon Aurora (PostgreSQL) 在東京開放使用了:「Amazon Aurora with PostgreSQL Compatibility is Available in the Asia Pacific (Tokyo) Region」。

The PostgreSQL-compatible edition of Amazon Aurora is now available in 10 regions. With the addition of the AWS Asia Pacific (Tokyo) region, you have a new option for database placement, availability, and scalability.

不過 Region Table 裡面還沒更新,亞洲區裡面的東京還沒勾起來,應該過幾天就會更新了...

Amazon Aurora (MySQL) 推出相容於 MySQL 5.7 的版本

Amazon Aurora (MySQL) 推出相容於 MySQL 5.7 的版本了:「Amazon Aurora is Compatible with MySQL 5.7」。


Amazon Aurora is a relational database service that combines the speed and availability of high-end commercial databases with the simplicity and cost-effectiveness of open source databases. The MySQL-compatible edition of Aurora delivers up to 5X the throughput of standard MySQL running on the same hardware, and is designed to be compatible with MySQL 5.6, enabling existing MySQL applications and tools to run without requiring modification.

5.7 其中一個賣點在於支援 Spatial index (透過 R-tree),不過 AWS 的版本看起來是自己用 B-tree 加上 Z-order curve 實做:「Amazon Aurora under the hood: indexing geospatial data using Z-order curves」。

我覺得看看就好啦,拿 244GB RAM 的 r3.8xlarge 跑 1GB 的 data set,當大家是傻逼嗎...

Percona Server 引入 MyRocks

看到「MyRocks Engine: Things to Know Before You Start」這篇,才發現原來一月的時候 Percona Server 就已經將 MyRocks GA (General Availability) 了:「Percona Server for MySQL 5.7.20-19 Is Now Available」。

New Features:
Now MyRocks Storage Engine has General Availability status.


Now if you use Percona repositories, you can simply install MyRocks plugin and enable it with ps-admin --enable-rocksdb.

另外文章裡也提到了重要的差異 (在「What other differences should you be aware of?」這段),像是他並不是每個 table 都一個檔案,而是像早期 InnoDB 的作法,整個一包:

Let’s look at the directory layout. Right now, all tables and all databases are stored in a hidden .rocksdb directory inside mysqldir. The name and location can be changed, but still all tables from all databases are stored in just a series of .sst files. There is no per-table / per-database separation.

另外提到可以看到 Engine 的代碼是 ROCKSDB (從 ENGINE=ROCKSDB 那段看到的)。然後是 Isolation level 的支援度,只有 READ-COMMITTEDSERIALIZABLE,沒有 REPEATABLE-READ

Keep in mind that at this time MyRocks supports only READ-COMMITTED and SERIALIZABLE isolation levels. There is no REPEATABLE-READ isolation level and no gap locking like in InnoDB. In theory, RocksDB should support SNAPSHOT isolation level. However, there is no notion of SNAPSHOT isolation in MySQL so we have not implemented the special syntax to support this level. Please let us know if you would be interested in this.

然後 bulk load 在資料量超過記憶體大小時會有已知的 crash 問題:

For bulk loads, you may face problems trying to load large amounts of data into MyRocks (and unfortunately this might be the very first operation when you start playing with MyRocks as you try to LOAD DATA, INSERT INTO myrocks_table SELECT * FROM innodb_table or ALTER TABLE innodb_table ENGINE=ROCKSDB). If your table is big enough and you do not have enough memory, RocksDB crashes. As a workaround, you should set rocksdb_bulk_load=1 for the session where you load data.

然後目前沒有像 XtraBackup 的工具可以用,現階段如果要備份的話得透過傳統的方式來做 (mysqldump 或是 filesystem snapshot):

Right now there is no hot backup software like Percona XtraBackup to perform a hot backup of MyRocks tables (we are looking into this). At this time you can use mysqldump for logical backups, or use filesystem-level snapshots like LVM or ZFS.


Amazon Aurora (MySQL) 提供 Parallel Query 讓人申請使用

AWS 宣佈了 Amazon Aurora (MySQL) 支援 Parallel Query:「Amazon Aurora Parallel Query is Available for Preview」。

這邊提到的 Parallel Query 比較像是 Amazon Athena,直接把單一 Query 打散到多台機器上跑:

Amazon Aurora Parallel Query improves the performance of large analytic queries by pushing processing down to the Aurora storage layer, spreading processing across hundreds of nodes.

也就是說,這算是單一 SQL Query 平行運算的進階版本。

在這之前,AWS 都已經支援單一 Query 在單台機器上利用多 CPU 平行運算。其中 PostgreSQL 是 9.6+ 本身就有支援。Amazon Aurora (MySQL) 則是在 2016 時透過 Parallel Read Ahead 支援某些情境下的的單一 Query 多 CPU 運算了 (發現之前沒寫到...):「Amazon Aurora Update – Parallel Read Ahead, Faster Indexing, NUMA Awareness」。

這個功能目前是 Preview 階段,然後開在這些地區讓大家測試使用:

The preview is available for the MySQL-compatible edition of Amazon Aurora, and is currently available in the US East (N. Virginia), US East (Ohio), US West (Oregon), and Europe (Ireland) Regions. Sign up to get access.


EnterpriseDB 打算推出的 zheap,想要解 VACUUM 問題...

前天被問到「DO or UNDO - there is no VACUUM」這篇,回家後仔細看一看再翻了一些資料,看起來是要往 InnoDB 的解法靠...

PostgreSQL 與 InnoDB 都是透過 MVCC 的概念實做 transaction 之間的互動,但兩者實際的作法不太一樣。其中帶來一個明顯的差異就是 PostgreSQL 需要 VACUUM。這點在同一篇作者八年前 (2011) 的文章就有提過兩者的差異以及優缺點:「MySQL vs. PostgreSQL, Part 2: VACUUM vs. Purge」。

UPDATE 時,InnoDB 會把新資料寫到表格內,然後把可能會被 rollback 的舊資料放到表格外:

In InnoDB, only the most recent version of an updated row is retained in the table itself. Old versions of updated rows are moved to the rollback segment, while deleted row versions are left in place and marked for future cleanup. Thus, purge must get rid of any deleted rows from the table itself, and clear out any old versions of updated rows from the rollback segment.

而被 DELETE 清除的資料則是由 purge thread 處理:

All the information necessary to find the deleted records that might need to be purged is also written to the rollback segment, so it's quite easy to find the rows that need to be cleaned out; and the old versions of the updated records are all in the rollback segment itself, so those are easy to find, too.

所以可以在 InnoDB 看到 purge thread 相關的設定:「MySQL :: MySQL 5.7 Reference Manual :: 14.6.11 Configuring InnoDB Purge Scheduling」,負責處理這些東西。

而在 PostgreSQL 的作法則是反過來,舊的資料放在原來地方,新資料另外存:

PostgreSQL takes a completely different approach. There is no rollback tablespace, or anything similar. When a row is updated, the old version is left in place; the new version is simply written into the table along with it.


Lacking a centralized record of what must be purged, PostgreSQL's VACUUM has historically needed to scan the entire table to look for records that might require cleanup.

這也使得 PostgreSQL 裡需要 autovacuum 之類的程序去掃,或是手動跑 vacuum。而在去年 (2017) 的文章裡也有提到目前還是類似的情況:「MVCC and VACUUM」。

而在今年 (2018) 的文章裡,EnterpriseDB 就提出了 zheap 的想法,在 UPDATE 時寫到 table 裡,把可能被 rollback 的資料放到 undo log 裡。其實就是把 InnoDB 那套方法拿過來用,只是整篇都沒提到而已 XD:

That brings me to the design which EnterpriseDB is proposing. We are working to build a new table storage format for PostgreSQL, which we’re calling zheap. In a zheap, whenever possible, we handle an UPDATE by moving the old row version to an undo log, and putting the new row version in the place previously occupied by the old one. If the transaction aborts, we retrieve the old row version from undo and put it back in the original location; if a concurrent transaction needs to see the old row version, it can find it in undo. Of course, this doesn’t work when the block is full and the row is getting wider, and there are some other problem cases as well, but it covers many useful cases. In the typical case, therefore, even bulk updates do not force a zheap to grow. Instead, the undo grows. When a transaction commits, all row versions that will become dead are in the undo, not the zheap.

不過馬上就會想到問題,如果要改善問題,不是個找地方記錄哪些位置要回收就好了嗎?順便改變方法是為了避免 fragment 嗎?


Percona 版本的 MySQL 對於 Meltdown/Spectre 漏洞修復造成的效能損失 (Intel 平台)


PerconaUbuntu 16.04 上測試 MeltdownSpectre 這兩個安全漏洞的修正對於效能的影響。在原文標題就講了結論,為了修正 Meltdown 與 Spectre 兩個安全漏洞,效能的損失很明顯:「20-30% Performance Hit from the Spectre Bug Fix on Ubuntu」。

這邊測的結果發現,在 CPU bound 時的損失大約是 20%~25% (甚至到 30%),而 I/O bound 會輕一些,大約是 15%~20%:

We can see that in CPU-bound workloads the overhead is 20-25%, reaching up to 30% in point select queries. In IO-bound (25G buffer pool) workloads, the observed overhead is 15-20%.

在 comment 的地方 Percona 的人被問到 AMD 平台上效能會損失多少的問題,但因為他們手上目前沒有 AMD 平台的新機器所以不知道會有多少:

I do not have modern AMD servers on my hands right now

理論上 AMD 平台不需要處理 Meltdown 問題,損失應該會少一些,但沒測過也不曉得會是什麼情況... (像是 Spectre 的修正損失會不會比 Intel 還重,這之類的...)

另外補上早些時候的文章,當時 Ubuntu 上的 kernel 只有對 Meltdown 攻擊的修正,當時 Percona 的人也測了一次:「Does the Meltdown Fix Affect Performance for MySQL on Bare Metal?」,看起來對 Meltdown 攻擊的修正對效能的影響不太大,不過文裡有測試到 syscall 的效率的確如同預期掉很多。