Home » Posts tagged "rdbms" (Page 3)

用 Percona Monitoring and Management (PMM) 蒐集 PostgreSQL 的數據

難得在 Percona 的 blog 上看到專門談 PostgreSQL 的文章:「Collect PostgreSQL Metrics with Percona Monitoring and Management (PMM)」。

其實是透過 Prometheus 疊出來的:

Starting from PMM 1.4.0. it’s possible to add monitoring for any service supported by Prometheus.

在步驟也可以看到:

3. In the next dialog, choose Prometheus as a data source and continue.

這方法有點奇怪就是了,但反正會動比較重要?XD

AWS Tokyo 也有 Amazon Aurora (PostgreSQL) 可以用了

剛剛翻到 AWS 宣佈 Amazon Aurora (PostgreSQL) 在東京開放使用了:「Amazon Aurora with PostgreSQL Compatibility is Available in the Asia Pacific (Tokyo) Region」。

The PostgreSQL-compatible edition of Amazon Aurora is now available in 10 regions. With the addition of the AWS Asia Pacific (Tokyo) region, you have a new option for database placement, availability, and scalability.

不過 Region Table 裡面還沒更新,亞洲區裡面的東京還沒勾起來,應該過幾天就會更新了...

Amazon Aurora (MySQL) 推出相容於 MySQL 5.7 的版本

Amazon Aurora (MySQL) 推出相容於 MySQL 5.7 的版本了:「Amazon Aurora is Compatible with MySQL 5.7」。

不過網站上的介紹還沒更新:

Amazon Aurora is a relational database service that combines the speed and availability of high-end commercial databases with the simplicity and cost-effectiveness of open source databases. The MySQL-compatible edition of Aurora delivers up to 5X the throughput of standard MySQL running on the same hardware, and is designed to be compatible with MySQL 5.6, enabling existing MySQL applications and tools to run without requiring modification.

5.7 其中一個賣點在於支援 Spatial index (透過 R-tree),不過 AWS 的版本看起來是自己用 B-tree 加上 Z-order curve 實做:「Amazon Aurora under the hood: indexing geospatial data using Z-order curves」。

我覺得看看就好啦,拿 244GB RAM 的 r3.8xlarge 跑 1GB 的 data set,當大家是傻逼嗎...

Percona Server 引入 MyRocks

看到「MyRocks Engine: Things to Know Before You Start」這篇,才發現原來一月的時候 Percona Server 就已經將 MyRocks GA (General Availability) 了:「Percona Server for MySQL 5.7.20-19 Is Now Available」。

New Features:
Now MyRocks Storage Engine has General Availability status.

在二月這篇文章裡面有提到一些重點,像是安裝方式:

Now if you use Percona repositories, you can simply install MyRocks plugin and enable it with ps-admin --enable-rocksdb.

另外文章裡也提到了重要的差異 (在「What other differences should you be aware of?」這段),像是他並不是每個 table 都一個檔案,而是像早期 InnoDB 的作法,整個一包:

Let’s look at the directory layout. Right now, all tables and all databases are stored in a hidden .rocksdb directory inside mysqldir. The name and location can be changed, but still all tables from all databases are stored in just a series of .sst files. There is no per-table / per-database separation.

另外提到可以看到 Engine 的代碼是 ROCKSDB (從 ENGINE=ROCKSDB 那段看到的)。然後是 Isolation level 的支援度,只有 READ-COMMITTEDSERIALIZABLE,沒有 REPEATABLE-READ

Keep in mind that at this time MyRocks supports only READ-COMMITTED and SERIALIZABLE isolation levels. There is no REPEATABLE-READ isolation level and no gap locking like in InnoDB. In theory, RocksDB should support SNAPSHOT isolation level. However, there is no notion of SNAPSHOT isolation in MySQL so we have not implemented the special syntax to support this level. Please let us know if you would be interested in this.

然後 bulk load 在資料量超過記憶體大小時會有已知的 crash 問題:

For bulk loads, you may face problems trying to load large amounts of data into MyRocks (and unfortunately this might be the very first operation when you start playing with MyRocks as you try to LOAD DATA, INSERT INTO myrocks_table SELECT * FROM innodb_table or ALTER TABLE innodb_table ENGINE=ROCKSDB). If your table is big enough and you do not have enough memory, RocksDB crashes. As a workaround, you should set rocksdb_bulk_load=1 for the session where you load data.

然後目前沒有像 XtraBackup 的工具可以用,現階段如果要備份的話得透過傳統的方式來做 (mysqldump 或是 filesystem snapshot):

Right now there is no hot backup software like Percona XtraBackup to perform a hot backup of MyRocks tables (we are looking into this). At this time you can use mysqldump for logical backups, or use filesystem-level snapshots like LVM or ZFS.

想來找機會測試看看兩者差異...

Amazon Aurora (MySQL) 提供 Parallel Query 讓人申請使用

AWS 宣佈了 Amazon Aurora (MySQL) 支援 Parallel Query:「Amazon Aurora Parallel Query is Available for Preview」。

這邊提到的 Parallel Query 比較像是 Amazon Athena,直接把單一 Query 打散到多台機器上跑:

Amazon Aurora Parallel Query improves the performance of large analytic queries by pushing processing down to the Aurora storage layer, spreading processing across hundreds of nodes.

也就是說,這算是單一 SQL Query 平行運算的進階版本。

在這之前,AWS 都已經支援單一 Query 在單台機器上利用多 CPU 平行運算。其中 PostgreSQL 是 9.6+ 本身就有支援。Amazon Aurora (MySQL) 則是在 2016 時透過 Parallel Read Ahead 支援某些情境下的的單一 Query 多 CPU 運算了 (發現之前沒寫到...):「Amazon Aurora Update – Parallel Read Ahead, Faster Indexing, NUMA Awareness」。

這個功能目前是 Preview 階段,然後開在這些地區讓大家測試使用:

The preview is available for the MySQL-compatible edition of Amazon Aurora, and is currently available in the US East (N. Virginia), US East (Ohio), US West (Oregon), and Europe (Ireland) Regions. Sign up to get access.

這個功能提供了想要提昇效能,但懶得改架構的人可以用錢直接硬換出來...

EnterpriseDB 打算推出的 zheap,想要解 VACUUM 問題...

前天被問到「DO or UNDO - there is no VACUUM」這篇,回家後仔細看一看再翻了一些資料,看起來是要往 InnoDB 的解法靠...

PostgreSQL 與 InnoDB 都是透過 MVCC 的概念實做 transaction 之間的互動,但兩者實際的作法不太一樣。其中帶來一個明顯的差異就是 PostgreSQL 需要 VACUUM。這點在同一篇作者八年前 (2011) 的文章就有提過兩者的差異以及優缺點:「MySQL vs. PostgreSQL, Part 2: VACUUM vs. Purge」。

UPDATE 時,InnoDB 會把新資料寫到表格內,然後把可能會被 rollback 的舊資料放到表格外:

In InnoDB, only the most recent version of an updated row is retained in the table itself. Old versions of updated rows are moved to the rollback segment, while deleted row versions are left in place and marked for future cleanup. Thus, purge must get rid of any deleted rows from the table itself, and clear out any old versions of updated rows from the rollback segment.

而被 DELETE 清除的資料則是由 purge thread 處理:

All the information necessary to find the deleted records that might need to be purged is also written to the rollback segment, so it's quite easy to find the rows that need to be cleaned out; and the old versions of the updated records are all in the rollback segment itself, so those are easy to find, too.

所以可以在 InnoDB 看到 purge thread 相關的設定:「MySQL :: MySQL 5.7 Reference Manual :: 14.6.11 Configuring InnoDB Purge Scheduling」,負責處理這些東西。

而在 PostgreSQL 的作法則是反過來,舊的資料放在原來地方,新資料另外存:

PostgreSQL takes a completely different approach. There is no rollback tablespace, or anything similar. When a row is updated, the old version is left in place; the new version is simply written into the table along with it.

新舊資料的位置其實還好,主要是因為沒有類似的地方可以記錄哪些要清:

Lacking a centralized record of what must be purged, PostgreSQL's VACUUM has historically needed to scan the entire table to look for records that might require cleanup.

這也使得 PostgreSQL 裡需要 autovacuum 之類的程序去掃,或是手動跑 vacuum。而在去年 (2017) 的文章裡也有提到目前還是類似的情況:「MVCC and VACUUM」。

而在今年 (2018) 的文章裡,EnterpriseDB 就提出了 zheap 的想法,在 UPDATE 時寫到 table 裡,把可能被 rollback 的資料放到 undo log 裡。其實就是把 InnoDB 那套方法拿過來用,只是整篇都沒提到而已 XD:

That brings me to the design which EnterpriseDB is proposing. We are working to build a new table storage format for PostgreSQL, which we’re calling zheap. In a zheap, whenever possible, we handle an UPDATE by moving the old row version to an undo log, and putting the new row version in the place previously occupied by the old one. If the transaction aborts, we retrieve the old row version from undo and put it back in the original location; if a concurrent transaction needs to see the old row version, it can find it in undo. Of course, this doesn’t work when the block is full and the row is getting wider, and there are some other problem cases as well, but it covers many useful cases. In the typical case, therefore, even bulk updates do not force a zheap to grow. Instead, the undo grows. When a transaction commits, all row versions that will become dead are in the undo, not the zheap.

不過馬上就會想到問題,如果要改善問題,不是個找地方記錄哪些位置要回收就好了嗎?順便改變方法是為了避免 fragment 嗎?

等著看之後變成什麼樣子吧...

Percona 版本的 MySQL 對於 Meltdown/Spectre 漏洞修復造成的效能損失 (Intel 平台)

而且這還不是完全修復,只是大幅降低被攻擊的機率...

PerconaUbuntu 16.04 上測試 MeltdownSpectre 這兩個安全漏洞的修正對於效能的影響。在原文標題就講了結論,為了修正 Meltdown 與 Spectre 兩個安全漏洞,效能的損失很明顯:「20-30% Performance Hit from the Spectre Bug Fix on Ubuntu」。

這邊測的結果發現,在 CPU bound 時的損失大約是 20%~25% (甚至到 30%),而 I/O bound 會輕一些,大約是 15%~20%:

We can see that in CPU-bound workloads the overhead is 20-25%, reaching up to 30% in point select queries. In IO-bound (25G buffer pool) workloads, the observed overhead is 15-20%.

在 comment 的地方 Percona 的人被問到 AMD 平台上效能會損失多少的問題,但因為他們手上目前沒有 AMD 平台的新機器所以不知道會有多少:

I do not have modern AMD servers on my hands right now

理論上 AMD 平台不需要處理 Meltdown 問題,損失應該會少一些,但沒測過也不曉得會是什麼情況... (像是 Spectre 的修正損失會不會比 Intel 還重,這之類的...)

另外補上早些時候的文章,當時 Ubuntu 上的 kernel 只有對 Meltdown 攻擊的修正,當時 Percona 的人也測了一次:「Does the Meltdown Fix Affect Performance for MySQL on Bare Metal?」,看起來對 Meltdown 攻擊的修正對效能的影響不太大,不過文裡有測試到 syscall 的效率的確如同預期掉很多。

Amazon Aurora (PostgreSQL) 也支援 Read Replica 了

Amazon Aurora (PostgreSQL) 支援 Read Replica 了:「Announcing Amazon Aurora PostgreSQL Read Replica for Amazon RDS for PostgreSQL」。

馬上想到的用途是量爆增時,如果當初有作 R/W split (讀寫分離) 就可以直接用錢撐住,不過官方給的範例是降低 RDS 轉移到 Aurora 的 downtime,這點就有點微妙...:

You can now create an Amazon Aurora PostgreSQL read replica for an Amazon RDS for PostgreSQL instance, allowing you to continuously replicate to Amazon Aurora PostgreSQL. This helps you minimize downtime when migrating a live workload from Amazon RDS for PostgreSQL to Amazon Aurora PostgreSQL, by keeping the instances in sync until you're ready to move your applications and users to Amazon Aurora PostgreSQL.

所以這次算是陸陸續續把功能補上來,在 Amazon Aurora (MySQL) 有的一般性功能,這邊就跟著先實作...

Amazon Aurora (MySQL) 的 Stored Procedure 可以跑 AWS Lambda...

查了資料才發現去年十月 Amazon Aurora (MySQL-Compatible Edition) 就支援用 AWS Lambda 當 stored procedure 了,只是當時只支援 async mode,能做的事情比較有限:「Amazon Aurora New Features: AWS Lambda Integration and Data Load from Amazon S3 to Aurora Tables」。

Now you can invoke Lambda functions directly from within an Aurora database via stored procedures or user-defined functions. Lambda integration allows you to extend the capabilities of the database and invoke external applications to act upon data changes. For example, you can create a Lambda function that sends emails to customers whenever their address in the database is updated.

前幾天發表的則是支援 sync mode,可以等到:「Amazon Aurora with MySQL Compatibility Natively Supports Synchronous Invocation of AWS Lambda Functions」。

Starting with version 1.16, we are extending this feature to be able to able to synchronously invoke Lambda functions.

Use the native function lambda_sync when you must know the result of the execution before moving on to another action.

這解掉了 MySQL 的 stored procedure 一直很殘的問題...

Amazon Aurora 的 MySQL-Compatible Edition 開始提供相容於 MySQL 5.7 的服務

Amazon AuroraMySQL-Compatible Edition 開始支援相容 MySQL 5.7 的服務:「Announcing Preview of Amazon Aurora with MySQL 5.7 Compatibility」。

Amazon Aurora with MySQL 5.7 compatibility offers enhancements such as JSON support, spatial indexes, generated columns and performance improvement of 5x over MySQL 5.7, and up to 10x for spatial datasets.

雖然目前還在 Preview (需要申請才能用),而且開放區域有限,但總算是有消息了:

The preview is currently available in US East (N. Virginia), US East (Ohio), US West (Oregon), and Canada (Montreal).

會一定要 5.7 的,主要的需求應該是在 spatial index 上吧... JSON 反倒還好。

Archives