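AWS's CA update (CA-2019) will affect database-related services

AWS is starting to use a new CA to issue certificates for the SSL endpoints of its database-related services (including RDS, Aurora, and DocumentDB). The news looks to be a little over ten days old, and I am not sure why an urgent notice suddenly went out: "Urgent & Important – Rotate Your Amazon RDS, Aurora, and Amazon DocumentDB (with MongoDB compatibility) Certificates".

The original CA is CA-2015, which expires in 2020: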

March 5, 2020 – The CA-2015 certificates will expire. Applications that use certificate validation but have not been updated will lose connectivity.

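So the items listed in the notice need to be taken care of. One of them is that, a few days ago, the new CA-2019 became the default for new instances, though you can still choose the old one if needed: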

January 14, 2020 – Instances created on or after this date will have the new (CA-2019) certificates. You can temporarily revert to the old certificates if necessary.

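You can download https://s3.amazonaws.com/rds-downloads/rds-ca-2019-root.pem and inspect it with openssl x509 -text -in rds-ca-2019-root.pem. Changing 2019 to 2015 in the URL fetches the old version.

However, it does not appear to use X.509 Name Constraints to restrict the CA to Amazon's own RDS domains...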
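As a rough sketch of the rotation chores described above, the Python below builds the CA bundle URL for a given year, counts the days left before CA-2015 expires, and creates a TLS context that trusts only a downloaded bundle. The function names are hypothetical and made up for illustration; the URL pattern and the March 5, 2020 date come from the text above, and the download step itself is left out.

```python
import ssl
from datetime import date

# URL pattern taken from the post; only the year changes between bundles.
RDS_CA_URL_TEMPLATE = "https://s3.amazonaws.com/rds-downloads/rds-ca-{year}-root.pem"

# Expiry date of the CA-2015 certificates as quoted in the AWS notice.
CA_2015_EXPIRY = date(2020, 3, 5)


def rds_ca_url(year: int) -> str:
    """Return the download URL of the RDS root CA bundle for the given year."""
    return RDS_CA_URL_TEMPLATE.format(year=year)


def days_until_ca_2015_expiry(today: date) -> int:
    """Return how many days remain before the CA-2015 certificates expire."""
    return (CA_2015_EXPIRY - today).days


def make_verifying_context(ca_file: str) -> ssl.SSLContext:
    """Build a TLS context that trusts only the CAs in ca_file.

    When cafile is given, create_default_context does not load the
    system trust store, so only the downloaded bundle is trusted.
    """
    return ssl.create_default_context(cafile=ca_file)
```

How the resulting context is handed to a database driver depends on the driver, so that part is deliberately not shown.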

Amazon RDS launches a connection pool product

Amazon RDS has launched a connection pool product called Amazon RDS Proxy: "Introducing Amazon RDS Proxy (Preview)".

It currently supports MySQL (both the traditional RDS version and the Aurora version):

Amazon RDS Proxy supports Amazon RDS for MySQL and Amazon Aurora with MySQL compatibility, with support for additional RDS database engines coming soon.

Pricing appears to be based on the number of vCPUs of the backend database:

Pricing is simple and predictable: you pay per vCPU of the database instance for which the proxy is enabled.

The pricing page lists USD$0.015 per vCPU per hour (using the us-east-1 figures), and for t2-series instances the minimum charge is for 2 vCPUs rather than being prorated by usage:

RDS Proxy pricing correlates to the number of vCPUs of the database instance for which it is enabled, with a minimum charge for 2 vCPUs.

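At the 2-vCPU minimum that works out to roughly USD$21.6 per month (about USD$10.8 per vCPU), which is rather expensive... If your SLA allows it, a basic failover approach may well be good enough...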
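The arithmetic behind the monthly figure can be sketched as follows. It assumes the USD$0.015 rate is billed per vCPU per hour and that a month has 30 days; both are assumptions made for illustration, as are the constant and function names.

```python
# Rate quoted from the us-east-1 pricing page; assumed to be per vCPU per hour.
HOURLY_RATE_PER_VCPU = 0.015  # USD

# Minimum number of vCPUs billed, per the pricing quote.
MIN_BILLED_VCPUS = 2

# Assumption: a 30-day month.
HOURS_PER_MONTH = 24 * 30


def monthly_proxy_cost(vcpus: int) -> float:
    """Estimate the monthly RDS Proxy charge in USD for a database instance."""
    billed_vcpus = max(vcpus, MIN_BILLED_VCPUS)
    return billed_vcpus * HOURLY_RATE_PER_VCPU * HOURS_PER_MONTH


if __name__ == "__main__":
    for vcpus in (1, 2, 8):
        print(f"{vcpus} vCPU(s): about USD${monthly_proxy_cost(vcpus):.2f} per month")
```

A 1-vCPU instance is billed the same as a 2-vCPU one here, which is where the minimum charge bites.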

If your SLA really does demand that much, you can test it in these regions:

Amazon RDS Proxy is available in preview for RDS MySQL and Aurora MySQL in US East (N. Virginia), US East (Ohio), US West (Oregon), EU West (Ireland), and Asia Pacific (Tokyo) regions. Support for RDS PostgreSQL and Aurora PostgreSQL is coming soon.

Amazon Aurora can use AWS's Machine Learning services directly

AWS announced that Amazon Aurora can directly use AWS's own Machine Learning services: "New for Amazon Aurora – Use Machine Learning Directly From Your Databases".

Two services are integrated: Amazon SageMaker (for models of all kinds) and Amazon Comprehend (for text processing).

Currently only Amazon Aurora MySQL 5.7 is supported; the others are still in the works:

The new machine learning integration is available today for Aurora MySQL 5.7, with the SageMaker integration generally available and the Comprehend integration in preview. You can learn more in the documentation. We are working on other engines and versions: Aurora MySQL 5.6 and Aurora PostgreSQL 10 and 11 are coming soon.

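This integration makes these services more convenient to use from application code...

Amazon Aurora MySQL 5.7 can now use Global Database too

AWS has launched Amazon Aurora Global Database for Amazon Aurora MySQL 5.7: "Aurora Global Database is Now Supported on Amazon Aurora MySQL 5.7".

The MySQL-flavored Global Database looks like a cross-region master-slave architecture (which is why it advertises lower read latency but says nothing about write latency):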

An Amazon Aurora Global Database is a single database that spans multiple AWS regions, enabling low latency global reads and disaster recovery from region-wide outages.

The quoted replication latency is under 1 second, so it should be async replication:

Aurora Global Database replicates writes in the primary region with typical latency of <1 second to secondary regions, for low latency global reads.

And you can fail over across regions:

In disaster recovery situations, you can promote the secondary region to take full read-write responsibilities in under a minute.

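From what I can see there is no extra service fee: you pay each region's own charges plus data transfer. Looks like a pretty good service?

Amazon has moved another big division's Oracle systems onto AWS's own systems

This is essentially an AWS PR piece. Given the boss's commitment to the cloud and internal political correctness, these systems were always going to move over bit by bit...

This time it is Amazon's Consumer Business switching from Oracle systems to AWS's own: "Migration Complete – Amazon’s Consumer Business Just Turned off its Final Oracle Database".

There were originally 75 PB of data and nearly 7,500 databases: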

We migrated 75 petabytes of internal data stored in nearly 7,500 Oracle databases to multiple AWS database services including Amazon DynamoDB, Amazon Aurora, Amazon Relational Database Service (RDS), and Amazon Redshift.

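One advantage is cost savings, but more than a hundred teams took part in the migration. How long it takes to amortize that effort and break even cannot be judged without the internal financials, and the scarcity of engineering resources is another piece of information we do not get to see: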

Cost Reduction – We reduced our database costs by over 60% on top of the heavily discounted rate we negotiated based on our scale. Customers regularly report cost savings of 90% by switching from Oracle to AWS.

More than 100 teams in Amazon’s Consumer business participated in the migration effort.

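The latency reduction is also only a rough reference point: systems tend to get rewritten during a migration, so without internal data there is no way to tell how much of the improvement comes from the AWS services themselves: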

Performance Improvements – Latency of our consumer-facing applications was reduced by 40%.

Administrative overhead is the only figure here that is really meaningful, since the move is to scalable managed services:

Administrative Overhead – The switch to managed services reduced database admin overhead by 70%.

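What was left unsaid is more interesting. For example, they went with Redshift rather than Athena, which looks like a case of migrating first and revisiting the rest when there is a chance...

Amazon Aurora with PostgreSQL supports Logical Replication

AWS previously announced that Amazon Aurora (MySQL) supports GTID replication (see "Amazon Aurora with MySQL 5.7 supports GTID"), and has now announced that Amazon Aurora with PostgreSQL supports Logical Replication: "Amazon Aurora with PostgreSQL Compatibility Supports Logical Replication".

As expected, only newer versions support it: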

Logical replication is supported with Aurora PostgreSQL versions 2.2.0 and 2.2.1, compatible with PostgreSQL 10.6.

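Logical Replication opens up a lot of possibilities, such as connecting the cloud database to external PostgreSQL services (e.g. pulling a real-time copy into an IDC data center). Some ETL tools can also use it to learn what has changed in the database.

Amazon Aurora with MySQL 5.7 supports GTID

Although on AWS you mostly do not have to manage service HA yourself, you still need to plan your own backup mechanism (and even off-site disaster recovery). GTID support in Amazon Aurora with MySQL adds one more option here: "Amazon Aurora with MySQL 5.7 Compatibility Supports GTID-Based Replication".

The announcement notes that Aurora's own replication still uses its own mechanism rather than GTID: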

This provides complete consistency when using binlog replication between an Aurora database and an external MySQL database. Your replication won’t miss transactions or generate conflicts, even after failover or downtime. (Note that replication within an Aurora cluster doesn't use binlog files, so the GTID feature doesn't apply.)

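Given Aurora's architecture, though, the whole cluster is better viewed as a single unit, so binlog + position should be enough? And there should be no conflict problem on failover? I am not sure where the benefit of GTID lies; I need to think about it some more...

AWS launches a MongoDB service: Amazon DocumentDB

AWS has launched Amazon DocumentDB, compatible with the MongoDB 3.6 interface: "New – Amazon DocumentDB (with MongoDB Compatibility): Fast, Scalable, and Highly Available".

This news has drawn quite a bit of attention, not simply because AWS now offers a MongoDB service, but because of what it says about AWS's stance on last year's series of license issues.

License first; the technical differences come later.

The background is that last October MongoDB decided to switch its license from GNU AGPL to its own SSPL: "MongoDB now released under the Server Side Public License".

For coverage, see TechCrunch's piece at the time, "MongoDB switches up its open-source license". The key point is: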

[T]he SSPL explicitly states that anybody who wants to offer MongoDB as a service — or really any other software that uses this license — needs to either get a commercial license or open source the service to give back the community.

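AWS's response three months later is surprisingly clear: it simply built its own implementation of the MongoDB 3.6 API, so it does not need to use your software to provide the service (and therefore does not need to follow your license):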

Amazon DocumentDB implements the Apache 2.0 open source MongoDB 3.6 API by emulating the responses that a MongoDB client expects from a MongoDB server, allowing you to use your existing MongoDB drivers and tools with Amazon DocumentDB.

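TechCrunch's headline is also quite blunt, taking the view that AWS does not much agree with this approach: "AWS gives open source the middle finger".

Back to the technical side: the technical details Amazon DocumentDB provides look a lot like Amazon Aurora's, with six copies across three zones: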

Amazon DocumentDB uses a purpose-built SSD-based storage layer, with 6x replication across 3 separate Availability Zones.

Even the read replica limit is the same at 15, so one can "guess" that the same technology is running behind the scenes...:

In Amazon DocumentDB, the storage and compute are decoupled, allowing each to scale independently, and developers can increase the read capacity to millions of requests per second by adding up to 15 low latency read replicas in minutes, regardless of the size of your data.

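Looking at the pricing, the smallest instance is db.r4.large at USD$0.277/hr, which comes to around USD$200 a month, with storage and I/O billed separately. The barrier to entry is not low.

For now it is mainly the US and European regions that get it first: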

Amazon DocumentDB (with MongoDB compatibility) is available now and you can start using it today in the US East (N. Virginia), US East (Ohio), US West (Oregon), and Europe (Ireland) Regions.

I wonder what Redis next door thinks of this...

Amazon Aurora Global Database

AWS has launched Amazon Aurora Global Database for Aurora (MySQL): "Announcing Amazon Aurora Global Database".

It does not look like multi-master (judging by the term "secondary region"), so writes still have to be sent back to the primary region:

Aurora Global Database uses storage-based replication with typical latency of less than 1 second, using dedicated infrastructure that leaves your database fully available to serve application workloads. In the unlikely event of a regional degradation or outage, one of the secondary regions can be promoted to full read/write capabilities in less than 1 minute.

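Presumably a single endpoint takes care of these chores for you...

Amazon Aurora supports Parallel Query acceleration

Amazon Aurora has launched Parallel Query, which can speed up query computation: "New – Parallel Query for Amazon Aurora". It builds on the fact that Aurora already spreads the storage layer across many nodes, so the work can be farmed out to many small workers in parallel.

The official example is in the original post; it shows the performance difference with aurora_pq turned on and turned off: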

15 rows in set (1 min 53.36 sec)
15 rows in set (1 hour 25 min 51.89 sec)

With it on, the query takes about 1/45 of the original time, a huge improvement...
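The ratio can be checked with a few lines of Python. The two timings are the ones quoted above; the helper and variable names are made up for illustration.

```python
def to_seconds(hours: int = 0, minutes: int = 0, seconds: float = 0.0) -> float:
    """Convert an hours/minutes/seconds timing into seconds."""
    return hours * 3600 + minutes * 60 + seconds


# Timings quoted from the official example.
with_pq = to_seconds(minutes=1, seconds=53.36)               # aurora_pq on
without_pq = to_seconds(hours=1, minutes=25, seconds=51.89)  # aurora_pq off

# How many times faster the query ran with Parallel Query enabled.
speedup = without_pq / with_pq

if __name__ == "__main__":
    print(f"with PQ: {with_pq:.2f}s, without PQ: {without_pq:.2f}s")
    print(f"speedup: about {speedup:.1f}x")
```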

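There are still some limitations, though. The one I care about most is that it currently only supports the MySQL 5.6-compatible version (surprisingly, 5.7 is not supported first):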

Engine Support – We are launching with support for MySQL 5.6, and are working on support for MySQL 5.7 and PostgreSQL.

There is no extra charge, but I/O cost may go up:

Cost – You can make use of Parallel Query at no extra charge. However, because it makes direct access to storage, there is a possibility that your IO cost will increase.