Home » Posts tagged "nosql"

Cloudflare 的 Workers KV

Cloudflare 推出了 Workers KV 服務:「Building With Workers KV, a Fast Distributed Key-Value Store」。

是個 key-value 結構服務 (全球性,eventually consistent,約 10 秒的同步時間),key 的限制是 2KB,value 是 64KB,一個 namespace 最多 10 億筆資料。

讀取可以到 100k+ read/s,但寫入是 1 write/s/key,可以看出來主要是為了讀取資料而設計的。

現有的 worker ($5/month) 會送一些量,包括 1GB 的空間與一千萬次的讀取:

Your $5 monthly Workers compute minimum includes 1 GB of KV storage and up to 10 million KV reads. If you use less than the 10 million included Worker requests now, you can use KV without paying a single cent more.

超過的部份另外再收:

Beyond the minimums, Workers KV is billed at $0.50 per GB-month of additional storage and $0.50 per million additional KV reads.

這樣好像整個 blog 的基本功能都可以直接在上面跑了... 而搜尋靠外部服務,圖片與影音也可以靠外部空間協助?

Amazon DynamoDB 提供 Docker Image 讓開發者可以在本地端測試

AWS 推出了 Amazon DynamoDB 的相容 Docker Image,讓開發者可以在本地端測試 DynamoDB 的 API:「Use Amazon DynamoDB Local More Easily with the New Docker Image」,在 amazon/dynamodb-local 這邊可以拉到,裡面其實是包 Java:

DynamoDB local is now available to download as a self-contained Docker image or a .jar file that can run on Microsoft Windows, Linux, macOS, and other platforms that support Java.

這樣在 Continuous Integration (CI) 的過程裡面也可以拉起 service 測試...

Amazon DynamoDB 的 Point-In-Time Recovery

Amazon DynamoDB 在 3/26 發出來的功能,以秒為單位的備份與還原機制:「New – Amazon DynamoDB Continuous Backups and Point-In-Time Recovery (PITR)」。

先打開這個功能:

打開後就會開始記錄,最多可以還原 35 天內的任何一個時間點的資料:

DynamoDB can back up your data with per-second granularity and restore to any single second from the time PITR was enabled up to the prior 35 days.

這時候就算改變資料或是刪除資料,實際上在系統內都是 Copy-on-write 操作,所以需要另外的空間,這部份會另外計價:

Pricing for continuous backups is detailed on the DynamoDB Pricing Pages. Pricing varies by region and is based on the current size of the table and indexes. For example, in US East (N. Virginia) you pay $0.20 per GB based on the size of the data and all local secondary indexes.

有這樣的功能通常是一開始設計時就有考慮 (讓底層的資料結構可以很方便的達成這樣的效果),現在只是把功能實作出來... 像 MySQL 之類的軟體就沒辦法弄成這樣 XDDD

最後有提到支援的地區,是用條列的而不是說所有有 Amazon DynamoDB 的區域都支援:

PITR is available in the US East (N. Virginia), US East (Ohio), US West (N. California), US West (Oregon), Asia Pacific (Tokyo), Asia Pacific (Seoul), Asia Pacific (Mumbai), Asia Pacific (Singapore), Asia Pacific (Sydney), Canada (Central), EU (Frankfurt), EU (Ireland), EU (London), and South America (Sao Paulo) Regions starting today.

比對一下,應該是巴黎與美國政府用的區域沒進去... 一個是去年年底開幕的區域,另一個是本來上新功能就偏慢的區域。

DynamoDB 可以透過 KMS 加密了...

AWSDynamoDB 可以透過 KMS 加密了:「New – Encryption at Rest for DynamoDB」。

You simply enable encryption when you create a new table and DynamoDB takes care of the rest. Your data (tables, local secondary indexes, and global secondary indexes) will be encrypted using AES-256 and a service-default AWS Key Management Service (KMS) key.

看起來不是自己的 KMS key,而是 service 本身提供的,這樣看起來是在 i/o level 加密,所以還不是 searchable encryption 的能力...

Amazon DynamoDB 跨區 Replication 以及備份

Amazon DynamoDB 實做了全球性的 replication,以及備份功能:「Amazon DynamoDB Update – Global Tables and On-Demand Backup」。

跨區 replication 的功能讓每個 region 都可以存取當地機房的 DynamoDB:

Global Tables – You can now create tables that are automatically replicated across two or more AWS Regions, with full support for multi-master writes, with a couple of clicks. This gives you the ability to build fast, massively scaled applications for a global user base without having to manage the replication process.

這有點類似 GoogleCloud Spanner 在前陣子也推出全球性服務,但 DynamoDB 提供的比較偏向 NoSQL 而不是 RDBMS。

另外一個限制是跨區同步是 async,會有 replication lag 的問題:

Updates are propagated to other Regions asynchronously via DynamoDB Streams and are typically complete within one second (you can track this using the new ReplicationLatency and PendingReplicationCount metrics).

不過如果是這樣的機制,conflict 的問題不知道怎麼解決... 文章裡面沒看到。

然後目前支援的區域還是有限:

Global Tables are available in the US East (Ohio), US East (N. Virginia), US West (Oregon), EU (Ireland), and EU (Frankfurt) Regions today, with more Regions in the works for 2018.

另外一個是備份與還原機制,有這樣的功能對很多計畫方便不少:

On-Demand Backup – You can now create full backups of your DynamoDB tables with a single click, and with zero impact on performance or availability. Your application remains online and runs at full speed. Backups are suitable for long-term retention and archival, and can help you to comply with regulatory requirements.

而備份還原機制是陸陸續續開放的,區域也有限:

We are rolling this new feature out on an account-by-account basis as quickly as possible, with initial availability in the US East (Northern Virginia), US East (Ohio), US West (Oregon), and EU (Ireland) Regions.

ElastiCache 支援 r4.* 了

每年 AWS re:Invent 要到的時候就會有很多新的消息出來,Amazon ElastiCache 的團隊應該也是配合著這一波放出消息:「Amazon ElastiCache Now Supports the R4 Node Family」。

包括了 memcachedRedis 都支援了:

Amazon ElastiCache now supports R4 node types. R4 nodes are optimized for latency sensitive and memory intensive workloads. They come in six sizes, providing 12.3GiB to 407GiB of available in-memory capacity. By setting up a 15-shard cluster for Redis, you can scale up to 6.1TiB of in-memory capacity. For Memcached, you can set up a 20-node cluster to support up to 8.14 TiB in-memory workloads. Equipped with the Intel Broadwell processor, and improved networking, R4 node family offers superior performance over the popular R3 node family.

這兩個應用都是看記憶體吃飯的...

從 Cassandra 到 ScyllaDB 的轉移方式好像跟以前不太一樣了...

在「New Docs: Four Phases to Migrate from Apache Cassandra to Scylla」這邊看到 ScyllaDB 官方提供 Cassandra 轉移到 ScyllaDB 的說明,跟以前好像差蠻多的...

以前 ScyllaDB 可以直接加入到 Cassandra 的 cluster (一時間沒找到資料,但在「can not add node with cassandra ami · Issue #107 · scylladb/scylla-cluster-tests」可以看到當時的痕跡),現在給的方法是在資料庫不相容時的轉移方式 (像是從 MySQL 轉換到 PostgreSQL 這種),是暗示已經沒辦法這樣做了嗎?

不過從 GitHub 上的 wiki page 看起來,底層資料與 protocol 應該還是相容的,才能做直接複製資料的 offline migration:「Migrating Cassandra data to Scylla」。

也有可能這篇只是寫手隨意寫的文章,沒有把 ScyllaDB 的優勢展現出來...

About John Hammink
John Hammink is a writer and content creator at ScyllaDB. With more than 20 years in technology, he's also a touring/studio musician, digital artist and speaker.

在 MOPCON 2017 的 Unconference「MySQL to NoSQL & Search Engine」

把投影片傳到 Speaker Deck 上了:「MySQL to NoSQL & Search Engine」。

這是在介紹 noplay/python-mysql-replication 這個軟體,我在示範時用的 python script 有增加 blocking 參數讓他保持一直讀取 MySQL replication stream:

from pymysqlreplication import BinLogStreamReader

mysql_settings = {'host': '127.0.0.1', 'port': 3306, 'user': 'root', 'passwd': ''}

stream = BinLogStreamReader(connection_settings = mysql_settings, server_id=100, blocking=True)

for binlogevent in stream:
    binlogevent.dump()

stream.close()

利用這樣的工具可以做很多事情,像是當 post 表格更新時自動更新 search engine,並且清空 memcached 內的資料。這可以避免使用 library 時有可能會漏掉忘記做 (因為有些程式不用 library 處理),可靠度比較高。

另外一方面 replication protocol 本身就有考慮重連的問題,重新接上時是可以從上一次處理完的資料繼續處理 (只要不要隔太久),這讓寫應用的人不需要用太複雜的方式確保他不會漏掉。

用 Go 寫的 Badger

Dgraph 在推銷自家發展出來的 Badger:「Introducing Badger: A fast key-value store written natively in Go」。

標靶是 RocksDB,號稱比 RocksDB 快好幾倍:

Based on benchmarks, Badger is at least 3.5x faster than RocksDB when doing random reads. For value sizes between 128B to 16KB, data loading is 0.86x - 14x faster compared to RocksDB, with Badger gaining significant ground as value size increases. On the flip side, Badger is currently slower for range key-value iteration, but that has a lot of room for optimization.

不過我覺得有些重要的功能在 Badger 不提供,這比起來有種橘子比蘋果的感覺... 像是 RocksDB 提供了 Transaction,而 Badger 則是直接講明他們不打算支援 Transaction:

Keep it simple, stupid. No support for transactions, versioning or snapshots -- anything that can be done outside of the store should be done outside.

Archives