Home » Posts tagged "cassandra"

從 Cassandra 到 ScyllaDB 的轉移方式好像跟以前不太一樣了...

在「New Docs: Four Phases to Migrate from Apache Cassandra to Scylla」這邊看到 ScyllaDB 官方提供 Cassandra 轉移到 ScyllaDB 的說明,跟以前好像差蠻多的...

以前 ScyllaDB 可以直接加入到 Cassandra 的 cluster (一時間沒找到資料,但在「can not add node with cassandra ami · Issue #107 · scylladb/scylla-cluster-tests」可以看到當時的痕跡),現在給的方法是在資料庫不相容時的轉移方式 (像是從 MySQL 轉換到 PostgreSQL 這種),是暗示已經沒辦法這樣做了嗎?

不過從 GitHub 上的 wiki page 看起來,底層資料與 protocol 應該還是相容的,才能做直接複製資料的 offline migration:「Migrating Cassandra data to Scylla」。

也有可能這篇只是寫手隨意寫的文章,沒有把 ScyllaDB 的優勢展現出來...

About John Hammink
John Hammink is a writer and content creator at ScyllaDB. With more than 20 years in technology, he's also a touring/studio musician, digital artist and speaker.

ScyllaDB 2.0 要引入 Cassandra 3.0+ 的 Materialized View

最近 ScyllaDB 的網站改版了... (有種不習慣的感覺)

ScyllaDB 2.0 打算要引入 Materialized View (出自 Apache Cassandra 3.0+):「Materialized Views preview in Scylla 2.0」。

一般 Materialized View 的實做方式是另外存一份,所以你可以在上面加 Index 之類的設定讓存取速度變快...

不過 Cassandra 不是本來就以讀慢寫快為優勢嗎,要速度可以考慮用 cache 疊出來,或是其他方式,當初 Cassandra 會開發這個功能就有點... XDDD

Apache Foundation 宣佈禁止使用 Facebook BSD+Patents 的軟體

在「RocksDB Integrations」這邊討論到 RocksDBFacebook 所使用的 Facebook BSD+Patents License。

不過因為 RocksDB 最近在換 license (從 Facebook BSD+Patents 換到 Apache License, Version 2.0),移除了 PATENTS 內的限制,需要看 PATENTS 的舊檔案可以在 PATENTS 這邊看到。

Chris Mattmann 正式發出決議禁用 Facebook BSD+Patents License。(參考最後)

另外也提到了 Facebook 是故意埋下這些限制:

Note also Roy's comment that he has discussed the matter with FB's counsel and the word is that the FB license is intentionally incompatible. It is hard to make the argument that it is compatible after hearing that. Pragmatically speaking, regardless of any semantic shaving being done, having a statement like that from the source of the license is very daunting. If they think it is incompatible, we need to not try to wheedle and convince ourselves it is not.

這個 license 之後應該會有更多挑戰...


As some of you may know, recently the Facebook BSD+patents license has been
moved to Category X (https://www.apache.org/legal/resolved#category-x).
Please see LEGAL-303 [1] for a discussion of this. The license is also referred
to as the ROCKSDB license, even though Facebook BSD+patents is its more
industry standard name.

This has impacted some projects, to date based on LEGAL-303
and the detective work of Todd Lipcon:

Samza, Flink, Marmotta, Kafka and Bahir

(perhaps more)

Please take notice of the following policy:

o No new project, sub-project or codebase, which has not
  used Facebook BSD+patents licensed jars (or similar), are allowed to use
  them. In other words, if you haven't been using them, you
  aren't allowed to start. It is Cat-X.

o If you have been using it, and have done so in a *release*,
  you have a temporary exclusion from the Cat-X classification thru
  August 31, 2017. At that point in time, ANY and ALL usage
  of these Facebook BSD+patents licensed artifacts are DISALLOWED. You must
  either find a suitably licensed replacement, or do without.
  There will be NO exceptions.

o Any situation not covered by the above is an implicit
  DISALLOWAL of usage.

Also please note that in the 2nd situation (where a temporary
exclusion has been granted), you MUST ensure that NOTICE explicitly
notifies the end-user that a Facebook BSD+patents licensed artifact exists. They
may not be aware of it up to now, and that MUST be addressed.

If there are any questions, please ask on the legal-discuss@a.o


Chris Mattmann
VP Legal Affairs

[1] https://issues.apache.org/jira/browse/LEGAL-303

Reddit 在處理 Page View 的方式

Reddit 說明了他們如何處理 pageview:「View Counting at Reddit」。

以 Reddit 的規模有提到兩個重點,第一個在善用 RedisHyperLogLog 這個資料結構,當量大的時候其實可以允許有微小的誤差:

The amount of memory varies per implementation, but in the case of this implementation, we could count over 1 million IDs using just 12 kilobytes of space, which would be 0.15% of the original space usage!

維基百科上有說明當資料量在 109 這個等級時,用 1.5KB 的記憶體只有 2% 的誤差值:

The HyperLogLog algorithm is able to estimate cardinalities of > 109 with a typical error rate of 2%, using 1.5 kB of memory.

第二個則是寫入允許短時間的誤差 (pageview 不會即時反應),透過批次處理降低對 Cassandra cluster 的負荷:

Writes to Cassandra are batched in 10-second groups per post in order to avoid overloading the cluster.

可以注意到把 Redis 當作 cache 層而非 storage 層。

主要原因應該跟 Redis 定位是 data structure server 而非 data structure storage 有關 (可以從對 Durability 的作法看出來),而使用 Cassandra 存 key-value 非常容易 scale,但讀取很慢。剛好兩個相輔相成。

ScyllaDB 1.7 支援 Counters 了

在「Scylla release: version 1.7」這邊看到 ScyllaDB 支援 Counters 的消息了 (雖然剛出來,掛著 Experimental 的消息):

Scylla now supports Counters as a native type. A counter column is a column whose value is a 64-bit signed integer and on which two operations are supported: incrementing and decrementing.

這其實是 Cassandra 其中一個強項,針對 counter 這種應用特化的資料型態。

ScyllaDB 1.7 要支援 Cassandra 的 Counter 了

Counter 是 Cassandra 裡一個特別的資料型態,總算是要 porting 進 ScyllaDB 了:「Scylla 1.7 introduces experimental support for counters」。


  • once deleted, counter column value cannot be used again—this is a consequence of the fact that counters can only be incremented or decremented, they cannot be set to any specific value
  • a table cannot contain both counter and regular columns—without this limitation it wouldn’t be possible to provide proper atomicity and isolation guarantees for updates that modify both counters and regular columns in a single row
  • counter columns cannot be members of primary key
  • updates are not idempotent – in case of a write failure the client cannot safely retry the request
    counter columns cannot have time-to-live set

跟 Cassandra 一樣是透過 CRDT 實做的,行為上會是 eventually consistency:

Scylla implements counters using state-based conflict-free replicated data types (CRDT), which allow atomic operations, like increment or decrement, to be performed locally without the need for synchronization between nodes.

在「Scylla Status and Roadmap」這邊可以看到其他會在 1.7 出現的功能。

Scylla 1.4 系列的發佈

ScyllaDB 最近發行了 Scylla 1.4 與 1.4.1:「Scylla release: version 1.4」與「Scylla release: version 1.4.1」,另外也整理出 Docker 版本:「Scylla on Docker」。

可以從 1.4 的公告裡面看到功能愈來愈完整了,在導入其他跟 Cassandra 配合的軟體應該會愈來愈順,而且就之前用 Presto 而去 Scylla 的 GitHub 上回報問題的經驗,Scylla 的人對於能夠生出可重製的 bug report 還蠻重視的,解決速度都還算合理...

另外提供 Docker image 也讓想要測試的人變方便...

Scylla 1.3

看到 Scylla 正式公告 1.3 版的消息了:「Scylla release: version 1.3」。

Scylla 是用 C++ 重寫 Java 版本的 Cassandra 所有東西 (包括資料結構與 Protocol),目標是做到可以完全相容替換現有 Cassandra Cluster。(號稱可以一台一台移除 Cassandra 的程式,裝上 Scylla 後就可以無痛換過去)

而 Scylla 另外一個重點是效能的提昇,官方宣稱在完整最佳化的情況下是 10x 以上的效能提昇,之前拿 AWS 實測 (沒有完整最佳化) 也可以看到 2x 到 4x 的數字,對於目前的 Cassandra 應用來說極為重要。

1.3 版最重要的功能就是對 Thrift 的支援:

Thrift support. Many Cassandra users are still using Thrift, and they can now continue doing so while benefiting from Scylla’s performance. Built on top of Scylla CQL internal implementation, Scylla Thrift provides similar throughput and latency to Scylla CQL. Users of projects like KairosDB and Titan can now migrate to Scylla while maintaining full protocol compatibility .

本來在 roadmap 上的計畫是用兩個版本支援 Thrift:(從 Google Cache 拉出來的,CSS 看起來有些問題,不過意思有到就好)

剛剛發現 1.4 的 roadmap 已經沒有列 Thrift 了:

這應該是暗示已經實作完了?透過 Thrift 界面跟 Cassandra 溝通的應用程式都可以使用 Scylla 了...

先前在「Facebook Presto · Issue #1139 · scylladb/scylla」這邊跟 ScyllaDB 的人花了不少時間,總算是給出一份 data set 可以讓他們重製 bug,也算是有代價了 XD

Netflix 開發的 Delayed Queue

原來這個叫做 Delayed Queue,難怪之前用其他關鍵字都找不到什麼資料... (就不講其他關鍵字了 XD)

Netflix 發表了他們自己所開發的 Delayed Queue:「Distributed delay queues based on Dynomite」。

本來的架構是用 Cassandra + Zookeeper 來做:

Traditionally, we have been using a Cassandra based queue recipe along with Zookeeper for distributed locks, since Cassandra is the de facto storage engine at Netflix.

但可以馬上想到不少問題,就如同 Netflix 提到的:

Using Cassandra for queue like data structure is a known anti-pattern, also using a global lock on queue while polling, limits the amount of concurrency on the consumer side as the lock ensures only one consumer can poll from the queue at a time.

所以就改放到 Netflix 另外開發的 Dynamite 上:

Dynomite, inspired by Dynamo whitepaper, is a thin, distributed dynamo layer for different storage engines and protocols. Currently these include Redis and Memcached. Dynomite supports multi-datacenter replication and is designed for high availability.

後端是 RedisMemcached 的系統,可以對抗整個機房從 internet 上消失的狀態。

在設計上則是「保證會跑一次」,也就是有可能會有多次的情況,用 Dyno Queues 系統的人必需要考慮進去:

4. At-least-once delivery semantics

雖然整篇講的頗輕鬆,但實際看起來還是很厚重... 暫時還是不會用吧 :o