Cloudflare 推出 Cloudflare R2 Storage,相容於 S3 API,但沒有傳輸費用

Cloudflare 宣佈了 Cloudflare R2 Storage,相容於 S3 API,但是沒有傳輸費用:「Announcing Cloudflare R2 Storage: Rapid and Reliable Object Storage, minus the egress fees」,Hacker News 上的「Cloudflare R2 storage: Rapid and reliable object storage, minus the egress fees (」可以看一下討論,裡面有負責 R2 的 PM (帳號是 greg-m) 回答一些東西。

R2 的第一個特點就是剛剛提到的傳輸費用:一般的雲端都是傳進去不用錢,但傳出來會很貴,而 R2 其中一個主打的點就是傳出來不用錢:

R2 builds on Cloudflare’s commitment to the Bandwidth Alliance, providing zero-cost egress for stored objects — no matter your request rate. Egress bandwidth is often the largest charge for developers utilizing object storage and is also the hardest charge to predict. Eliminating it is a huge win for open-access to data stored in the cloud.

另外 storage cost 也算低,S3 目前的費用是 US$0.023/GB/month (拿 us-east-1 相比),而 R2 目前的定價是 US$0.015/GB/month:

That doesn’t mean we are shifting bandwidth costs elsewhere. Cloudflare R2 will be priced at $0.015 per GB of data stored per month — significantly cheaper than major incumbent providers.

在 durability 的部份,與 S3 都是一年 11 個 9:

The core of what makes Object Storage great is reliability — we designed R2 for data durability and resilience at its core. R2 will provide 99.999999999% (eleven 9’s) of annual durability, which describes the likelihood of data loss.


R2 is currently under development — you can sign up here to join the waitlist for access.

有幾個點還蠻有趣的,第一個是 Cloudflare 自己在推的 Bandwidth Alliance 裡有不少 VPS 跟 Cloudflare 之間的流量是不計頻寬費用的,所以等於是 VPS 到 R2 不計費,而 R2 到 VPS 也不計費,但要注意 VPS 自己也都有在推 object storage。

像是 Vultr 的 US$5 方案包括了 250GB 的空間與 1TB 的頻寬,扣掉頻寬的部份 (可以透過 Cloudflare 處理),相當於是 US$0.02/GB。

Linode 也類似,US$5 的方案包括了 250GB 的空間與 500GB 的頻寬,算出來也是 US$0.02/GB。

Backblaze 也有類似的產品 B2,US$0.005/GB/month 的儲存費用以及 $0.01/GB 的傳輸費用,但頻寬的部份也可以透過 Cloudflare 處理。

這個產品出來以後可以再看看如何,但看起來是蠻有趣的。對目前的雲端商應該還好 (因為資料進 R2 還是有費用),但對這些 VPS 來說應該是有蠻大的衝擊...

Percona 連載到 PostgreSQL 存 JSON object 以及增加 Index 的方式了...

先前 Percona 的人在講 MySQL 存 JSON object 的方式,現在開始講在 PostgreSQL 裡存 JSON object,並且增加 index 的方式了:「Storing and Using JSON Within PostgreSQL Part One」。

這基本上就是不想用 MongoDB,但還是有需要極為彈性而選擇用 JSON object 的需求。

首先先先建立一個表格,這邊直接用 JSONB:


接著拿「A dataset of English plaintext jokes」這邊的 reddit_jokes.json 來玩,我先把 JSON 裡面的內容變成 JSON Lines 格式:

cat reddit_jokes.json | jq -c '.[]' > reddit_jokes.jsonl

然後 COPY 了十次,多一點資料,後面可以看效能:

alice=# COPY table1 (jb) FROM '/tmp/reddit_jokes.jsonl' CSV QUOTE e'\x01' DELIMITER e'\x02';
-- (repeat this command 10 times)

接著跑個 SELECT 看看速度,我跑了幾次大約都在 260ms 上下:

alice=# SELECT COUNT(*) FROM table1 WHERE (jb->>'score')::int = 10;
(1 row)

Time: 264.023 ms

然後針對 score 生個數字的 index:

alice=# CREATE INDEX ON table1 (((jb->>'score')::int));
Time: 1218.503 ms (00:01.219)

接著再跑 SELECT 下去,可以看到速度快超多:

alice=# SELECT COUNT(*) FROM table1 WHERE (jb->>'score')::int = 10;
(1 row)

Time: 12.735 ms

另外也可以加 column:

alice=# ALTER TABLE table1 ADD COLUMN score INT GENERATED ALWAYS AS ((jb->>'score')::int) STORED;


alice=# SELECT COUNT(*) FROM table1 WHERE score = 10;
(1 row)

Time: 222.163 ms

幫他補 index:

alice=# CREATE INDEX ON table1 (score);

速度有變快,但不知道為什麼沒有 JSONB 的版本快:

alice=# SELECT COUNT(*) FROM table1 WHERE score = 10;
(1 row)

Time: 81.346 ms

算是還蠻好用的,不過得學 JSON query 語法... (應該是還好)

Amazon S3 變成 Strong Consistency 背後的改善方式

看到 Hacker News 上的討論「Diving Deep on S3 Consistency (」才想到該整理一下,原文的「Diving Deep on S3 Consistency」是 Amazon 的 CTO Werner Vogels 花了一些篇幅描述 Amazon S3 怎麼把 Eventually Consistent 變成 Strongly Consistent,當初 Amazon S3 公告時我也有寫一篇文章提到:「Amazon S3 現在變成 Strong Read-After-Write Consistency 啦...」。

Amazon S3 之所以會是 Eventually Consisient 是因為 Metadata Subsystem 的 cache 設計:

Per-object metadata is stored within a discrete S3 subsystem. This system is on the data path for GET, PUT, and DELETE requests, and is responsible for handling LIST and HEAD requests. At the core of this system is a persistence tier that stores metadata. Our persistence tier uses a caching technology that is designed to be highly resilient. S3 requests should still succeed even if infrastructure supporting the cache becomes impaired. This meant that, on rare occasions, writes might flow through one part of cache infrastructure while reads end up querying another. This was the primary source of S3’s eventual consistency.

如果要解決 Eventually Consistent,最直接的想法是拔掉 cache,但這樣對效能的影響太大,所以得在要保留 cache 的情況下設計,所以就想到用其他管道確保 cache 裡的資料狀態是正確的:

One early consideration for delivering strong consistency was to bypass our caching infrastructure and send requests directly to the persistence layer. But this wouldn’t meet our bar for no tradeoffs on performance. We needed to keep the cache. To keep values properly synchronized across cores, CPUs implement cache coherence protocols. And that’s what we needed here: a cache coherence protocol for our metadata caches that allowed strong consistency for all requests.

而接下來是設計一連串的邏輯確保每個 S3 object 的操作都有 serializability:

We had introduced new replication logic into our persistence tier that acts as a building block for our at-least-once event notification delivery system and our Replication Time Control feature. This new replication logic allows us to reason about the “order of operations” per-object in S3. This is the core piece of our cache coherency protocol.

後面又要確保這個 cache coherence 的 HA,最後要能夠驗證實做上的正確性,花的力氣比實做協定本身還多:

These verification techniques were a lot of work. They were more work, in fact, than the actual implementation itself. But we put this rigor into the design and implementation of S3’s strong consistency because that is what our customers need.

Amazon S3 算是 AWS 當初推出來的招牌,當時的 Amazon S3 底層的論文「Amazon's Dynamo」劇烈影響了後來整個產業 (雖然論文裡面是拿 Amazon 的購物車說明),這次的補充算是更新了原來論文的技術,告訴大家本來的 Eventually Consistent 是可以再拉到 Strongly Consistent。

RFC 定義的 application/problem+json (或是 xml)

剛剛在 Clubhouse 上聽到保哥提到了 RFC 7807 這個東西 (Problem Details for HTTP APIs),剛剛翻瀏覽器累積的 tab,發現原來先前有看到,而且有打算要出新版的消息:

RFC 7807 裡面這樣定義的方式可以讓 client 端直接判斷 Content-Type 知道這個回傳資料是不是錯誤訊息,不然以前都是 JSON 就得再另外包裝。用 Content-Type 的作法可以讓判斷條件變得清晰不少。

除了 application/problem+jsonapplication/problem+xml 以外,在「3.1. Members of a Problem Details Object」裡面則是說明 JSON (或是 XML) 裡面有哪些必要以及可選的資訊要填,然後「3.2. Extension Members」這邊則大概描述一下怎麼擴充。


Cloudflare 的另外一個策略:不熱門的資料只放到記憶體內

前陣子的文章,Cloudflare 將不熱門的資料放到記憶體內,不寫到磁碟裡面:「Why We Started Putting Unpopular Assets in Memory」。

主要的原因是這些不熱門的資料常常是一次性的,寫到 SSD 裡面反而浪費 SSD 的生命。而且這樣做因為減少了寫入,反而可以讓 SSD 的讀取變快:

The result: disk writes per second were reduced by roughly half and corresponding disk hit tail latency was reduced by approximately five percent.


Anyway,這個想法不只在 CDN 這邊可以用到,對於有 memory + storage 架構的 cache system 也可以套用類似的道理,而要怎麼決定哪些 object 要寫到磁碟裡面的演算法就是重點了...

題外話,剛剛因為突然想到,瞄了一下 Squid,發現連 HTTPS 都還沒上...

JSON Canonicalization

這篇是講 JSON object 上的簽名,但實際上就是在討論 JSON Canonicalization 的前因後果:「How (not) to sign a JSON object」。

在處理 JSON 資料時,「判斷兩個 JSON object 是否相同」是一個不怎麼簡單的問題,其中一個想法是找一個機制可以把意義相同的 JSON object 都轉成相同的 (byte)string representative,這也就是 JSON Canonicalization。當你可以確保意義相同的 JSON Canonicalization 後,你就可以對 string 本身簽名。

這件事情其實在 XML 就有過同樣的歷史故事 (yeah,總是有人愛在某種資料格式上面疊上簽名),也就是「XML Signature」這個方式。

在 XML 這邊不幸的是,還不少標準選用 XML Signature,像是當年為了實做 Google Apps (現在叫做 G Suite) 的 SSO,而需要接 SAML...

回到原來的 JSON Canonicalization,可以馬上想到的變化包括了空白與 object 裡 key 的順序,也就是這兩個:

  "b": 2,
  "a": 1

但不幸的是,還有 Unicode 來一起亂,也就是下面這個跟上面有相同的意思:

  "\u0062": 2,
  "\u0061": 1

另外還有其他的地雷是平常不會想到的,如果你因為複雜而決定用 library 來做,那也代表 library 必須面對這些複雜的情境,未必沒有 bug...

所以文章作者在最後面才會請大家不要再來亂了 XDDD

Maybe you don’t need request signing? A bearer token header is fine, or HMAC(k, timestamp) if you’re feeling fancy, or mTLS if you really care.

Canonicalization is fiendishly difficult.

Add a signature on the outside of the request body, make sure the request body is complete, and don’t worry about “signing what is said versus what is meant” – it’s OK to sign the exact byte sequence.

Amazon S3 淘汰 Path-style 存取方式的新計畫

先前在「Amazon S3 要拿掉 Path-style 存取方式」提到 Amazon S3 淘汰 Path-style 存取方式的計畫,經過幾天後有改變了。

Jeff Barr 發表了一篇「Amazon S3 Path Deprecation Plan – The Rest of the Story」,裡面提到本來的計畫是 Path-style model 只支援到 2020/09/30,被大幅修改為只有在 2020/09/30 後建立的 bucket 才會禁止使用 Path-style:

In response to feedback on the original deprecation plan that we announced last week, we are making an important change. Here’s the executive summary:

Original Plan – Support for the path-style model ends on September 30, 2020.

Revised Plan – Support for the path-style model continues for buckets created on or before September 30, 2020. Buckets created after that date must be referenced using the virtual-hosted model.

這樣大幅降低本來會預期的衝擊,但 S3 團隊希望償還的技術債又得繼續下去了... 也許再過個幾年後才會再被提出來?

AWS 推出了 S3 Object Lock,保護資料被刪除的可能性

AWS 推出了 S3 Object Lock,可以設定條件鎖住 S3 上的 object,以保護資料不被刪除:「AWS Announces Amazon S3 Object Lock in all AWS Regions」。


You can migrate workloads from existing write-once-read-many (WORM) systems into Amazon S3, and configure S3 Object Lock at the object- and bucket-levels to prevent object version deletions prior to pre-defined Retain Until Dates or Legal Hold Dates.

AWS 提供有兩種模式,一個是 Governance mode,這個模式下可以設定某些 IAM 權限可以移除 S3 Object Lock。另外一個是 Compliance mode,這個模式下連 root account 都不能刪除:

S3 Object Lock can be configured in one of two modes. When deployed in Governance mode, AWS accounts with specific IAM permissions are able to remove object locks from objects. If you require stronger immutability to comply with regulations, you can use Compliance Mode. In Compliance Mode, the protection cannot be removed by any user, including the root account.

gron:把 JSON 結構轉成條列式的資料,方便後續的文字處理...

在「gron makes JSON more greppable」這邊看到 gron 這個工具,可以將 JSON 轉成條列式的資料 (或是反過來,將條列式的資料轉回 JSON)。


▶ gron testdata/two.json 
json = {}; = {}; = ""; = "@TomNomNom";
json.github = "";
json.likes = [];
json.likes[0] = "code";
json.likes[1] = "cheese";
json.likes[2] = "meat"; = "Tom";

這讓 grep 或是 sed 之類的工具會更好操作,不然得用 jq 盧半天...

JSON 的 Object 裡 Key 重複的問題

tl;dr:不要亂來啦... 這是 UB (Undefined behavior) 的一種。

因為看到這則 tweet,所以去查一下 JSON 的資料:

首先是找標準是什麼。在維基百科的 JSON 條目裡提到了有兩份標準,一份是 RFC 7159,一份是 ECMA-404

Douglas Crockford originally specified the JSON format in the early 2000s; two competing standards, RFC 7159 and ECMA-404, defined it in 2013. The ECMA standard describes only the allowed syntax, whereas the RFC covers some security and interoperability considerations.

ECMA-404 裡面就真的只講語法沒講其他東西,而在 RFC 7159 內的 Object 則是有提到 (重點我就用粗體標起來了):

An object structure is represented as a pair of curly brackets surrounding zero or more name/value pairs (or members). A name is a string. A single colon comes after each name, separating the name from the value. A single comma separates a value from a following name. The names within an object SHOULD be unique.

   object = begin-object [ member *( value-separator member ) ]

   member = string name-separator value

An object whose names are all unique is interoperable in the sense that all software implementations receiving that object will agree on the name-value mappings. When the names within an object are not unique, the behavior of software that receives such an object is unpredictable. Many implementations report the last name/value pair only. Other implementations report an error or fail to parse the object, and some implementations report all of the name/value pairs, including duplicates.

JSON parsing libraries have been observed to differ as to whether or not they make the ordering of object members visible to calling software. Implementations whose behavior does not depend on member ordering will be interoperable in the sense that they will not be affected by these differences.

粗體有描述唯一性,但尷尬的地方在於他用 SHOULD 而非 MUST,所以 library 理論上都要能接受。但後面提到如果不唯一時,行為無法預測 (會到 rm -rf / 嗎?XDDD 最像的應該還是 crash?),所以還是不要亂來啦...

不過如果真的會 crash 的話,應該也會因為 DoS issue 而被發 CVE,所以實務上應該是不會 crash 啦...