Cloudflare Zaraz 的新價錢

Cloudflare Zaraz 有點像是 GA4Mixpanel 或是 Amplitude 這種產品:

Cloudflare Zaraz is a solution that developers and marketers use to load third-party tools like Google Analytics 4, Facebook CAPI, TikTok, and others.

剛剛看到 Cloudflare Zaraz 的宣佈的新價錢:「Zaraz launches new pricing」,翻資料的時候發現去年七月的時候也有宣佈過價錢,但當時看起來反彈頗大而暫緩:「Cloudflare Zaraz steps up: general availability and new pricing」。

先看一下去年七月當時宣佈的價錢,這邊用了 Zaraz Loads 這個特別的 term:

If you exceed the free Zaraz Loads allocations, you'll be charged $0.50 for every additional 1,000 Zaraz Loads, but the service will continue to function.

看起來 Zaraz Load 包括了 script loading 以及 pageview event 都各算一次:

A Zaraz Load is counted each time a web page loads the Zaraz script within it, and/or the Pageview trigger is being activated.

這樣的話 1M event 就要 US$500,隔壁棚 Mixpanel 可以在「Pricing - Mixpanel | Product Analytics」這邊試拉,同樣 1M event 只要 $140,難怪當時被幹剿到不行...

這次公佈的價錢則是又用了新的 term,叫做 Zaraz Events,這個看起來就是一般理解的 event:

One of the biggest changes we made was changing the metric we used for pricing Zaraz. One Zaraz Event is an event you’re sending to Zaraz, whether that’s a pageview, a zaraz.track event, or similar.

而價錢直接下殺到 1M event 收 US$5,跟之前的差異巨大:

With the new Zaraz pricing model, every Cloudflare account gets 1,000,000 Zaraz Events per month for free. If your account needs more than that, every additional 1,000,000 Zaraz Events are only $5 USD, with volume discounting available for Enterprise accounts.

只是不知道 Zaraz 可以做到什麼程度,「理論上」會比較陽春,但不知道夠不夠用?

Hacker News 目前搬上 Cloudflare

Hacker News 前陣子似乎是因為被打而轉移到 Cloudflare 上:

;; ANSWER SECTION:
news.ycombinator.com.   1       IN      CNAME   news.ycombinator.com.cdn.cloudflare.net.
news.ycombinator.com.cdn.cloudflare.net. 300 IN A 172.67.5.232
news.ycombinator.com.cdn.cloudflare.net. 300 IN A 104.22.6.236
news.ycombinator.com.cdn.cloudflare.net. 300 IN A 104.22.7.236

另外可以參考「Site report for http://news.ycombinator.com」這邊的記錄,我另外備份一份放在 archive.today 上:「Site report for http://news.ycombinator.com」。

可以看到先前是指到 209.216.230.240,然後今年 (2024) 年初的時候似乎是因為 DDoS 的關係,在骨幹上被 blackhole 掉,後來就轉到 Cloudflare 上了。

一個比較意外的是看到報告說是 FreeBSD,等之後切回去再來研究看看?

Cloudflare 的 WAF 在科技類的網站容易誤判,像是「Ask HN: Does Cloudflare block HN comments if you have code blocks in a reply?」這邊就遇到了,可以在留言裡面看到大家在研究要怎麼繞 WAF XDDD

等到切回去應該就會恢復,不過還在 Cloudflare 上的這陣子應該會繼續看到抱怨...

Cloudflare 前幾天 API 與 Dashboard 出事的 Post Mortem 記錄

前幾天 Cloudflare 的 API 與 dashboard 掛了一天多,少見的讓 Matthew Prince (CEO) 自己出來發 post mortem 記錄了:「Post Mortem on Cloudflare Control Plane and Analytics Outage」,在 Hacker News 上面也有蠻多討論的:「Post Mortem on Cloudflare Control Plane and Analytics Outage (cloudflare.com)」,這邊是整理我自己讀完後的感想。

從「Matthew Prince - The Cloudflare Blog」這邊可以看出來 Matthew Prince 上次是 2023/09/27 的公關文「Cloudflare’s 2023 Annual Founders’ Letter」,還有對應的多國翻譯,像是繁體中文的「Cloudflare 2023 年度創始人來信」,再往前的 2022/12/11 也是公關文「Welcome to Cloudflare’s Impact Week」(以及對應的繁體中文版本:「歡迎來到 Cloudflare 的 Impact Week」)。

這次的事情算是 Cloudflare 在 post-IPO 後很少見的長時間出事,就難得看到 Matthew Prince 自己出來坦了。為了重新建立信任,加上因為層級的關係,可以看到透漏出很多架構細節,算是這次可以窺視 Cloudflare 架構的一些資訊。

先大概提一下官方文章的著墨點:他們花了非常多的篇幅在機房服務商 Flexential 在處理 PDX-04 (這是 Cloudflare 訂的名稱) 機房電力問題的失職 (要注意這邊是 Cloudflare 的觀點,認為 Flexential 的失職,目前沒有從 Flexential 這邊的消息出來解釋),淡化掉了 Cloudflare 自己的設計問題,這邊在 Hacker News 上有蠻多人都有指出來的。

這次事件一切的起因是 Flexential 的 PDX-04 機房整個電力系統斷線 offline 導致的,屬於標準的 data center failure 的情況,像是 2013 年二月時是方機房的火災 (可以參考 iThome 的整理),或是 2021 年 OVH 的機房火災 (「去年 OVH 機房大火的部份情形最近被揭露」),是個在設計架構時一定會規劃進去的項目。

所以 Matthew Prince 先是解釋 Cloudflare 的 HA 作法,是直接在 Hillsboro, Oregon 租三個機房建立起 low-latency network:

Cloudflare's control plane and analytics systems run primarily on servers in three data centers around Hillsboro, Oregon. The three data centers are independent of one another, each have multiple utility power feeds, and each have multiple redundant and independent network connections.

但大家看到這計馬上就會去查,這個城市也才 66.9km2 的土地,大約是 1/4 個台北市 (約 291.8km2) 再小一些,拉了一下城市內的直線最遠距離,大約是 12km?

呃,這不是一個地震 (就在聖安地列斯斷層區域?) 或是一個核攻擊就把 Cloudflare 最核心的部分給擺平了嗎?

其中 analytics systems 就算了:整個 Hillsboro 掛了進入 gracefully degrading,我可以理解這個設計的考量,但 control plane 看起來不太妙?

雞蛋放在同一個籃子的問題裡先放著,雖然這個問題真的很...。

後續提到了有些重要的產品對沒有 HA 能力的服務上有相依性:

Unfortunately, we discovered that a subset of services that were supposed to be on the high availability cluster had dependencies on services exclusively running in PDX-04.

這邊沒有講是哪些服務的相依性,但文章其他地方有提到有些基礎服務是沒有跨機房 HA 架構的,只有在 PDX-04 有跑,包括了 KafkaClickHouse

In particular, two critical services that process logs and power our analytics — Kafka and ClickHouse — were only available in PDX-04 but had services that depended on them that were running in the high availability cluster.

這點讓人頗意外的,Kafka 因為自己架設 & 維護過,知道他的架構本身就很容易設計到跨機房的 case,而且這算是很基礎建設的東西,居然沒有跨機房 HA?

而 ClickHouse 只有研究過,沒有實際把 production 量丟上去跑,但從文件看到的東西,應該至少能做到 shared-everything 的架構,也居然沒有跨機房 HA?

這接基礎建設的問題,導致了雖然只是單一機房 PDX-04 掛掉,但在有重要基礎建設消失的情況下 (應該就是上面提到的 Kafka 與 ClickHouse),加上 Flexential 沒有給出恢復的時間,決定直接跑災難重建的 SOP (也就是 Hillsboro 的三個機房都回不來的情境)。

而這也可以看到恢復時間比較久,從決定切到歐洲的 DR site 到整個切過去花了四個多小時:

Because more services were offline than we expected, and because Flexential could not give us a time for restoration of our services, we made the call at 13:40 UTC to fail over to Cloudflare's disaster recovery sites located in Europe.

By 17:57 UTC, the services that had been successfully moved to the disaster recovery site were stable and most customers were no longer directly impacted.

因為 Kafka 與 ClickHouse 在 Hillsboro 只有單一機房有服務,那就不確定歐洲 DR site 平常有沒有建起來,也許這邊的四個多小時有不少是在歐洲 DR site 把 Kafka 與 ClickHouse 建起來?(這個就只能猜測了)

回到 Flexential 這邊,在恢復供電的過程發現 Cloudflare 這邊迴路用的 breaker 掛了,直到十個小時後才供電,但也因為大家都忙了一整天,Matthew Prince 決定讓大家先回去休息,隔天早上再從歐洲的 DR site 切回 Hillsboro,也因此拉長了恢復的時間:

At 12:48 UTC, Flexential was able to get the generators restarted. [...] When Flexential attempted to power back up Cloudflare's circuits, the circuit breakers were discovered to be faulty.

Flexential replaced our failed circuit breakers, restored both utility feeds, and confirmed clean power at 22:48 UTC. Our team was all-hands-on-deck and had worked all day on the emergency, so I made the call that most of us should get some rest and start the move back to PDX-04 in the morning. That decision delayed our full recovery, but I believe made it less likely that we’d compound this situation with additional mistakes.

要注意報告慣例是用 UTC 時間 (這又是另外一個主題了,先前 HN 上也有其他文章討論過...),而 Hillsboro 在美西,要減八個小時,所以 22:48 UTC 是下午兩點多左右,美東與歐洲的團隊時間則會更晚。

目前看起來 Cloudflare 的設計與流程有很大的改善空間?之後看看有沒有其他的八卦消息出來?

Cloudflare Workers 推出新的計費模式:以 CPU time 收費

CloudflareWorkers 這個產品 (serverless 產品線) 推出了以 CPU time 收費的模式:「New Workers pricing — never pay to wait on I/O again」。

在這之前大家都是以 wall time 在計費,但這對於會卡在 I/O 很久的應用來說很不利,這次 Cloudflare 提出方案改用 CPU time 來計費,的確有吸引到我的目光...

這是舊的:

這是新的:

就這兩張比較起來有個不是很確定的部分,現在不看 memory 用量收費了?Pricing 這頁裡面的資量已經把 「Standard」更新上去了,但好像還是沒提到 memory?

不過不是馬上生效,而是這個月的月底 2023/10/31 可以選擇切到新的方案:

Starting October 31, 2023, you will have the option to opt in individual Workers and Pages Functions projects on your account to new pricing, and newly created projects will default to new pricing. You’ll be able to estimate how much new pricing will cost in the Cloudflare dashboard. For the majority of current applications, new pricing is the same or less expensive than the previous Bundled and Unbound pricing plans.

另外明年 2024/03/01 會全部強制切到新的方案:

If you’re on our Workers Paid plan, you will have until March 1, 2024 to switch to the new pricing on your own, after which all of your projects will be automatically migrated to new pricing. If you’re an Enterprise customer, any contract renewals after March 1, 2024, will use the new pricing.

用 CPU time 的確是好不少,但不知道這個 billing 的方式沒有其他地雷...

Cloudflare 正式推出 ECH (Encrypted Client Hello)

Cloudflare 宣佈所有的 Plan 都支援 ECH 了:「Encrypted Client Hello - the last puzzle piece to privacy」。

要講 ECH 之前,會先提到 Domain fronting 這個避開網路審查 (censorship) 的技巧,這是利用 CDN 服務裡對 SNI 的處理與對 HTTP header 的處理分開的特性,算是一種 side effect 的搞法,不保證會動,但蠻多 CDN 的服務都會動。

實際上的作法就是在 SNI 層的 hostname 填寫一個沒有被審查的 Hostname (像是 www.akamai.com),然後連到 www.akamai.com:443 上,也都使用 www.akamai.com 的 SSL certificate 交換。

但在 HTTP header 上面則是送出不一樣的 hostname (但也必須是在這家 CDN 的機器上有提供服務的網域),通常是被管制的網域名稱,像是 Host: www.cnbc.com 這樣的 HTTP header;而因為這部份被 TLS 保護,無法從外部看到內容而可以穿過網路審查。

理論上 CDN 服務是可以 reject 這樣的連線的,但大多數的 CDN 廠商並沒有阻擋這麼嚴格,於是算是一種堪用的方式。而這次 ECH 算是把這個行為標準化了,就不需要再用 side effect 的方式了。

Cloudflare 本身就很大,可以來看看後續的效應...

Cloudflare 可以針對不同 Hostname 給不同的 TLS 設定了

Cloudflare 總算是提供付費方案 (包在 Advanced Certificate Manager 裡面),可以針對不同的 hostname 給不同的 TLS 設定了:「Introducing per hostname TLS settings — security fit to your needs」。

本來的限制是整個 domain 都是一樣的 TLS 設定,這點對免費仔來說還好,但對於企業客戶來說就不太好用了。

遇到客戶端 (甚至是客戶) 是 Java 6 這種不支援 TLS 1.2 的情況 (參考「Qualys SSL Labs - Projects / User Agent Capabilities: Java 6u45」這邊),你還是得想辦法生一組 TLS 1.0 服務出來,但整個 domain 都開又有可能會死在 PCI-DSS 之類的規範。

以前遇到的時候有兩種解法,第一種是在客戶端自己解決,像是在內網架 SSL proxy (通常會搭配 self-signed CA) 讓 Java 6 的 client 還是可以透過 TLS 1.0 通訊,但是連到 internet 上面會是比較新的 TLS 1.2 或是 TLS 1.3,這種算是比較安全的。

另外一種就是在 Cloudflare 上另外開一個 domain,這樣就可以用 TLS 1.0 半裸奔。

現在這樣等於是讓第二個方案更簡單一點,不用另外開 domain,只需要在 hostname 上設定...

Cloudflare 看 HTTP/3 的使用率

Cloudflare 利用自家平台分析過去一年 HTTP/3 的使用率:「Examining HTTP/3 usage one year on」。

首先是整體的使用率,看起來反而沒有什麼太大的變化?一上線就已經是巔峰的感覺?

然後各種 bot (像是搜尋引擎的 bot 或是 social media 的 bot) 看起來幾乎都沒有用 HTTP/3,少數的量應該都是實驗性質居多,唯一的例外是 LinkedIn 有試著在導入,可以看到慢慢的爬升:

看起來不算太順?

關於 twemoji.maxcdn.com 這個網址的一些事情

在「phpBB 3.3.10 Release」這邊看到這個修正,研究了一下發現原來有些故事在後面跑:

Update the emoji CDN: PHPBB3-17071

Twitter Emoji (Twemoji) 這個計畫是 Twitter 弄出來的 open source project,最主要是讓不支援新版 Unicode 的系統上可以改用圖片顯示出來 (畢竟 Unicode 一直在加字)。

其中提供的 url 是 https://twemoji.maxcdn.com/v/latest/twemoji.min.js,可以看到裡面帶有 MaxCDN 的資訊,算是一種廣告,但後來 StackPath 在 2016 併購 MaxCDN,接下來是在去年打算要淘汰掉 MaxCDN 這個產品線:「MaxCDN and SecureCDN are Retiring; Here’s What It Means for You」。

這件事情被帶到「Clarify MaxCDN URLs now that MaxCDN is retiring #556」這邊討論,但看起來沒有太多動作,後來在「Maxcdn has shut down, cdn not working anymore. #580」這邊又被帶起來討論,其中 Twitter 前員工大概提了一下情況,主要是當年他們跟 MaxCDN 有談過讓 MaxCDN 負責頻寬的部份:

@simplexx among all the things, twemoji has always been a Twitter service for the community.

At my times in there, we had a great agreement / deal with MaxCDN so that it's hard to blame the boss this time, as MaxCDN is a completely different company/story.

What I see is some poor attention to this project, as companies don't close from a day to another (usually?) but as we all know what's going on @ twitter, I can't really blame any of my former colleagues, or new arrivals there.

Please let's not make it a wall of shame for all the people that worked on this, thanks for your understanding (I've left 7 years ago or more, as example, I've got pinged by some follower and I'm just trying to help you out anyway).

另外也有人提到目前 Twitter 的溝通管道的狀況:

For what it's worth, while I was still there we were in talks with MaxCDN to have the same deal when they migrated to be Stackpath. Everyone who had worked on the deal with MaxCDN had left and left no record of that deal, so it was taking them a bit longer to work out than expected – MaxCDN used to offer free hosting to OSS projects, but Stackpath wouldn't be doing that. They were working on an exception for us, but any emails they send to our Twitter emails now get bounced, so I guess they could've gotten this sorted out before shutting down MaxCDN... but it's not like we're there on the other end to make it happen, so it appears we'll never know.

Anyway,現在是至少讓 twemoji.maxcdn.com 恢復到「會動」了,但情況變得很特殊,twemoji.maxcdn.com 這個網址目前是指到競業的 BunnyCDN 上:

;; ANSWER SECTION:
twemoji.maxcdn.com.     229     IN      CNAME   twemoji.b-cdn.net.
twemoji.b-cdn.net.      35      IN      A       23.248.177.58

但如果你真的打過去要檔案,會是 301 到 jsDelivr 上:

$ http https://twemoji.maxcdn.com/2/svg/1f525.svg
[...]
Location: https://cdn.jsdelivr.net/npm/twemoji@11.3.0/2/svg/1f525.svg
[...]

而 jsDelivr 目前是放在 Fastly 上:

;; ANSWER SECTION:
cdn.jsdelivr.net.       180     IN      CNAME   jsdelivr.map.fastly.net.
jsdelivr.map.fastly.net. 30     IN      A       199.232.45.229

但在 GitHub 的說明上面則是建議用 UNPKG

<script src="https://unpkg.com/twemoji@latest/dist/twemoji.min.js" crossorigin="anonymous"></script>

而 UNPKG 目前的 CDN 的部份則是 Cloudflare

;; ANSWER SECTION:
unpkg.com.              86400   IN      NS      anirban.ns.cloudflare.com.
unpkg.com.              86400   IN      NS      aron.ns.cloudflare.com.
;; ANSWER SECTION:
unpkg.com.              300     IN      A       104.16.126.175
unpkg.com.              300     IN      A       104.16.123.175
unpkg.com.              300     IN      A       104.16.122.175
unpkg.com.              300     IN      A       104.16.124.175
unpkg.com.              300     IN      A       104.16.125.175

指到 BunnyCDN 但是 BunnyCDN 只負責 redirect,然後也不是導到 UNPKG 上...

Cloudflare 宣佈漲價

Cloudflare 寫了一篇很長的漲價公告:「Adjusting pricing, introducing annual plans, and accelerating innovation」。

25% 的漲幅,本來的 Pro Plan 從 $20/mo 漲到 $25/mo,本來的 Business Plan 從 $200/mo 漲到 $250/mo:

Cloudflare is raising prices for the first time in the last 12 years. Beginning January 15, 2023, new sign-ups will be charged $25 per month for our Pro Plan (up from $20 per month) and $250 per month for our Business Plan (up from $200 per month). Any paying customers who sign up before January 15, 2023, including any currently paying customers who signed up at any point over the last 12 years, will be grandfathered at the old monthly price until May 14, 2023.

然後丟出年費方案,就跟原來的月費方案寄算十二個月一樣的價錢 ($240/y 與 $2400/y):

We are also introducing an option to pay annually, rather than monthly, that we hope most customers will choose to switch to. Annual plans are available today and discounted from the new monthly rate to $240 per year for the Pro Plan (the equivalent of $20 per month, saving $60 per year) and $2,400 per year for the Business Plan (the equivalent of $200 per month, saving $600 per year). In other words, if you choose to pay annually for Cloudflare you can lock in our old monthly prices.

後面就是很長的解釋...