Zero Downtime Migration Across Clouds

I came across the discussion "Ask HN: Have you ever switched cloud?", which is about migrating between clouds; vidarh's answer there is worth walking through...

First, the part where he explains the reasons: it basically all came down to money, moving from one cloud to another and then on to dedicated hosting:

Yes. I once did zero downtime migration first from AWS to Google, then from Google to Hetzner for a client. Mostly for cost reasons: they had a lot of free credits, and moved to Hetzner when they ran out.

Their savings from using the credits were at least 20x what the migrations cost.

He also posted his notes directly. The first step is to set up a load-balancer-type service on both ends:

* Set up haproxy, nginx or similar as reverse proxy and carefully decide if you can handle retries on failed queries. If you want true zero-downtime migration there's a challenge here in making sure you have a setup that lets you add and remove backends transparently. There are many ways of doing this of various complexity. I've tended to favour using dynamic dns updates for this; in this specific instance we used Hashicorp's Consul to keep dns updated w/services. I've also used ngx_mruby for instances where I needed more complex backend selection (allows writing Ruby code to execute within nginx)
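
To make the "add and remove backends transparently" part concrete, here is a minimal sketch of what tools like consul-template automate: ask Consul for the healthy instances of a service, regenerate an nginx upstream file, and reload nginx. The service name, upstream name, and file path below are assumptions for illustration, not from the original comment.

```python
import subprocess

import requests  # assumes the requests library is available

CONSUL = "http://127.0.0.1:8500"
SERVICE = "web"  # hypothetical service name registered in Consul
UPSTREAM_FILE = "/etc/nginx/conf.d/upstream_web.conf"  # assumed include path


def render_upstream():
    # Ask Consul's health API for instances of the service that pass their checks.
    resp = requests.get(
        f"{CONSUL}/v1/health/service/{SERVICE}", params={"passing": "true"}
    )
    resp.raise_for_status()
    servers = [
        f"    server {entry['Service']['Address']}:{entry['Service']['Port']};"
        for entry in resp.json()
    ]
    # Rewrite the upstream block and reload nginx so backends can be added or
    # removed without dropping in-flight requests.
    with open(UPSTREAM_FILE, "w") as f:
        f.write("upstream web_backends {\n" + "\n".join(servers) + "\n}\n")
    subprocess.run(["nginx", "-s", "reload"], check=True)


if __name__ == "__main__":
    render_upstream()
```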

Next is connecting the internal networks, which is essentially a site-to-site VPN:

* Set up a VPN (or more depending on your networking setup) between the locations so that the reverse proxy can reach backends in both/all locations, and so that the backends can reach databases both places.

Then set up a database replication server and the related mechanisms:

* Replicate the database to the new location.

* Ensure your app has a mechanism for determining which database to use as the master. Just as for the reverse proxy we used Consul to select. All backends would switch on promoting a replica to master.

* Ensure you have a fast method to promote a database replica to a master. You don't want to be in a situation of having to fiddle with this. We had fully automated scripts to do the failover.
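
A rough sketch of what such an automated failover might look like, assuming PostgreSQL 12+ (which provides pg_promote()) and a Consul KV entry that the backends watch to find the current master; the connection string, hostname, and KV key are made up for illustration:

```python
import psycopg2  # PostgreSQL driver
import requests  # for Consul's HTTP KV API

CONSUL = "http://127.0.0.1:8500"
MASTER_KEY = "service/db/master"  # hypothetical KV key the backends watch
REPLICA_DSN = "host=db-new.internal dbname=app user=postgres"  # placeholder


def promote_replica():
    # pg_promote() (PostgreSQL 12+) promotes the standby to primary.
    conn = psycopg2.connect(REPLICA_DSN)
    conn.autocommit = True
    with conn.cursor() as cur:
        cur.execute("SELECT pg_promote();")
    conn.close()


def publish_new_master(host):
    # Flip the Consul KV entry so every backend starts using the new master.
    requests.put(f"{CONSUL}/v1/kv/{MASTER_KEY}", data=host).raise_for_status()


if __name__ == "__main__":
    promote_replica()
    publish_new_master("db-new.internal")
```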

Then make sure the application side can switch over smoothly:

* Ensure your app gracefully handles database failure of whatever it thinks the current master is. This is the trickiest bit in some cases, as you either need to make sure updates are idempotent, or you need to make sure updates during the switchover either reliably fail or reliably succeed. In the case I mentioned we were able to safely retry requests, but in many cases it'll be safer to just punt on true zero downtime migration assuming your setup can handle promotion of the new master fast enough (in our case the promotion of the new Postgres master took literally a couple of seconds, during which any failing updates would just translate to some page loads being slow as they retried, but if we hadn't been able to retry it'd have meant a few seconds downtime).
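
The "safe to retry" behavior he describes can be as simple as a small retry wrapper around updates that are known to be idempotent, so the few seconds of promotion only show up as slow page loads; the helper below is a sketch, and the idempotency of the wrapped function is an assumption you have to guarantee yourself:

```python
import time


def with_retries(fn, attempts=5, delay=1.0):
    """Retry fn a few times. Only safe if fn is idempotent, i.e. running it
    twice has the same effect as running it once."""
    for i in range(attempts):
        try:
            return fn()
        except Exception:
            if i == attempts - 1:
                raise
            # During master promotion this just makes the request slower
            # instead of failing it outright.
            time.sleep(delay)
```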

Once you have confirmed the new cloud has enough capacity to handle the traffic, it is time to migrate. The first step is lowering the DNS TTL:

Once you have the new environment running and capable of handling requests (but using the database in the old environment):

* Reduce DNS record TTL.
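
If the records live in Route 53, for example, lowering the TTL is just an UPSERT of the existing record with a smaller TTL; the hosted zone ID, record name, and IP below are placeholders:

```python
import boto3

route53 = boto3.client("route53")

# Re-publish the existing record with a short TTL so the later cutover
# propagates quickly.
route53.change_resource_record_sets(
    HostedZoneId="Z0000000EXAMPLE",  # placeholder hosted zone ID
    ChangeBatch={
        "Comment": "Lower TTL ahead of the migration",
        "Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": "www.example.com.",  # placeholder record
                "Type": "A",
                "TTL": 60,  # e.g. 60 seconds instead of the usual value
                "ResourceRecords": [{"Value": "192.0.2.10"}],  # current proxy IP
            },
        }],
    },
)
```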

Then point the old load balancer at the new backends; if problems show up at this stage you can roll back quickly:

* Ensure the new backends are added to the reverse proxy. You should start seeing requests flow through the new backends and can verify error rates aren't increasing. This should be quick to undo if you see errors.

Next, point DNS at the new load balancer; in theory there shouldn't be any major issues:

* Update DNS to add the new environment reverse proxy. You should start seeing requests hit the new reverse proxy, and some of it should flow through the new backends. Wait to see if any issues.

Then switch the database over to the new site; if something goes wrong, you can quickly switch back and work out where the problem is:

* Promote the replica in the new location to master and verify everything still works. Ensure whatever replication you need from the new master works. You should now see all database requests hitting the new master.

The final phase is tearing down the old environment:

* Drain connections from the old backends (remove them from the pool, but leave them running until they're not handling any requests). You should now have all traffic past the reverse proxy going via the new environment.
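
With HAProxy this draining step can be done through the runtime API on the admin socket, putting the old servers into the drain state so existing sessions finish while new requests only go to the new environment; the socket path and backend/server names below are assumptions:

```python
import socket

HAPROXY_SOCKET = "/run/haproxy/admin.sock"  # assumed admin socket path


def haproxy_cmd(cmd):
    # The runtime API accepts one command per connection on the admin socket.
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.connect(HAPROXY_SOCKET)
    s.sendall((cmd + "\n").encode())
    out = s.recv(65536).decode()
    s.close()
    return out


# Hypothetical backend/server names for the old environment.
for server in ("app_backends/old1", "app_backends/old2"):
    haproxy_cmd(f"set server {server} state drain")
```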

* Update DNS to remove the old environment reverse proxy. Wait for all traffic to stop hitting the old reverse proxy.

* When you're confident everything is fine, you can disable the old environment and bring DNS TTL back up.

This approach actually has little to do with the cloud as such; anyone who has planned a data center migration before has probably drawn up something similar, and the general idea is the same (analyze stateful services and stateless services separately). The difference is that without the cloud's flexible, on-demand rental model you have to provision a lot more hardware up front...

I remember Instagram had a similar plan back when they moved into Facebook's data centers; I've written about it before: "Instagram's migration from AWS into Facebook's data centers".

As for Taiwan, it seems PChome 24h recently moved their infrastructure to GCP? We'll see whether they end up presenting their migration story at a GCP event...

Cloudflare Plans to Launch SSH Command Logging

Cloudflare is planning to launch an SSH command logging feature: "Introducing SSH command logging".

It looks like it is part of the Cloudflare Zero Trust product line:

We’re excited to announce SSH command logging as part of Cloudflare Zero Trust.

Looking through the Zero Trust pricing information, the Free Plan appears to cover up to 50 users at no cost, but it is not clear whether this feature will be included in the Core Features or end up in a separately priced tier.

Back in 2016, AWS already put together a solution with scripts, and the approach described there is not limited to AWS: "How to Record SSH Sessions Established Through a Bastion Host".

Browsing around open source projects, there seem to be a few open source options available:

This need is also fairly common in industries with stricter security requirements, so there are plenty of non-open-source options as well; here are two picked at random:

The Zero Trust Architecture Pushed by the US Federal Government

I came across the memorandum "Moving the U.S. Government Toward Zero Trust Cybersecurity Principles" published by the Executive Office of the President. It covers the Zero trust security model and serves as guidance for federal agencies to follow, laying out at a fairly high level the federal government's direction for system security design.

It mentions "Phishing-resistant MFA". Ordinary MFA cannot prevent phishing (software TOTP such as Google Authenticator, hardware TOTP such as RSA SecurID, or typing in a code received via SMS); the only schemes that can resist phishing should be U2F and its successor WebAuthn, protocols that bind the site's origin into the protocol itself.

It also mentions the two designs RBAC and ABAC, with a preference for ABAC because of the added flexibility:

Currently, many authorization models in the Federal Government focus on role-based access control (RBAC), which relies on static pre-defined roles that are assigned to users and determine their permissions within an organization. A zero trust architecture should incorporate more granularly and dynamically defined permissions, as attribute-based access control (ABAC) is designed to do.
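
As a toy illustration of the difference: an RBAC check only consults a static, pre-assigned role, while an ABAC check evaluates attributes of the user, the resource, and the request context at access time. All attribute names below are invented for illustration.

```python
# RBAC: permission is decided purely by a pre-assigned role.
def rbac_can_read(user):
    return user["role"] in {"analyst", "admin"}


# ABAC: permission is a policy over attributes of user, resource, and context.
def abac_can_read(user, resource, context):
    return (
        user["clearance"] >= resource["sensitivity"]     # user vs. resource attribute
        and user["agency"] == resource["owning_agency"]  # same owning agency
        and context["device_compliant"]                  # environmental attribute
    )
```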

Also, under a zero trust design the internal network can only be treated as a transport medium, not as a secure transport layer; every transmission needs its own mechanism to ensure CIA. So, starting with DNS, traffic must be protected with DNS over HTTPS or DNS over TLS:

Agencies must resolve DNS queries using encrypted DNS wherever it is technically supported. This means that agency DNS resolvers must support standard encrypted DNS protocols (DNS-over-HTTPS or DNS-over-TLS), and must use them to communicate with upstream DNS resolvers.
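
As a small illustration of what encrypted DNS looks like in practice, here is a DNS-over-HTTPS lookup against Cloudflare's public resolver using its JSON API; an agency resolver would be doing the equivalent (typically with the binary DNS wire format) toward its upstream resolvers:

```python
import requests

# DNS-over-HTTPS query using Cloudflare's JSON endpoint.
resp = requests.get(
    "https://cloudflare-dns.com/dns-query",
    params={"name": "example.gov", "type": "A"},
    headers={"accept": "application/dns-json"},
)
resp.raise_for_status()
for answer in resp.json().get("Answer", []):
    print(answer["name"], answer["type"], answer["data"])
```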

All HTTP traffic needs to be protected with HTTPS, to the point of putting .gov directly onto an HTTPS-only list (presumably meaning HSTS preload?):

More generally, the .gov top-level domain has announced an intent to eventually preload the entirety of the .gov domain space as an HTTPS-only zone.

However, it also notes that for email encryption there is so far no good way to guarantee that encryption is actually used, especially when communicating with external parties:

Unlike HTTP and DNS, there is not today a clear path forward for guaranteeing that Federal emails are encrypted in transit, particularly for emails with external parties.

The parts about vulnerability testing and reporting mechanisms are also quite interesting, such as encouraging external testing:

In addition to their own testing programs, agencies must increase their reliance on external perspectives to identify vulnerabilities that internal staff may not identify

As well as encouraging vulnerability disclosure programs:

Public vulnerability disclosure programs, which allow security researchers and other members of the general public to report security issues safely, are used widely across the Federal Government and many private-sector industries. These programs are an invaluable accompaniment to existing internal security programs and operate as a reality check on an organization’s online security posture.

Worth flipping through and reading...

ZeroVer and 0.x Versioning

Saw the discussion "ZeroVer: 0-Based Versioning (0ver.org)" on Hacker News Daily; the original site is at "ZeroVer: 0-based Versioning".

This is probably a phenomenon that emerged after Semantic Versioning appeared (in 2009, it seems?) and became widely adopted: nobody wants to take responsibility XDDD

Semantic Versioning spells out the special status of 0.x versions, namely that nothing is guaranteed:

Major version zero (0.y.z) is for initial development. Anything MAY change at any time. The public API SHOULD NOT be considered stable.

As Semantic Versioning became widely adopted, developers also grew very fond of this property (even when the software is quite mature), so the ZeroVer site collects a pile of software still sitting on 0.x...

Actually I think this is a good thing: when HashiCorp decides to ship a 1.0 of something (most recently Terraform 1.0, I believe), it means they have a certain amount of confidence in the product; otherwise it is still in the exploratory development phase...

0.x is a way for developers to very confidently tell everyone "I'm not confident" XD

A Ready-Made KataGo + Lizzie Installer for Windows

I had originally spent some effort writing up how to install the CPU-only version of KataGo + Lizzie on Windows (over at "KataGo/Windows"), then found that someone had already made it much simpler: "BadukMegapack".

For the engine underneath, besides KataGo you can also pick the no-longer-maintained Leela Zero, or SAI, AQ, and Ray.

For the UI, besides Lizzie (a modified, enhanced version at that XD) you can pick other interfaces too... On top of that it even pulls in the JVM (Java 8) and installs it for you.

It all looks nicely packaged... the hardest part is probably getting hold of a high-end graphics card?

Dropbox Is Building Its Own Password Manager Too

Dropbox is also building its own password manager, Dropbox Passwords: "Dropbox Passwords coming soon for all users".

It looks like any paid Dropbox plan gets unlimited use, while the free tier is limited to 50 entries. From the download page, on the desktop it currently only supports Microsoft Windows and macOS, not Linux:

Come back to this page on a PC with Windows 10 or a Mac with at least macOS Sierra 10.12 to get the Passwords desktop app.

And on mobile it is iOS and Android:

How do I use the Android and iPhone password manager?

Once you sign in to the Passwords app, it automatically fills in your usernames and passwords so you can access frequently used apps and websites on your mobile device.

From the screenshots it looks like there is browser integration; they don't say much about the encryption technology, only that it is zero-knowledge encryption. I'll wait and see...

The AppRole Design in Vault, and How It Solves the Secret Zero Problem

Vault is HashiCorp's software for managing all kinds of sensitive information, such as API tokens or database credentials. Once this information is centrally managed you no longer need to put it into Git (wonderful, right?); instead, the application calls Vault to fetch it when needed.

Vault's design requires the application to "authenticate" before it can fetch anything, but that "authentication" is itself another piece of sensitive information. This is the Secret Zero problem, a variant of the chicken-and-egg problem.

After looking around, HashiCorp's own write-up explains it most clearly, though it is a blog post: "Tackling the Vault Secret Zero Problem by AppRole Authentication".

Vault addresses Secret Zero with AppRole, where authentication is built around a role_id and a secret_id. What is special is that a single role_id can have multiple secret_id values associated with it.

Under the AppRole design, permissions are bound to the role_id, while the secret_id can be generated dynamically at deploy time, for example with Terraform or Chef as mentioned in the official write-up, or with whatever management tooling the organization uses:

This way permissions are managed through the role_id without hard-coding any Secret Zero information in Git, and each machine gets its own secret_id, which also provides an audit trail. That takes care of several of the more obvious problems...
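
A minimal sketch of the AppRole flow against Vault's HTTP API: the machine presents its role_id together with the secret_id it was handed at deploy time, receives a token, and uses that token to read the secret it actually needs. The Vault address, environment variable names, and secret path are placeholders, and the KV v2 engine is assumed to be mounted at secret/.

```python
import os

import requests

VAULT_ADDR = "http://127.0.0.1:8200"  # placeholder Vault address

# role_id is baked into the deployment; secret_id is injected per machine
# at deploy time (e.g. by Terraform or Chef).
login = requests.post(
    f"{VAULT_ADDR}/v1/auth/approle/login",
    json={"role_id": os.environ["ROLE_ID"], "secret_id": os.environ["SECRET_ID"]},
)
login.raise_for_status()
token = login.json()["auth"]["client_token"]

# Use the returned token to read the real secret (KV v2 path is a placeholder).
secret = requests.get(
    f"{VAULT_ADDR}/v1/secret/data/myapp/db",
    headers={"X-Vault-Token": token},
)
secret.raise_for_status()
print(secret.json()["data"]["data"])  # the key/value pairs stored at that path
```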

Moving a Server to a Data Center 200 Meters Away Without Downtime

A fun story seen via Hacker News Daily: the author describes on Reddit how they physically moved a server, without downtime, to a data center two hundred meters away, crossing a parking lot along the way: "[Rant... sorta] Physically moved a server today...", and the author's FAQ is at "[FAQ][Rant... sorta] Physically moved a server today...".

I would treat this as just a story; some details are never explained (such as how the network stayed connected), which feels a bit off, but as a story it is quite entertaining XD

The move involved running the server off a UPS and patching through several switches along the way, plus crossing a parking lot in the middle. An amusing way to do it?

Running Lizzie + KataGo on EC2

I used Packer to put together a configuration that packages Lizzie (the UI) + KataGo (the engine) into an Amazon EC2 AMI: "packer-katago".

This combination is probably the tool Go players most commonly use to analyze games nowadays. Compared with the previously popular Lizzie + Leela Zero, the difference is that KataGo can analyze both the win rate and the score, while Leela Zero can only analyze the win rate. (See my earlier post "KataGo, with Much Less Training Time".)

Early versions of KataGo were not that strong, so people would usually alternate between it and Leela Zero when analyzing, but the last few releases are much stronger and by now appear to have surpassed Leela Zero (see the rankings at CGOS Whole Period Ratings for 19x19 Board). Also, judging from YouTube, quite a few players seem to have switched to KataGo...

Either way, both Leela Zero and KataGo need a fairly strong GPU; even with a GTX 1080 Ti it still felt too slow, so I threw it onto AWS to try out, and used the opportunity to get familiar with Packer.

I set it up with IceWM + VNC; after connecting, Lizzie can be launched from the menu at the bottom left:

The first run takes a while; on a p3.2xlarge I had to wait about three or four minutes before the numbers showed up:

The computation speed looks pretty good, and launched as a spot instance the cost should be acceptable (the p3.2xlarge above is USD$0.918/hr).
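
For reference, launching the resulting AMI as a spot instance with boto3 looks roughly like this; the AMI ID, key pair, security group, and region are placeholders and not part of the packer-katago repository:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # assumed region

# Launch the Packer-built AMI as a one-off spot instance; IDs are placeholders.
resp = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",
    InstanceType="p3.2xlarge",
    MinCount=1,
    MaxCount=1,
    KeyName="my-keypair",  # placeholder key pair
    SecurityGroupIds=["sg-0123456789abcdef0"],  # must allow the VNC port
    InstanceMarketOptions={
        "MarketType": "spot",
        "SpotOptions": {"SpotInstanceType": "one-time"},
    },
)
print(resp["Instances"][0]["InstanceId"])
```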

KataGo, with Much Less Training Time

I have recently started seeing the name KataGo in various places (on both Twitter and YouTube). Digging around, it turns out there has been a major breakthrough in training cost: according to the paper, it is fifty times faster...

On the first run, with only 35 V100s running for seven days, it reached roughly the strength of Leela Zero's 130th network:

The first serious run of KataGo ran for 7 days in Februrary 2019 on up to 35xV100 GPUs. This is the run featured in the paper. It achieved close to LZ130 strength before it was halted, or up to just barely superhuman.

On the second run, using up to 28 V100s to train a 20-block network, after 19 days it had already surpassed the ELFv2 network that Facebook had released, which in Leela Zero terms is roughly the strength of the 200th network (note that by then Leela Zero had already been training a 40-block architecture for quite a while):

Following some further improvements and much-improved hyperparameters, KataGo performed a second serious run in May-June a max of 28xV100 GPUs, surpassing the February run after just three and a half days. The run was halted after 19 days, with the final 20-block networks reaching a final strength slightly stronger than LZ-ELFv2! (This is Facebook's very strong 20-block ELF network, running on Leela Zero's search architecture). Comparing to the yet larger Leela Zero 40-block networks, KataGo's network falls somewhere around LZ200 at visit parity, despite only itself being 20 blocks.

From the paper you can see that, as with Leela Zero, the network is grown in stages (presumably also using Net2Net) rather than starting at 20x256 from the beginning:

In KataGo’s main 19-day run, (b, c) began at (6, 96) and switched to (10, 128), (15, 192), and (20, 256), at roughly 0.75 days, 1.75 days, and 7.5 days, respectively. The final size approximately matches that of AlphaZero and ELF.

The improvements in training speed come in two flavors: general ones (in the "Major General Improvements" section) and Go-specific ones (in the "Major Domain-Specific Improvements" section).

There is also plenty of KataGo-related discussion in Leela Zero's issue tracker, and the author seems to be participating there as well; something should come out of it...