uptime

Uptime Kuma: An Open Source Clone of UptimeRobot

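UptimeRobot is a monitoring service I'm personally quite fond of. It supports all the common check types, such as HTTP/HTTPS website monitoring and ICMP Ping or TCP monitoring, and it also offers SSL certificate monitoring.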

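A while ago, while looking things up, I came across Uptime Kuma, an open source project developed by Louis Lam from Hong Kong. The name alone made me suspect it was a tribute to UptimeRobot, and after installing and using it I was even more sure...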

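I've been using it for a little over a month now, and the feature set is basically quite complete. I hooked it up to my own Slack and Pushover, which makes it very convenient to monitor services:

Picking one monitor to look at: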

Upptime, a Monitoring Service Built on GitHub Actions

I saw this on Twitter:

Introducing the new phpBB Status Site https://t.co/V20jmQk4Px

— phpBB (@phpbb) May 8, 2022

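That led me to Upptime, an open source monitoring tool that runs every five minutes directly on the schedule (cron job) feature of GitHub Actions. One thing to note: a public repository has no limit on build minutes, whereas a private repository may well use up its quota: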

Billing note: Upptime uses thousands of build minutes every month (approximately 3,000 minutes in the default setting). If you use a public repository, GitHub offers unlimited free build minutes, but if you use a private repository, you'll have to pay for this time.

According to the documentation, it runs on three features: GitHub Actions, GitHub Issues, and GitHub Pages:

GitHub Actions is used as an uptime monitor

GitHub Issues are used for incident reports

GitHub Pages are used for the status website

Besides those three features, it also commits some data back into the git history every day:

We also record the response time once per day and commit it to git history. This way, we can graph long-term trends in your websites' response times by going through git commit history. We generate these graphs once every day, also using schedulers.

Looks like something worth playing with...

Baserow, an Open Source Alternative to Airtable

I saw Baserow on Hacker News Daily, and the Hacker News discussion also has plenty of interesting material: "Baserow.io – Self-hosted Airtable alternative (baserow.io)".

Baserow's backend is written in Django, and the frontend is mainly Vue.js.

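The discussion brings up several problems with Airtable. I'll leave pricing aside and only talk about the other parts... (because, as someone in the discussion pointed out, the company may have set its pricing that way on purpose, with people who are unwilling to spend a lot in mind)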

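The first is the "reliable downtime" every month, although the numbers in that description are clearly off: even a month with seven hours of downtime still comes out at 99% uptime (two nines).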

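Setting that aside, paid services generally offer a 99.9% monthly SLA (three nines, which allows only 43.2 minutes of downtime in a 30-day month), so this really does fall short of the industry norm: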

Reliable 1-5 hours a month of downtime. There are months where it's literally 1 nine of uptime. Their status page was actually down for a while (I think they forgot to renew it?)
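The arithmetic behind these "nines" is easy to check. Below is a minimal sketch, assuming a 30-day month; the function names are made up for illustration and do not come from any library.

```python
MINUTES_PER_DAY = 24 * 60


def allowed_downtime_minutes(sla_percent: float, days: int = 30) -> float:
    """Downtime budget, in minutes, that a given uptime percentage allows."""
    return days * MINUTES_PER_DAY * (100 - sla_percent) / 100


def uptime_percent(downtime_hours: float, days: int = 30) -> float:
    """Uptime percentage for a month with the given hours of downtime."""
    total_hours = days * 24
    return (total_hours - downtime_hours) / total_hours * 100


if __name__ == "__main__":
    # Three nines: the monthly budget quoted above.
    print(round(allowed_downtime_minutes(99.9), 1))   # 43.2
    # Seven hours of downtime is still (barely) two nines.
    print(round(uptime_percent(7), 2))                # 99.03
    # "One nine" (90%) would need about 72 hours of downtime.
    print(allowed_downtime_minutes(90) / 60)          # 72.0
```

So 1 to 5 hours of downtime a month is well short of three nines, but nowhere near as bad as a single nine.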

Another amusing one is a rather bizarre design choice:

Booleans in the API are `true` or `undefined`. Seriously.

It looks like you'd need quite a lot of background knowledge to have any idea why they did it this way XDDD
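On the client side, a "`true` or `undefined`" boolean usually means treating a missing key as false. Here is a minimal sketch of that normalization; the record shape (a dict with a `fields` mapping) and the field name `Done` are assumptions for illustration, not something taken from the discussion.

```python
from typing import Any


def checkbox_value(record: dict[str, Any], field_name: str) -> bool:
    """Read a boolean field that is either `true` or simply absent.

    A missing key (what JavaScript would see as `undefined`) is
    treated as False instead of raising KeyError.
    """
    fields = record.get("fields", {})
    return fields.get(field_name) is True


# Hypothetical records: the second one has no "Done" key at all.
checked = {"id": "rec1", "fields": {"Name": "task A", "Done": True}}
unchecked = {"id": "rec2", "fields": {"Name": "task B"}}

print(checkbox_value(checked, "Done"))    # True
print(checkbox_value(unchecked, "Done"))  # False
```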

I'll pass it to the data team first, and if they're interested we'll find a chance to set it up...

How Braintree (PayPal) Uses PostgreSQL

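The hardest problems with an RDBMS all revolve around "how to avoid interrupting service" (many things are easy when uptime/downtime is not a concern, whether that is ALTER, failover, or backup and restore planning). In "PostgreSQL at Scale: Database Schema Changes Without Downtime", PayPal's Braintree discusses how to change a PostgreSQL database schema without interrupting service.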

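Most of the article is detail aimed at DBAs (e.g. how to avoid triggering large-scale locks that take the service down) rather than developer-facing material... but the opening section, which I think is the most important part, is where developers need to be involved: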

For all code and database changes, we require that:

Live code and schemas be forward-compatible with updated code and schemas: this allows us to roll out deploys gradually across a fleet of application servers and database clusters.

New code and schemas be backward-compatible with live code and schemas: this allows us to roll back any change to the previous version in the event of unexpected errors.

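To satisfy these two requirements, a schema change may have to be carried out in several stages rather than all at once. That is also what lets you avoid having to take the site down and restore data from a backup...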
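As an illustration of what "several stages" can look like, here is a minimal sketch of renaming a column without breaking old code. It is not taken from the Braintree article: it uses Python's built-in `sqlite3` as a stand-in for PostgreSQL, and the `users` table, the `name`/`full_name` columns, and the helper functions are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('alice')")  # row from old code

# Stage 1 (expand): add the new column as nullable. Old code never
# mentions it, so the live code keeps working against the new schema.
conn.execute("ALTER TABLE users ADD COLUMN full_name TEXT")
conn.execute("INSERT INTO users (name) VALUES ('bob')")  # old code, new schema


# Stage 2: new code writes both columns and reads the new one with a
# fallback, so rolling back to the old code loses nothing.
def create_user_new_code(db: sqlite3.Connection, full_name: str) -> None:
    db.execute(
        "INSERT INTO users (name, full_name) VALUES (?, ?)",
        (full_name, full_name),
    )


def read_names_new_code(db: sqlite3.Connection) -> list[str]:
    rows = db.execute("SELECT COALESCE(full_name, name) FROM users ORDER BY id")
    return [row[0] for row in rows]


create_user_new_code(conn, "carol")

# Stage 3: backfill rows that were written before the dual-write existed.
conn.execute("UPDATE users SET full_name = name WHERE full_name IS NULL")

# Stage 4 (contract, not run here): drop the old column only after no
# deployed code reads or writes it any more.

names = read_names_new_code(conn)
old_view = [row[0] for row in conn.execute("SELECT name FROM users ORDER BY id")]
print(names)     # ['alice', 'bob', 'carol']
print(old_view)  # ['alice', 'bob', 'carol']
```

At every stage both the old and the new code see consistent data, which is the property the two requirements above ask for. On a real PostgreSQL cluster each step also has locking implications, which is what the rest of the article covers.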

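I'd suggest studying how this works in practice. Once you know how to design the steps for the common scenarios, you'll be much more fluent when you actually run into one.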

AWS Lambda Now Has an SLA Too

In "AWS Lambda announces service level agreement" I saw that AWS Lambda now comes with a 99.95% SLA:

We have published a service level agreement (SLA) for AWS Lambda. We will use commercially reasonable efforts to make Lambda available with a Monthly Uptime Percentage for each AWS region, during any monthly billing cycle, of at least 99.95% (the “Service Commitment”).

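That said, this kind of thing is mostly a statement of intent (at the very least it shows that AWS considers the product stable enough to carry an SLA). It's not something you hope to ever have to invoke...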