PostgreSQL 的 Job Queue、Application Lock 以及 Pub/Sub

Hacker News Daily 上看到一篇講 PostgreSQL 做 Job Queue、Application Lock 以及 Pub/Sub 的方法:「Do You Really Need Redis? How to Get Away with Just PostgreSQL」,對應的討論在「Do you really need Redis? How to get away with just PostgreSQL (atomicobject.com)」這邊可以翻到。

拿 PostgreSQL 跑這些東西的確有點浪費,不過如果是自己的專案,不想要把 infrastructure 搞的太複雜的話,倒是還不錯。

首先是 Job Queue 的部份,從他的範例看起來他是在做 async job queue (不用等回傳值的),這讓我想到很久前寫的 queue service (應該是 2007 年與 2012 年都寫過一次),不過我是用 MySQL 當作後端,要想辦法降低 InnoDB 的 lock 特性。

async job queue 設計起來其實很多奇怪的眉角,主要就是在怎麼處理失敗的狀態。大多數的需求可以放到兩個種類,最常見用的是 at-least-once,保證最少跑一次,大多數從設計上有設計成 idempotence 的都可以往這類丟,像是報表類的 (重複再跑一次昨天的報表是 OK 的),另外每天更新會員狀態也可以放在這邊。

另外少見一點的是 at-most-once 與 exactly-once,最多只跑一次與只跑一次,通常用在不是 idempotence 的操作上,像是扣款之類的,這邊的機制通常都會跟商業邏輯有關,反正不太好處理...

第二個是 Application Lock,跨機器時的 lock 機制,量沒有很大時拿 PostgreSQL 跑還行,再大就要另外想辦法了,馬上想到的是 ZooKeeper,但近年設計的系統應該更偏向用 etcdConsul 了...

最後提到的 Pub/Sub,一樣是在量大的時候拿 PostgreSQL 跑還行,更大的時候就要拿 Kafka 這種專門為了效能而設計出來的軟體出來用...

GET 與 POST 的差異

看到這篇在講 HTTP (& HTTPS) 裡面 GET 與 POST 的差異,剛好把一些標準的定義拿出來翻一翻,算是複習基本概念:「Get safe」。

第一個基本概念主要是 idempotence (& idempotent),重複被呼叫不會造成狀態的再次改變:

Idempotence ([...]) is the property of certain operations in mathematics and computer science whereby they can be applied multiple times without changing the result beyond the initial application.

數學定義是這樣跑:

An element x of a magma (M, •) is said to be idempotent if:

x • x = x.

If all elements are idempotent with respect to •, then • is called idempotent. The formula ∀x, x • x = x is called the idempotency law for •.

這點在 HTTP 標準 (RFC 7231) 裡面的定義也類似:

A request method is considered "idempotent" if the intended effect on the server of multiple identical requests with that method is the same as the effect for a single such request. Of the request methods defined by this specification, PUT, DELETE, and safe request methods are idempotent.

第二個基本概念是 Safe method (也是在同樣的 RFC 裡被提到),主要的思想是 read-only,這也是文章作者的標題要講的事情:

Request methods are considered "safe" if their defined semantics are essentially read-only; i.e., the client does not request, and does not expect, any state change on the origin server as a result of applying a safe method to a target resource. Likewise, reasonable use of a safe method is not expected to cause any harm, loss of property, or unusual burden on the origin server.

然後標準的 HTTP method 是有定義的:

   +---------+------+------------+---------------+
   | Method  | Safe | Idempotent | Reference     |
   +---------+------+------------+---------------+
   | CONNECT | no   | no         | Section 4.3.6 |
   | DELETE  | no   | yes        | Section 4.3.5 |
   | GET     | yes  | yes        | Section 4.3.1 |
   | HEAD    | yes  | yes        | Section 4.3.2 |
   | OPTIONS | yes  | yes        | Section 4.3.7 |
   | POST    | no   | no         | Section 4.3.3 |
   | PUT     | no   | yes        | Section 4.3.4 |
   | TRACE   | yes  | yes        | Section 4.3.8 |
   +---------+------+------------+---------------+

不過文章裡面提到的第一個例子並沒有很好,POST 不保證 safe 沒錯,但不代表 safe operation 就不能用 POST。

這邊用 URI resource 的概念 (以及 SEO?) 或是用 Post/Redirect/Get 的概念來說明會比較好:

<form method="get" action="/search">
<input type="search" name="term">

不過文章後續提到的問題的確就是我自己都會犯錯的問題:

“Log out” links that should be forms with a “log out” button—you can always style it to look like a link if you want.

“Unsubscribe” links in emails that immediately trigger the action of unsubscribing instead of going to a form where the POST method does the unsubscribing. I realise that this turns unsubscribing into a two-step process, which is a bit annoying from a usability point of view, but a destructive action should never be baked into a GET request.

這兩個動作都會造成 server 端的狀態改變,不應該用 GET,而我自己常常忘記第一個... 這邊其實可以用 form 產生 POST 需求,並且用 css 效果包起來,達到看起來跟一般的連結一樣。

寫起來讓自己多一點印象,之後避免再犯一樣的錯誤...

Idempotent Bash Script

看到「How to write idempotent Bash scripts」這篇,重點在講 Idempotence,這個詞是從數學上借來的,講重複操作的不動性:

An element x of a magma (M, •) is said to be idempotent if:

x • x = x.

If all elements are idempotent with respect to •, then • is called idempotent. The formula ∀x, x • x = x is called the idempotency law for •.

在 CS 領域也是一樣的概念:

Idempotence (UK: /ˌɪdɛmˈpoʊtəns/, US: /ˌaɪdəm-/) is the property of certain operations in mathematics and computer science whereby they can be applied multiple times without changing the result beyond the initial application.

而這篇講的是 Bash 上有些常見的行為要怎麼改成 Idempotence:

It happens a lot, you write a bash script and half way it exits due an error. You fix the error in your system and run the script again. But half of the steps in your scripts fail immediately because they were already applied to your system. To build resilient systems you need to write software that is idempotent.

一個常見的例子是 cron job 是否可以重複執行的問題。

如果 cron job 裡的程式都是 idempotent,那麼就不需要擔心重跑會因為前一隻 script 產生的環境而失敗,導致無限循環而需要人介入...

另外一個更進階的是同時有兩個 process 在執行同一個 script (可能在不同機器上),這也是要考慮的問題,不過這個問題在大多數情況下有各種 lock 系統可以協助避免,應該不是太大的問題...