原來有專有名詞:TOCTOU (Time-of-check to time-of-use)

看「The trouble with symbolic links」這篇的時候看到的專有名詞:「TOCTOU (Time-of-check to time-of-use)」,直翻是「先檢查再使用」,算是一個常見的 security (hole) pattern,因為檢查完後有可能被其他人改變,接著使用的時候就有可能產生安全漏洞。

在資料庫這類環境下,有 isolation (ACID 裡的 I) 可以確保不會發生這類問題 (需要 REPEATABLE-READ 或是更高的 isolation level)。

但在檔案系統裡面看起來不太順利,2004 年的時候研究出來沒有 portable 的方式可以確保避免 TOCTOU 的問題發生:

In the context of file system TOCTOU race conditions, the fundamental challenge is ensuring that the file system cannot be changed between two system calls. In 2004, an impossibility result was published, showing that there was no portable, deterministic technique for avoiding TOCTOU race conditions.

其中一種 mitigation 是針對 fd 監控:

Since this impossibility result, libraries for tracking file descriptors and ensuring correctness have been proposed by researchers.

然後另外一種方式 (比較治本) 是檔案系統的 API 支援 transaction,但看起來不被主流接受?

An alternative solution proposed in the research community is for UNIX systems to adopt transactions in the file system or the OS kernel. Transactions provide a concurrency control abstraction for the OS, and can be used to prevent TOCTOU races. While no production UNIX kernel has yet adopted transactions, proof-of-concept research prototypes have been developed for Linux, including the Valor file system and the TxOS kernel. Microsoft Windows has added transactions to its NTFS file system, but Microsoft discourages their use, and has indicated that they may be removed in a future version of Windows.

目前看起來的問題是沒有一個讓 Linux community 能接受的 API 設計?

Google Docs 裡 Grammar Correction 的 bug

剛剛在 Hacker News 上看到有趣的 bug,在 Google Docs 上輸入 And. And. And. And. And. 會觸發 error:「Including “And. And. And. And. And.” in a Google doc causes it to crash (support.google.com)」,原始的 bug report 在「Including "And. And. And. And. And." in a Google doc causes it to crash.」這邊,錯誤訊息像是這樣:

Hacker News 上的討論有提到這需要開 grammar check 的功能,然後看起來只要有相同的五個字開頭都大寫就會發生,像是 Also, Therefore, And, Anyway, But, Who, Why. 這些:

Also, Therefore, And, Anyway, But, Who, Why.

Each in caps 5 times with the same word with a period and space after each word and newline at the end is what I have found so far.

Can anyone find others?

Edit: added words that work found in other comments

很有趣的 bug XDDD 然後目前在 Hacker News 首頁的第一名...

Google 提供掃描 JAR 檔內是否有中獎的 Log4j 的工具

Hacker News 首頁上看到 Google 提供了一套用 Golang 寫的工具,可以掃描 JAR 檔裡面是否有中獎的 Log4j:「log4jscanner」,對應的討論在「Log4jscanner (github.com/google)」這邊。

看起來是內部工具,放出來前先把 vcs history 清掉了:

We unfortunately had to squash the history when open sourcing. The following contributors were instrumental in this project's development: [...]

另外討論裡也有人提到「OWASP Dependency-Check」這個工具也可以掃,這套就更一般性了:

Dependency-check automatically updates itself using the NVD Data Feeds hosted by NIST.

Let's Encrypt 在檢查 CAA 時出包

Let's Encrypt 發現在檢查 CAA 的程式碼有問題,發了說明:「2020.02.29 CAA Rechecking Bug」,以及預定的處理方式:「Revoking certain certificates on March 4」。

問題是當一個 certificate request 包含了 N 個 domain 時,本來的 CAA 檢查應該要對這 N 個檢查,但程式寫成只會抓一個,然後檢查了 N 次:

The bug: when a certificate request contained N domain names that needed CAA rechecking, Boulder would pick one domain name and check it N times. What this means in practice is that if a subscriber validated a domain name at time X, and the CAA records for that domain at time X allowed Let’s Encrypt issuance, that subscriber would be able to issue a certificate containing that domain name until X+30 days, even if someone later installed CAA records on that domain name that prohibit issuance by Let’s Encrypt.

2020/02/29 發現的,就程式碼的部屬時間,發現應該從去年 2019/07/25 開始就有這個 bug:

We confirmed the bug at 2020-02-29 03:08 UTC, and halted issuance at 03:10. We deployed a fix at 05:22 UTC and then re-enabled issuance.

Our preliminary investigation suggests the bug was introduced on 2019-07-25. We will conduct a more detailed investigation and provide a postmortem when it is complete.

然後決定要 revoke 這些可能會有問題的 SSL certificate,大約佔現有還有效的 SSL certificate 的 2.6%,大約三百萬筆:

Q: How many certificates are affected?
A: 2.6%. That is 3,048,289 currently-valid certificates are affected, out of ~116 million overall active Let’s Encrypt certificates. Of the affected certificates, about 1 million are duplicates of other affected certificates, in the sense of covering the same set of domain names.

在「Check whether a host's certificate needs replacement」這邊可以偵測線上使用的 SSL certificate 是否受到影響。

另外在「Download affected certificate serials for 2020.02.29 CAA Rechecking Incident」這邊可以抓到所有受到影響,預定要 revoke 的 SSL certificate 的序號。關於取得序號的方式,官方也有提供 CLI 的指令可以操作確認,對於有很多網域名稱需要確認的人可以用這組指令編寫程式判斷:

openssl s_client -connect example.com:443 -servername example.com -showcerts </dev/null 2>/dev/null | openssl x509 -text -noout | grep -A 1 Serial\ Number | tr -d :

照目前的描述,如果申請時只有一個 domain 應該是不會中這個問題,再來是最壞的情況大概會維持三個月 (網站主人沒管他,等到時間到了自動 renew)。

Ptt 與事實查核單位合作了

看到「[公告] 活動部提供板主事實查核申請管道」這個,Ptt 跟三個事實查核單位合作:

1.台灣事實查核中心(Taiwan FactCheck Center)
2.MyGoPen 麥擱騙
3.蘭姆酒吐司(Rumor & Truth)

不知道會有怎麼樣的反應速度與標示方式...

讀書時間:Meltdown 的攻擊方式

Meltdown 的論文可以在「Meltdown (PDF)」這邊看到。這個漏洞在 Intel 的 CPU 上影響最大,而在 AMD 是不受影響的。其他平台有零星的消息,不過不像 Intel 是這十五年來所有的 CPU 都中獎... (從 Pentium 4 以及之後的所有 CPU)

Meltdown 是基於這些前提,而達到記憶體任意位置的 memory dump:

  • 支援 µOP 方式的 out-of-order execution 以及當失敗時的 rollback 機制。
  • 因為 cache 機制造成的 side channel information leak。
  • 在 out-of-order execution 時對記憶體存取的 permission check 失效。

out-of-order execution 在大學時的計算機組織應該都會提到,不過我印象中當時只講「在確認不相干的指令才會有 out-of-order」。而現代 CPU 做的更深入,包括了兩個部份:

  • 第一個是 µOP 方式,將每個 assembly 拆成更細的 micro-operation,後面的 out-of-order execution 是對 µOP 做。
  • 第二個是可以先執行下去,如果發現搞錯了再 rollback。

像是下面的 access() 理論上不應該被執行到,但現代的 out-of-order execution 會讓 CPU 有機會先跑後面的指令,最後發現不該被執行到後,再將 register 與 memory 的資料 rollback 回來:

而 Meltdown 把後面不應該執行到 code 放上這段程式碼 (這是 Intel syntax assembly):

其中 mov al, byte [rcx] 應該要做記憶體檢查,確認使用者是否有權限存取那個位置。但這邊因為連記憶體檢查也拆成 µOP 平行跑,而產生 race condition:

Meltdown is some form of race condition between the fetch of a memory address and the corresponding permission check for this address.

而這導致後面這段不該被執行到的程式碼會先讀到資料放進 al register 裡。然後再去存取某個記憶體位置造成某塊記憶體位置被讀到 cache 裡。

造成 cache 內的資料改變後,就可以透過 FLUSH+RELOAD 技巧 (side channel) 而得知這段程式碼讀了哪一塊資料 (參考之前寫的「Meltdown 與 Spectre 都有用到的 FLUSH+RELOAD」),於是就能夠推出 al 的值...

而 Meltdown 在 mov al, byte [rcx] 這邊之所以可以成立,另外一個需要突破的地方是 [rcx]。這邊 [rcx] 存取時就算沒有權限檢查,在 virtual address 轉成 physical address 時應該會遇到問題?

原因是 LinuxOS X 上有 direct-physical map 的機制,會把整塊 physical memory 對應到 virtual memory 的固定位置上,這些位置不會再發給 user space 使用,所以是通的:

On Linux and OS X, this is done via a direct-physical map, i.e., the entire physical memory is directly mapped to a pre-defined virtual address (cf. Figure 2).

而在 Windows 上則是比較複雜,但大部分的 physical memory 都有對應到 kernel address space,而每個 process 裡面也都還是有完整的 kernel address space (只是受到權限控制),所以 Meltdown 的攻擊仍然有效:

Instead of a direct-physical map, Windows maintains a multiple so-called paged pools, non-paged pools, and the system cache. These pools are virtual memory regions in the kernel address space mapping physical pages to virtual addresses which are either required to remain in the memory (non-paged pool) or can be removed from the memory because a copy is already stored on the disk (paged pool). The system cache further contains mappings of all file-backed pages. Combined, these memory pools will typically map a large fraction of the physical memory into the kernel address space of every process.

這也是 workaround patch「Kernel page-table isolation」的原理 (看名字大概就知道方向了),藉由將 kernel 與 user 的區塊拆開來打掉 Meltdown 的攻擊途徑。

而 AMD 的硬體則是因為 mov al, byte [rcx] 這邊權限的檢查並沒有放進 out-of-order execution,所以就避開了 Meltdown 攻擊中很重要的一環。

Google 與 Facebook 都在建立消息驗證系統

Google 的在「Fact Check now available in Google Search and News around the world」這,Facebook 的在「Working to Stop Misinformation and False News」這。

Google 是針對搜尋與新聞的部份給出建議,透過第三方的網站確認,像是這樣:

後面的機制是透過公開的協定進行:

For publishers to be included in this feature, they must be using the Schema.org ClaimReview markup on the specific pages where they fact check public statements (documentation here), or they can use the Share the Facts widget developed by the Duke University Reporters Lab and Jigsaw.

但也是透過演算法判斷提供的單位是否夠權威:

Only publishers that are algorithmically determined to be an authoritative source of information will qualify for inclusion.

而 Facebook 是針對 Timeline 上的新聞判斷,但是是透過與 Facebook 合作的 partner 判斷,而且會針對判斷為假的消息降低出現的機率:

We’ve started a program to work with independent third-party fact-checking organizations. We’ll use the reports from our community, along with other signals, to send stories to these organizations. If the fact-checking organizations identify a story as false, it will get flagged as disputed and there will be a link to a corresponding article explaining why. Stories that have been disputed also appear lower in News Feed.

我不是很喜歡 Facebook 的方法,變相的在控制言論自由 (不過也不是第一天了)。

Route53 的 Health Check 支援 HTTPS SNI 了...

Route53 的 Health Check 總算支援 SNI 了:「Amazon Route 53 Adds SNI Support for HTTPS Health Checks」。

With SNI and HTTPS support, you can now create health checks for secure websites that rely on SNI to serve the correct website and certificate to requests for a particular domain name.

這功能早該出現啦,等好久了...

GitHub 支援 Protected Branches 了...

GitHub 宣布支援 Protected Branches 了:「Protected branches and required status checks」,其他的競爭對手都有的功能:

Over the next few weeks we’ll be rolling out a new feature called Protected Branches which gives repository administrators the ability to disable force pushes to specific branches.

另外一起引入的是強制性的檢查:

算是補功能... 組織變大的時候必須透過系統強制介入的功能。