獨立遊戲創作者推出 Linux 版的好處

標題不知道怎麼下,大概就是這樣...

Hacker News 首頁上翻到的,以這個 upvote 數量來看,應該會收到今天的 Hacker News Daily 上:「Despite having just 5.8% sales, over 38% of bug reports come from Linux (reddit.com)」。

作者是一位獨立遊戲開發者,在兩年前推出了「ΔV: Rings of Saturn」這款遊戲,並且也發佈了 Linux 版。

作者先就數字來比較,他賣出了 12000 套,其中 700 套是 Linux 玩家;另外他收到了 1040 個 bug report,其中大約 400 個是從 Linux 玩家回報的:

As of today, I sold a little over 12,000 units of ΔV in total. 700 of these units were bought by Linux players. That’s 5.8%. I got 1040 bug reports in total, out of which roughly 400 are made by Linux players.

That’s one report per 11.5 users on average, and one report per 1.75 Linux players. That’s right, an average Linux player will get you 650% more bug reports.

看文章時可能會覺得「Linux 玩家真難伺候」,但實際上作者講到,這 400 個 bug 裡面只有 3 個 bug 是平台相關的 (只會發生在 Linux 上),其他的 bug 其實是所有平台都會發生:

A lot of extra work for just 5.8% of extra units, right?

Wrong. Bugs exist whenever you know about them, or not.

Do you know how many of these 400 bug reports were actually platform-specific? 3. Literally only 3 things were problems that came out just on Linux. The rest of them were affecting everyone[.]

原因是 Linux 社群在參與各種 open source project 養成的習慣,會盡可能把各種資訊講清楚,並且找出可以重製問題的方式:

The thing is, the Linux community is exceptionally well trained in reporting bugs. That is just the open-source way. This 5.8% of players found 38% of all the bugs that affected everyone. Just like having your own 700-person strong QA team. That was not 38% extra work for me, that was just free QA!

But that’s not all. The report quality is stellar.

與一般玩家的回報方式完全不同,Linux 玩家很習慣就會附上基本的環境資訊,系統紀錄,甚至有時候還會包括 core dump 與 reproducible steps:

I mean we have all seen bug reports like: “it crashes for me after a few hours”. Do you know what a developer can do with such a report? Feel sorry at best. You can’t really fix any bug unless you can replicate it, see it with your own eyes, peek inside and finally see that it’s fixed.

And with bug reports from Linux players is just something else. You get all the software/os versions, all the logs, you get core dumps and you get replication steps. Sometimes I got with the player over discord and we quickly iterated a few versions with progressive fixes to isolate the problem. You just don’t get that kind of engagement from anyone else.

不知道有沒有遇到回報 GDB 資訊的...

很特別的分享 XDDD

讀書時間:Meltdown 的攻擊方式

Meltdown 的論文可以在「Meltdown (PDF)」這邊看到。這個漏洞在 Intel 的 CPU 上影響最大,而在 AMD 是不受影響的。其他平台有零星的消息,不過不像 Intel 是這十五年來所有的 CPU 都中獎... (從 Pentium 4 以及之後的所有 CPU)

Meltdown 是基於這些前提,而達到記憶體任意位置的 memory dump:

  • 支援 µOP 方式的 out-of-order execution 以及當失敗時的 rollback 機制。
  • 因為 cache 機制造成的 side channel information leak。
  • 在 out-of-order execution 時對記憶體存取的 permission check 失效。

out-of-order execution 在大學時的計算機組織應該都會提到,不過我印象中當時只講「在確認不相干的指令才會有 out-of-order」。而現代 CPU 做的更深入,包括了兩個部份:

  • 第一個是 µOP 方式,將每個 assembly 拆成更細的 micro-operation,後面的 out-of-order execution 是對 µOP 做。
  • 第二個是可以先執行下去,如果發現搞錯了再 rollback。

像是下面的 access() 理論上不應該被執行到,但現代的 out-of-order execution 會讓 CPU 有機會先跑後面的指令,最後發現不該被執行到後,再將 register 與 memory 的資料 rollback 回來:

而 Meltdown 把後面不應該執行到 code 放上這段程式碼 (這是 Intel syntax assembly):

其中 mov al, byte [rcx] 應該要做記憶體檢查,確認使用者是否有權限存取那個位置。但這邊因為連記憶體檢查也拆成 µOP 平行跑,而產生 race condition:

Meltdown is some form of race condition between the fetch of a memory address and the corresponding permission check for this address.

而這導致後面這段不該被執行到的程式碼會先讀到資料放進 al register 裡。然後再去存取某個記憶體位置造成某塊記憶體位置被讀到 cache 裡。

造成 cache 內的資料改變後,就可以透過 FLUSH+RELOAD 技巧 (side channel) 而得知這段程式碼讀了哪一塊資料 (參考之前寫的「Meltdown 與 Spectre 都有用到的 FLUSH+RELOAD」),於是就能夠推出 al 的值...

而 Meltdown 在 mov al, byte [rcx] 這邊之所以可以成立,另外一個需要突破的地方是 [rcx]。這邊 [rcx] 存取時就算沒有權限檢查,在 virtual address 轉成 physical address 時應該會遇到問題?

原因是 LinuxOS X 上有 direct-physical map 的機制,會把整塊 physical memory 對應到 virtual memory 的固定位置上,這些位置不會再發給 user space 使用,所以是通的:

On Linux and OS X, this is done via a direct-physical map, i.e., the entire physical memory is directly mapped to a pre-defined virtual address (cf. Figure 2).

而在 Windows 上則是比較複雜,但大部分的 physical memory 都有對應到 kernel address space,而每個 process 裡面也都還是有完整的 kernel address space (只是受到權限控制),所以 Meltdown 的攻擊仍然有效:

Instead of a direct-physical map, Windows maintains a multiple so-called paged pools, non-paged pools, and the system cache. These pools are virtual memory regions in the kernel address space mapping physical pages to virtual addresses which are either required to remain in the memory (non-paged pool) or can be removed from the memory because a copy is already stored on the disk (paged pool). The system cache further contains mappings of all file-backed pages. Combined, these memory pools will typically map a large fraction of the physical memory into the kernel address space of every process.

這也是 workaround patch「Kernel page-table isolation」的原理 (看名字大概就知道方向了),藉由將 kernel 與 user 的區塊拆開來打掉 Meltdown 的攻擊途徑。

而 AMD 的硬體則是因為 mov al, byte [rcx] 這邊權限的檢查並沒有放進 out-of-order execution,所以就避開了 Meltdown 攻擊中很重要的一環。

MySQL 5.6 到 5.7 改變的預設值

Percona 整理了一份 MySQL 5.6 到 5.7 改變的預設值,對於評估與轉移的人都很有用:「MySQL Default Configuration Changes between 5.6 and 5.7」。

sync_binlog 居然從 0 改成 1 了,這對效能的影響應該不少。

performance_schema_* 有不少改成自動調整了,可以省下不少功夫。

innodb_buffer_pool_dump_at_shutdowninnodb_buffer_pool_load_at_startup 都打開了,這避免了正常重啟時的 warm up 問題,不過在存在有效的手段可以手動 warm up 的時,應該還是會關掉吧。(參考 2013 的文章「熱 MySQL InnoDB 的方式...」)

另外介紹了 InnoDB 預設格式的改變,這點到是因為使用 COMPRESSED,反而不太受到影響。

從爆炸的遺跡尋找蛛絲馬跡:在 MySQL crash 後,從 core file 找出當時有哪些 query 在執行

Percona 寫了一篇文章,說明在 MySQL core dump 後,要如何從 core file 撈出當時有哪些 query 在跑:「How to Extract All Running Queries (Including the Last Executed Statement) from a Core File?」。

一年前 Percona 的人也有寫一篇「How to obtain the “LES” (Last Executed Statement) from an Optimized Core Dump?」,今年這篇則是改良版,可以看到操作的方式便簡單了...

雖然大家都不喜歡 MySQL crash,但知道爆炸當時有哪些 query 對於後續的分析有很大的幫助,先紀錄起來,如果真的不幸用到,會比較快找到資料...