Home » Posts tagged "timing"

讀書時間:Spectre 的攻擊方式

上次寫了 Meltdown 攻擊的讀書心得 (參考「讀書時間:Meltdown 的攻擊方式」),結果後來中獎狂流鼻水,加上 Spectre 用的手法就更複雜,慢慢看的情況就拖到最近才看完... 這邊就以讀者看過 Meltdown 那篇心得的前提來描述 Spectre。

Spectre 的精華在於 CPU 支援 branch prediction 與 out-of-order execution,也就是 CPU 遇到 branch 時會學習怎麼跑,這個資訊提供給 out-of-order execution 就可以大幅提昇執行速度。可以參考以前在「CPU Branch Prediction 的成本...」提到的效率問題。

原理的部份可以看這段程式碼:

這類型程式碼常常出現在現代程式的各種安全檢查上:確認 x 沒問題後再實際將資料拉出來處理。而我們可以透過不斷的丟 x 值進去,讓 CPU 學到以為都是 TRUE,而在 CPU 學壞之後,突然丟進超出範圍的 x,產生 branch misprediction,但卻已經因為 out-of-order execution 而讓 CPU 執行過 y = ... 這段指令,進而導致 cache 的內容改變。

然後其中讓人最驚豔的攻擊,就是論文示範了透過瀏覽器的 JavaScript 就能打的讓人不要不要的...

圖片裡,上面這段是 JavaScript 程式碼,下面則是 Chrome V8JIT 後轉成的 assembly (這是 AT&T style):

可以從這段程式碼看到,他想要透過這段 JavaScript 取出本來無法存取到的祕密值 index,然後透過 probeTable 得知 cache 的變化。

在這樣的攻擊下,你就可以取得這個 process 裡可以看到的空間,甚至極端的 case 下有可能是 kernel space (配合 Meltdown 的條件)。

不過如果你不能跑 JavaScript 也沒關係,Spectre 的論文裡也提供各種變形方式提供攻擊。像是這樣的程式碼也可以被拿來攻擊:

if (false but mispredicts as true)
    read array1[R1]
read [R2]

其中 R1 是有帶有祕密值的 register,當 array[R1] 有 cache 時,讀 [R2] 就有機會比較快,而沒有 cache 時就會比較慢 (這是因為 memory bus 被佔用的關係),在這個情境下就能夠產生 timing attack:

Suppose register R1 contains a secret value. If the speculatively executed memory read of array1[R1] is a cache hit, then nothing will go on the memory bus and the read from [R2] will initiate quickly. If the read of array1[R1] is a cache miss, then the second read may take longer, resulting in different timing for the victim thread.

所以相同道理,利用乘法器被佔用的 timing attack 也可以產生攻擊:

if (false but mispredicts as true)
    multiply R1, R2
multiply R3, R4

在論文裡面提到相當多的方法 (甚至連 branch target buffers (BTB) 都可以拿來用),就麻煩去論文裡看了。現在用 cache 算是很有效的方式,所以攻擊手法主要都是透過 cache 在取得資訊。

Spectre 論文提到的 mitigation (workaround) 是透過 mfencelfence 強制程式碼的順序,但這表示 compiler 要針對所有的 branch 加上這段,對效能影響應該蠻明顯的:

In addition, of the three user-mode serializing instructions listed by Intel, only cpuid can be used in normal code, and it destroys many registers. The mfence and lfence (but not sfence) instructions also appear to work, with the added benefit that they do not destroy register contents. Their behavior with respect to speculative execution is not defined, however, so they may not work in all CPUs or system configurations.

Google 推出的 Retpoline 則是想要避免這個問題。Google 在「Retpoline: a software construct for preventing branch-target-injection」這邊詳細說明了 Retpoline 的原理與方法,採取的方向是控制 speculative execution:

However, we may manipulate its generation to control speculative execution while modifying the visible, on-stack value to direct how the branch is actually retired.

這個方式是抽換掉 jmpcall 兩個指令,以 *%r11 為例,他將 jmp *%r11call *%r11 改成 jmp retpoline_r11_trampolinecall retpoline_r11_trampoline (這邊的 jmp 指的是所有 jump 系列的指令,像是 jz 之類的):

retpoline_r11_trampoline:
  call set_up_target;
capture_spec:        
  pause;
  jmp capture_spec;
set_up_target:
  mov %r11, (%rsp); 
  ret;

藉由抽換 %rsp 內容跳回正確位置,然後也利用這樣的程式結構控制 CPU 的 speculative execution。

而在效能損失上,已經有測試報告出來了。其實並沒有像 Google 說的那麼無痛,還是會因為應用差異而有不同等級的效能損失... 可以看到有些應用其實還是很痛:「Benchmarking Linux With The Retpoline Patches For Spectre」。

下半年新出的 CPU 應該會考慮這些問題了吧,不過不知道怎麼提供解法 @_@

OpenSSL 的 DSA 被 Side-channel attack 打爆

在「Make Sure DSA Signing Exponentiations Really are Constant-Time」這篇文章裡面,直接透過 end-to-end 的 timing attack 打爆 (也就是透過 internet 觀察攻擊),而不需要在同一台機器上對 cache 之類的區域攻擊:

A unique feature of our work is that we target common cryptographic protocols. Previous works that demonstrate cache-timing key-recovery attack only target the cryptographic primitives, ignoring potential cache noise from the protocol implementation. In contrast, we present end-to-end attacks on two common cryptographic protocols: SSH and TLS. We are, therefore, the first to demonstrate that cache-timing attacks are a threat not only when executing the cryptographic primitives but also in the presence of the cache activity of the whole protocol suite.

而且次數相當的少,就可以 key recovery:

260 SSH-2 handshakes to extract a 1024/160-bit DSA host key from an OpenSSH server, and 580 TLS 1.2 handshakes to extract a 2048/256-bit DSA key from an stunnel server.

CVE 編號為 CVE-2016-2178OpenSSL 全系列 (包括 fork 出去的版本) 與 OpenSSH 只要是 DSA 的實作都中獎...

Amazon 之前放出的 s2n 的安全性問題

Amazon 之前放 s2n 出來當作 TLS protocol 的方案,於是就有人摸出東西來:「Lucky Microseconds: A Timing Attack on Amazon's s2n Implementation of TLS」。

即使是經過外部資安檢證,仍然還是有找到問題。這次找到的問題是 timing attack 類在 CBC-mode 下的 plaintext recovery:

At the time of its release, Amazon announced that s2n had undergone three external security evaluations and penetration tests. We show that, despite this, s2n - as initially released - was vulnerable to a timing attack in the case of CBC-mode ciphersuites, which could be extended to complete plaintext recovery in some settings.

攻擊分成兩個階段:

Our attack has two components. The first part is a novel variant of the Lucky 13 attack that works even though protections against Lucky 13 were implemented in s2n. The second part deals with the randomised delays that were put in place in s2n as an additional countermeasure to Lucky 13. Our work highlights the challenges of protecting implementations against sophisticated timing attacks.

最後還是酸了一下 Amazon:

It also illustrates that standard code audits are insufficient to uncover all cryptographic attack vectors.

Amazon 的官方說明則在「s2n and Lucky 13」這邊可以看到。

SSL/TLS 的問題...

這篇與「對稱式加密系統的爆炸歷史 (Authenticated encryption 的問題)」這篇相關,建議可以一起看一看。

TLS (Transport Layer Security),前身是 SSL (Secure Sockets Layer),是目前 HTTPS 所使用的加密協議。發展的順序上是 SSLv2、SSLv3、TLSv1、TLSv1.1、TLSv1.2。

然後有兩篇文章可以看:

第一篇文章講 Padding oracle attack,第二篇文章是酸 SSL/TLS 的修正愈修愈歪... XD

AES 這類的 block cipher 在加密或解密時會要求切齊 block size,以 AES 的要求就是 128bits (16 bytes)。

而對於不齊的資料要怎麼加密呢?其中一個方法是 PKCS#7:(圖片取自第二篇文章)

Padding

要想辦法補齊 128bits (16bytes),如果像上圖需要補 7bytes 進去,就都補上 \x07 (剛好就是補上長度),另外在最後面會補上 padding 的長度,而問題出就出在這個設計先天就有缺陷:在 SSL/TLS 所使用的 MAC-then-Encrypt 中,MAC 只計算原文的值,沒有保護到 padding 的部份,於是就可以針對 padding 的部份想辦法找到洞鑽。

pseudo code 可能是這樣:

// Decrypt to plaintext + mac + padding
$plaintext_mac_padding = decrypt($ciphertext);
if (NULL != $plaintext_mac_padding) {

    // Now decode padding part
    $plaintext_mac = decode_padding($plaintext_mac_padding, $padding_length);
    if (NULL != $plaintext_mac) {

        // Now check MAC part
        $plaintext = check_mac(plaintext_mac);
        if (NULL != $plaintext) {

            // Now it's okay
        }
    }
}

攻擊者亂改 $ciphertext 會導致解出來的 padding 也亂掉,但早期的 SSL 會回傳「padding error」這種對攻擊者有利的資訊,而導致攻擊者可以利用這個資訊想辦法得知更多內容。

而 TLS 並沒有從根本改善,而是試著加上機制補西牆:當遇到錯誤時就跳過,不要傳回錯誤資訊。

但因為攻擊者亂改封包造成 decode_padding() 會失敗,而沒有呼叫到 check_mac()。這導致了大量的計算時間差與能量差,而使得攻擊者可以藉由這些資訊而得知是否成功。而官方在 TLSv1.2 的建議是再補上機制來補洞:

In general, the best way to do this is to compute the MAC even if the padding is incorrect, and only then reject the packet. For instance, if the pad appears to be incorrect, the implementation might assume a zero-length pad and then compute the MAC.

而官方認為雖然這樣還是有 timing channel,但已經小到會被雜訊覆蓋,所以「應該」可以解決問題:

This leaves a small timing channel, since MAC performance depends to some extent on the size of the data fragment, but it is not believed to be large enough to be exploitable, due to the large block size of existing MACs and the small size of the timing signal.

於是,只要覺得「應該安全吧」,就會「應該會被破」:「Lucky Thirteen: Breaking the TLS and DTLS Record Protocols」:

The attacks apply to all TLS and DTLS implementations that are compliant with TLS 1.1 or 1.2, or with DTLS 1.0 or 1.2. They also apply to implementations of SSL 3.0 and TLS 1.0 that incorporate countermeasures to previous padding oracle attacks. Variant attacks may also apply to non-compliant implementations.

這 SSL/TLS 的設計讓人補到快起笑了... XD

資安的東西通常是愈複雜就愈容易被抓問題出來,在 SSL/TLS 的歷史包袱下,不知道什麼時候才想換 Encrypt-then-MAC 來改善底層問題...

Archives