Home » Posts tagged "dictionary"

Cloudflare 推出在 HTTPS 下的壓縮機制

在 TLS (HTTPS) 環境下基本上都不能開壓縮,主要是為了避免 secret token 會因為 dictionary 的可預測性而被取出,像是 CRIMEBREACHTIMEHEIST (沒完結過...),而因為全面關閉壓縮,對於效能的影響很大。

Cloudflare 就試著去找方法,是否可以維持壓縮,但又不會洩漏 secret token 的方式,於是就有了這篇:「A Solution to Compression Oracles on the Web」。

重點在於 Our Solution 這段的開頭:

We decided to use selective compression, compressing only non-secret parts of a page, in order to stop the extraction of secret information from a page.

透過 regex 判斷那些東西屬於 secret token,然後對這些資料例外處理不要壓縮,而其他的部份就可以維持壓縮。這樣傳輸量仍然可以大幅下降,但不透漏 secret token。然後因為這個想法其實很特別,沒有被實證過,所以成立了 Challenge Site 讓大家打:

We have set up the challenge website compression.website with protection, and a clone of the site compression.website/unsafe without it. The page is a simple form with a per-client CSRF designed to emulate common CSRF protection. Using the example attack presented with the library we have shown that we are able to extract the CSRF from the size of request responses in the unprotected variant but we have not been able to extract it on the protected site. We welcome attempts to extract the CSRF without access to the unencrypted response.

這個方向如果可行的話,應該會有人發展一些標準讓 compression algorithm 不用猜哪些是 secret token,這樣一來就更能確保因為漏判而造成的 leaking...

Pinterest 對 InnoDB 壓縮的改善

三個月前 Pinterest 提到對 InnoDB 壓縮的改善,講到透過字典的改善方式:「Pinterest 在 InnoDB Compression 的努力」。

而在「Evolving MySQL Compression - Part 2」這邊繼續說明要怎麼生出對 Pinterest 比較有效的字典內容,作者把計算的工具放到 GitHub 上讓其他人可以用 (用 Python 寫的):「pinterest/mysql_utils/zdict_gen/」。

可以看出來又增加不少壓縮率,這算是針對資料庫壓縮從 A 到 A+ 的行為吧...


先前在「NIST 新的密碼規範」這邊提到了用字典檔避免使用者選擇弱密碼的問題:

When processing requests to establish and change memorized secrets, verifiers SHOULD compare the prospective secrets against a dictionary of known commonly-used and/or compromised values. This list SHOULD include passwords from previous breach corpuses, as well as dictionary words and specific words (such as the name of the service itself) that users are likely to choose. If the chosen secret is found in the dictionary, the subscriber SHOULD be required to choose a different value. The subscriber SHOULD be advised that they need to select a different secret because their previous choice was commonly used.

除了一般的字典檔以外,還要從之前被破的網站取得。這部份的資料可以從 danielmiessler/SecLists 這邊的 Passwords 目錄下取得,資料不算太多,但應該夠用。

教育部三本字辭典改用 CC BY-ND 3.0 TW 授權

剛剛看到的消息,教育部國語辭典公眾授權網採用 CC BY-ND 3.0 TW 授權,將《重編國語辭典修訂本》、《國語辭典簡編本》、《國語小字典》三本字辭典公開授權,並且提供結構化的資料下載:

ND 有點可惜啊,不過是一大步了...