Google 再次改善 Android 的 APK 更新,讓下載的量更小

Google 的人再次更新了演算法,將下載的量再次減少,從本來的 47% 降到 65%:「Saving Data: Reducing the size of App Updates by 65%」。

今年七月的時候,更新演算法導入了 bsdiff,使得本來要抓整包 APK 的量,變成抓 diff 的部份,這使得下載的流量降了 47%:「Improvements for smaller app downloads on Google Play」。

Using bsdiff, we were able to reduce the size of app updates on average by 47% compared to the full APK size.

現在則改成不直接對 APK 做 diff,而是對未壓縮的檔案做,再把差異包起來,則可以降到 65%:

Today, we're excited to share a new approach that goes further — File-by-File patching. App Updates using File-by-File patching are, on average, 65% smaller than the full app, and in some cases more than 90% smaller.

主要的原因在於 APK 的壓縮使用的 DEFLATE 演算法對於變更非常敏感,改變一個字元就會讓後續整串都改變,導致差異很大而跑 diff algorithm 的效果不好:

用 File-by-File 的好處主要來自於是對未壓縮的檔案比較差異,這代表沒有變動的檔案完全不會進來攪和,而對 binary 檔案的效果也比較好 (大部份的程式碼還是一樣)。不過這對於已經有壓縮的圖片的效果就比較差了,這也是 APK 一般肥大常見的原因。

有兩件事情值得注意的,一個是 Google 的人為了使用者體驗,只有在 auto update 時才會走 File-by-File 的更新,主要原因是 File-by-File 的解開速度慢不少:

For now, we are limiting the use of this new patching technology to auto-updates only, i.e. the updates that take place in the background, usually at night when your phone is plugged into power and you're not likely to be using it. This ensures that users won't have to wait any longer than usual for an update to finish when manually updating an app.

另外一個是,這個新方式讓 Google 每天省下 6PB 的流量,如果流量都是平均打散的話,大約是 600Gbps:

The savings, compared to our previous approach, add up to 6 petabytes of user data saved per day!

這種規模改善起來很有感覺 XDDD

Google 推出 BinDiff 分析惡意軟體

看到 Google 推出 BinDiff 時以為是某種對 binary 檔案產生類似 diff 結果的軟體 (像是 bsdiff 這樣的東西),仔細看才發現是跟資安有關的東西:「BinDiff now available for free」。

可以用在只有 binary 的情況下,快速找出有哪些 assembly code 有差異,進而讓人可以更快的分析。資安分析可以透過這個工具加速。相同的,也可以透過這個工具看出 vendor patch 實際上修了什麼東西:

BinDiff is a comparison tool for binary files that helps to quickly find differences and similarities in disassembled code. It is used by security researchers and engineers across the globe to identify and isolate fixes for vulnerabilities in vendor-supplied patches and to analyze multiple versions of the same binary.

而另外一個用途則是快速分類,把相同的 malware 集合起來,降低重複分析的時間:

Another common use case is to transfer analysis results from one binary to another, helping to prevent duplicate analyses of, for example, malware binaries.

目前支援的 assembly 指令集包括了這些:

Compare binary files for x86, MIPS, ARM/AArch64, PowerPC, and other architectures.

從原始文章可以看到還有 flowchart 分析:

不過這是配合其他 Hex-Rays IDA 的 Professional 版本產生的結果分析,官網報價一套是 USD$1129。

To use it, you also need the commercial Hex-Rays IDA Pro disassembler, 6.8 or later.