Home » Posts tagged "binary"

Google Play Store 將支援 Brotli 壓縮

在「Intern Impact: Brotli compression for Play Store app downloads」這邊介紹了 Google Play Store 引入 Brotli 的情況。

選擇 Brotli 除了是 Google 自家研發出來的東西以外,另外是考量到 Brotli 的壓縮與解壓縮速 (尤其是後者) 不會增加太多,卻可以多出不少壓縮率。維基百科這邊說明的是文字的部份:

Replacing deflate with brotli typically gives an increase of 20% in compression density for text files, while compression and decompression speeds are roughly unchanged.

不過實際在 Google Play Store 上測試 binary 的效果也不錯:

當然,如同之前提到的「Google 再次改善 Android 的 APK 更新,讓下載的量更小」,在去年 12 月時 Google 對於背景更新的下載 File-by-File 的更新來降低流量 (但在手機上會需要大量的 CPU 資源計算,不過因為是背景 idle 時跑而不會影響使用者,所以被採用),透過這兩個改善互相搭配繼續壓低流量。

在接下來的幾個禮拜會生效:

Brotli compression for app downloads is rolling out now, and users should start to enjoy the benefits over the coming weeks.

Google 再次改善 Android 的 APK 更新,讓下載的量更小

Google 的人再次更新了演算法,將下載的量再次減少,從本來的 47% 降到 65%:「Saving Data: Reducing the size of App Updates by 65%」。

今年七月的時候,更新演算法導入了 bsdiff,使得本來要抓整包 APK 的量,變成抓 diff 的部份,這使得下載的流量降了 47%:「Improvements for smaller app downloads on Google Play」。

Using bsdiff, we were able to reduce the size of app updates on average by 47% compared to the full APK size.

現在則改成不直接對 APK 做 diff,而是對未壓縮的檔案做,再把差異包起來,則可以降到 65%:

Today, we're excited to share a new approach that goes further — File-by-File patching. App Updates using File-by-File patching are, on average, 65% smaller than the full app, and in some cases more than 90% smaller.

主要的原因在於 APK 的壓縮使用的 DEFLATE 演算法對於變更非常敏感,改變一個字元就會讓後續整串都改變,導致差異很大而跑 diff algorithm 的效果不好:

用 File-by-File 的好處主要來自於是對未壓縮的檔案比較差異,這代表沒有變動的檔案完全不會進來攪和,而對 binary 檔案的效果也比較好 (大部份的程式碼還是一樣)。不過這對於已經有壓縮的圖片的效果就比較差了,這也是 APK 一般肥大常見的原因。

有兩件事情值得注意的,一個是 Google 的人為了使用者體驗,只有在 auto update 時才會走 File-by-File 的更新,主要原因是 File-by-File 的解開速度慢不少:

For now, we are limiting the use of this new patching technology to auto-updates only, i.e. the updates that take place in the background, usually at night when your phone is plugged into power and you're not likely to be using it. This ensures that users won't have to wait any longer than usual for an update to finish when manually updating an app.

另外一個是,這個新方式讓 Google 每天省下 6PB 的流量,如果流量都是平均打散的話,大約是 600Gbps:

The savings, compared to our previous approach, add up to 6 petabytes of user data saved per day!

這種規模改善起來很有感覺 XDDD

Facebook 備份 MySQL 資料並且確認正確性的方法

Facebook 再多花了一些篇幅數對於 MySQL 資料備份以及確認正確性的方法:「Continuous MySQL backup validation: Restoring backups」。

首先是 Continuous Restore Tier (CRT) 這塊,可以看到他們在這塊很仰賴 HDFS 當作備份的第一層基地,包括了 Full logical backups (用 mysqldump)、Differential (diff) backups 以及 Binary log (binlog) backups (stream 進 HDFS)。

另外上了 GTID,對於後續的處理會比較方便:

All of our database servers also use global transaction IDs (GTIDs), which gives us another layer of control when replaying transactions from binlog backups.

在 CRT 這塊可以看到其實是拿現成的工具堆起來,不同單位會因為規模而有不同的作法。真正的重點反而在 ORC Restore Coordinator (ORC) 這塊,可以看到 Facebook 開發了大量的程式將回復這件事情自動化處理:

在收到回復的需求後,可以看到 Peon 會從 HDFS 拉資料出來,並且用 binlog replay 回去:

Peons contain all relevant logic for retrieving backups from HDFS, loading them into their local MySQL instance, and rolling them forward to a certain point in time by replaying binlogs. Each restore job a peon works on goes through these five stages[.]

也是因為 Facebook 對 MySQL 的用量大到需要自動化這些事情,才有這些東西...

Golang 1.7

Golang 1.7 主打更小的 binary size:「Smaller Go 1.7 binaries」:

Typical programs, ranging from tiny toys to large production programs, are about 30% smaller when built with Go 1.7.

還附了一張經典的「Hello, world」程式的分析:

由於現代 CPU 的速度與 L1/L2/... cache 有緊密關係,當 binary size 變小時,常常會伴隨著 memory access 變快 (因為 hitrate 提昇),所以 binary size 也是效能指數蠻重要的一環。

0.1 + 0.2 = 0.30000000000000004

看到「http://0.30000000000000004.com/」這個網站對經典的 0.1 + 0.2 問題整理了各語言的結果。這個網址名稱也很機車啊 XD

開頭的說明講述 IEEE 754 二進制表示法的問題:

Your language isn't broken, it's doing floating point math. Computers can only natively store integers, so they need some way of representing decimal numbers. This representation comes with some degree of inaccuracy. That's why, more often than not, .1 + .2 != .3.

It's actually pretty simple. When you have a base 10 system (like ours), it can only express fractions that use a prime factor of the base. The prime factors of 10 are 2 and 5. So 1/2, 1/4, 1/5, 1/8, and 1/10 can all be expressed cleanly because the denominators all use prime factors of 10. In contrast, 1/3, 1/6, and 1/7 are all repeating decimals because their denominators use a prime factor of 3 or 7. In binary (or base 2), the only prime factor is 2. So you can only express fractions cleanly which only contain 2 as a prime factor. In binary, 1/2, 1/4, 1/8 would all be expressed cleanly as decimals. While, 1/5 or 1/10 would be repeating decimals. So 0.1 and 0.2 (1/10 and 1/5) while clean decimals in a base 10 system, are repeating decimals in the base 2 system the computer is operating in. When you do math on these repeating decimals, you end up with leftovers which carry over when you convert the computer's base 2 (binary) number into a more human readable base 10 number.

這邊主要是討論 IEEE 754-1985 這個標準,後來在 IEEE 754-2008 提出了新的表示方法,支援十進位的表示法來解這個問題 (雖然還沒普及)。

Command Line 下把 Hex 轉成 Base64...

每次都忘記,寫一篇之後查比較方便... 重點在對 xxd 的變化應用,而 xxd 被包在 Vim 裡,所以應該都會裝... 吧...

xxd 預設是把 binary 轉成 hex,但你可以用 -r 參數變成反向,也就是 hex 轉 binary。

所以剩下的就很簡單了,先把 hex 轉成 binary 再轉成 base64:

echo 0123456789ABCDEF | xxd -r -p | base64

這邊有裝 Base64 所以可以直接用,如果沒有的話,可以用 OpenSSL 替代:

echo 0123456789ABCDEF | xxd -r -p | openssl enc -base64

Archives