sysbench 的 RNG

Percona 的 blog 上看到了 sysbench 的 RNG (Random Number Generator) 跟想像中的不太一樣:「What You May Not Know About Random Number Generation in Sysbench」。

預設是 Special:

而不是直覺的 Uniform (也有提供):

另外還提供三種:

翻了翻可以翻到其他四個的理論基礎,但就是不知道 Special 的設計理論在哪裡...

利用 Sensor 校正資訊產生 Device Fingerprint 的隱私攻擊

看到「Fingerprinting iPhones」這篇提出的攻擊,標題雖然是提到 iPhone,但實際上攻擊包括了 Android 的手機:

You are affected by this fingerprinting attack if you are using any iOS devices with the iOS version below 12.2, including the latest iPhone XS, iPhone XS Max, and iPhone XR. You are also likely to be affected if you are using a Pixel 2/3 device, although we hypothesise the generated fingerprint has less entropy and is unlikely to be globally unique. A SensorID can be generated by both apps and mobile websites and requires no user interaction.

目前 iPhone 升級到 12.2 之後可以緩解這個問題,Android 看起來還不清楚...

攻擊的方式是透過手機在出場前會使用外部的校正工具,找出手機內 sensor 所偵測到的值與實際值的差異,然後把這些資訊燒到韌體裡,當呼叫 API 時就可以修正給出比較正確的值。

而因為這些校正資訊幾乎每一隻手機都不一樣,而且不會因為重裝而變更 (即使 factory reset),加上還可以跨 app 與 web 追蹤,就成為這次攻擊的目標:

In the context of mobile devices, the main benefit of per-device calibration is that it allows more accurate attitude estimation.

資訊量其實相當大,透過 app 分析可以得到 67 bits entropy,透過網頁也有 42 bits entropy,而且不怎麼會變:

In general, it is difficult to create a unique fingerprint for iOS devices due to strict sandboxing and device homogeneity. However, we demonstrated that our approach can produce globally unique fingerprints for iOS devices from an installed app -- around 67 bits of entropy for the iPhone 6S. Calibration fingerprints generated by a website are less unique (~42 bits of entropy for the iPhone 6S), but they are orthogonal to existing fingerprinting techniques and together they are likely to form a globally unique fingerprint for iOS devices.

We have not observed any change in the SensorID of our test devices in the past half year. Our dataset includes devices running iOS 9/10/11/12. We have tested compass calibration, factory reset, and updating iOS (up until iOS 12.1); the SensorID always stays the same. We have also tried measuring the sensor data at different locations and under different temperatures; we confirm that these factors do not change the SensorID either.

目前提出來的解法是加入隨機值的噪音 (iOS 的作法),不過作者有建議預設應該要關閉 js 存取 sensor 的權限:

To mitigate this calibration fingerprint attack, vendors can add uniformly distributed random noise to ADC outputs before calibration is applied. Alternatively, vendors could round the sensor outputs to the nearest multiple of the nominal gain. Please refer to our paper for more details. In addition, we recommend privacy-focused mobile browsers add an option to disable the access to motion sensors via JavaScript. This could help protect Android devices and iOS devices that no longer receive updates from Apple.

不過當初這群人怎麼會注意到的...

AWS 提供模擬 Amazon Aurora 異常的測試功能...

Twitter 上看到 Jeff Barr 提到了在 Amazon Aurora 上的模擬 (這邊應該是講 MySQL):

指到的頁面是文件「Managing Amazon Aurora MySQL - Amazon Relational Database Service」,翻了一下 Wayback Machine,看起來之前就有了,只是現在拿出來再宣傳一下:「Managing Amazon Aurora MySQL - Amazon Relational Database Service」。

透過主動觸發 Amazon Aurora 異常,可以測試整個系統的後續反應:

  • A crash of the master instance or an Aurora Replica
  • A failure of an Aurora Replica
  • A disk failure
  • Disk congestion

前面三種都屬於 Aurora 本身的故障測試,第四種除了有可能是 Aurora 本身的問題外,也可以測壓力過大時的情境 (i.e. 前面透過 auto scaling 撐住了,但後面的資料庫可能沒有足夠的能力支撐)。

Googlebot 的 Math.random()

Hacker News Daily 上看到「Googlebot’s Javascript random() function is deterministic」這則有趣的發現。作者發現 Googlebot 的 Math.random() 並不隨機,甚至是固定的:

The first time Googlebot calls Math.random() the result will always be 0.14881141134537756, the second call will always be 0.19426893815398216. The script I linked to above simply uses this fact but obfuscates it a little and ‘seed’ it with something that doesn’t look too arbitrary.

需要無法預測的 random number (有安全性需求的) 應該用 RandomSource.getRandomValues() 這類函數,而不是用 Math.random(),所以這點倒是還好...

Ethereum Smart Contracts 裡的 PRNG

現代密碼學的安全性有很大一塊是基於亂數產生器 (RNG) 非常難被預測。如果這個前提不成立的話,利用亂數產生器產生出來的各種資訊都會被預測出來 (尤其是 Private Key)。

但真正的 RNG 需要靠硬體支援,而且產生速度很慢,一般都會使用 PRNG (Pseudorandom number generator) 產生。也就是「看起來」很亂的亂數產生器。

PRNG 通常是指在統計學上通過許多測試,像是在多種測試都是 Discrete uniform distribution,不需要防止有惡意人,可以從產生出的 PRNG 的值而推導出後續結果的用途。

在「Predicting Random Numbers in Ethereum Smart Contracts」這篇裡面,作者列出了一堆實做 Ethereum Smart Contracts 卻誤用 PRNG 的行為。

文章裡提到的問題都是因為 PRNG 拿著可被預測的資訊當作 entropy source (e.g. seed),而且提出來的範例都是拿 block 本身或衍生的資訊 (像是 block 的 hash) 來用:

  • PRNGs using block variables as a source of entropy
  • PRNGs based on a blockhash of some past block
  • PRNGs based on a blockhash of a past block combined with a seed deemed private
  • PRNGs prone to front-running

然後列了大量的程式碼當例子,建議有需要接觸的人看過一次,或是有時間的人都值得看這些負面範例... XDDD

不過作者在文章裡面也給了一堆有問題的方法,像是從外部網站取得亂數之類的 XDDD

正確的方法是使用 CSPRNG (Cryptographically secure pseudorandom number generator),這是專門設計給密碼學用的 PRNG。

CSPRNG 有幾種方法可以取得:

  • 在大多數的程式語言內都有對應的 library 可以用,另外在比較近代的瀏覽器內 (如果問 IE 的話,是 11+),可以透過 RandomSource.getRandomValues() 得到。
  • 如果打算自己搞底層而需要直接取得 CSPRNG 的產出,在 Unix-like 的環境下可以透過 /dev/urandom 取得,在 Microsoft Windows 下則可以透過 CryptGenRandom 取得。

別學作者那邊奇怪方法啊 XDDD

在 Ubuntu 16.04 上面隨機改變無線網卡的 MAC address

在「Randomize your WiFi MAC address on Ubuntu 16.04」這邊看到作者在介紹如何在 Ubuntu 上藉由改變無線網卡的 MAC address 保護自己的隱私:

Your device’s MAC address can be used to track you across the WiFi networks you connect to. That data can be shared and sold, and often identifies you as an individual. It’s possible to limit this tracking by using pseudo-random MAC addresses.

成果像是這樣:

主要應該是給 Ubuntu 的筆電使用者用...

Etsy 介紹的 Cache Smearing

Etsy 的 engineering blog 上提到了他們怎麼設計 cache 機制:「How Etsy caches: hashing, Ketama, and cache smearing」。

使用 consistent hash 已經是基本款了,文章裡花了一些篇幅介紹為什麼要用 consistent hash。

後半段則是有了 consistent hash 後會遇到的問題,也就是講 hot key 怎麼處理:有些資料非常熱 (常常被存取),就算用 consistent hash 也還是有可能搞爆單一機器。

他們做了幾件事情,第一件事情是設計 cache smearing 機制,把單一資料加上 random key,使得不同的 key 會打散到不同的機器上:

Let’s take an example of a hot key popular_user_data. This key is read often (since the user is popular) and is hashed to pool member 3. Cache smearing appends a random number in a small range (say, [0, 8)) to the key before each read or write. For instance, successive reads might look up popular_user_data3, popular_user_data1, and popular_user_data6. Because the keys are different, they will be hashed to different hosts. One or more may still be on pool member 3, but not all of them will be, sharing the load among the pool.

第二件事情則是監控哪些 key 比較熱門:

We’ve seen this problem many times over the years, using mctop, memkeys, and our distributed tracing tools to track down hot keys.

第三件事情是維護 hot key 的清單 (不是每個 key 都會上 cache smearing):

We manually add cache smearing to only our hottest keys, leaving keys that are less read-heavy efficiently stored on a single host.

是個當規模大到單一 hot key 會讓單台伺服器撐不住時的 workaround...

The DUHK Attack:因為亂數產生器的問題而造成的安全漏洞

Bruce Schneier 那邊看到的:「Attack on Old ANSI Random Number Generator」,攻擊的網站在「The DUHK Attack」,論文在「Practical state recovery attacks against legacy RNG implementations (PDF)」。

攻擊的對象是 ANSI X9.31 Random Number Generator:

DUHK (Don't Use Hard-coded Keys) is a vulnerability that affects devices using the ANSI X9.31 Random Number Generator (RNG) in conjunction with a hard-coded seed key.

然後攻擊的對象是 FortinetFortiOS

Traffic from any VPN using FortiOS 4.3.0 to FortiOS 4.3.18 can be decrypted by a passive network adversary who can observe the encrypted handshake traffic.

如果照說明的只到 4.3.18,那麼去年 11 月更新的 4.3.19 (參考「FortiOS 4.3.19 Release Notes」) 應該是修正了?不過裡面沒翻到類似的資料,是剛好把 RNG 換掉了嗎?

Amazon 的多變數最佳化

在「An efficient bandit algorithm for real-time multivariate optimization」這邊提到了 Amazon 不是走傳統的 A/B testing,而是同時進行多變數的最佳化:

Consider the problem of trying to find a near-optimal version of a promotional message such as this one, which has 5 variable parts and 48 different combinations in total.

在這樣的測試數量下,作者預估需要 66 天才能夠得到有效的結果,而這也表示當變數更多的時候問題就更大了:

Based on the Amazon success rate and traffic size, the authors calculated it would take 66 days to conduct a 48 treatment randomized experiment. Often this isn’t practical.

也就是開頭提到的,如何一個禮拜就提昇 21% conversion rate:

Aka, “How Amazon improved conversion by 21% in a single week!”

作者也提到了這個方法其實打臉了他先前提到的另外一篇論文,在 2014 年提出測試應該要盡可能簡單 XDDD:

Yesterday we saw the hard-won wisdom on display in ‘seven myths‘ recommending that experiments be kept simple and only test one thing at a time, otherwise interpreting the results can get really complicated.

只能說狀況愈來愈複雜,導致需要新方法解決問題。而且這些電商會遇到在測試時不同的 factor 之間有可能會有相依性 (也就是說這些 factor 不是 i.i.d.),你用本來的方式反而會測不出來。

Swap 對 InnoDB 的影響

Percona 的老大拿 5.7 版做實驗,確認 swap 對 InnoDB 的影響:「The Impact of Swapping on MySQL Performance」。

測試的機器是 32GB RAM,作業系統 (以及 swap) 裝在已經有點年紀的 Intel 520 SSD 上,而 MySQL 則是裝在 Intel 750 NVMe 上。透過對 innodb_buffer_pool 的調整來看情況。

可以看到設為 24GB (記憶體 75% 的量) 時很穩定的在 44K QPS 與 3.5ms (95%):

This gives us about 44K QPS. The 95% query response time (reported by sysbench) is about 3.5ms.

而當設成 32GB 的時候開始可以觀察到 swap i/o,掉到 20K QPS 與 9ms (95%):

We can see that performance stabilizes after a bit at around 20K QPS, with some 380MB/sec disk IO and 125MB/sec swap IO. The 95% query response time has grown to around 9ms.

當拉到 48GB 的時候就更掉更多,6K QPS 與 35ms (95%):

Now we have around 6K QPS. Disk IO has dropped to 250MB/sec, and swap IO is up to 190MB/sec. The 95% query response time is around 35ms.

作者發現掉的比率沒有想像中大:

When I started, I expected severe performance drop even with very minor swapping. I surprised myself by getting swap activity to more than 100MB/sec, with performance “only” halved.

這邊測試用的是 SSD,如果是傳統用磁頭的硬碟,對 random access 應該會很敏感而掉更多:

This assumes your swap space is on an SSD, of course! SSDs handle random IO (which is what paging activity usually is) much better than HDDs.

基本上還是要避免碰到 swap 啦,另外 comment 的地方剛好有提到前陣子在猜測的 best practice,測試時的 vm.swappiness 是設成 1,這應該是作者的 best practice:

Swappiness was set to 1 in this case. I was not expecting this to cause significant impact as swapping is caused by genuine (intended) missconfiguration with more memory required than available.