Home » Posts tagged "random"

AWS 提供模擬 Amazon Aurora 異常的測試功能...

Twitter 上看到 Jeff Barr 提到了在 Amazon Aurora 上的模擬 (這邊應該是講 MySQL):

指到的頁面是文件「Managing Amazon Aurora MySQL - Amazon Relational Database Service」,翻了一下 Wayback Machine,看起來之前就有了,只是現在拿出來再宣傳一下:「Managing Amazon Aurora MySQL - Amazon Relational Database Service」。

透過主動觸發 Amazon Aurora 異常,可以測試整個系統的後續反應:

  • A crash of the master instance or an Aurora Replica
  • A failure of an Aurora Replica
  • A disk failure
  • Disk congestion

前面三種都屬於 Aurora 本身的故障測試,第四種除了有可能是 Aurora 本身的問題外,也可以測壓力過大時的情境 (i.e. 前面透過 auto scaling 撐住了,但後面的資料庫可能沒有足夠的能力支撐)。

Googlebot 的 Math.random()

Hacker News Daily 上看到「Googlebot’s Javascript random() function is deterministic」這則有趣的發現。作者發現 Googlebot 的 Math.random() 並不隨機,甚至是固定的:

The first time Googlebot calls Math.random() the result will always be 0.14881141134537756, the second call will always be 0.19426893815398216. The script I linked to above simply uses this fact but obfuscates it a little and ‘seed’ it with something that doesn’t look too arbitrary.

需要無法預測的 random number (有安全性需求的) 應該用 RandomSource.getRandomValues() 這類函數,而不是用 Math.random(),所以這點倒是還好...

Ethereum Smart Contracts 裡的 PRNG

現代密碼學的安全性有很大一塊是基於亂數產生器 (RNG) 非常難被預測。如果這個前提不成立的話,利用亂數產生器產生出來的各種資訊都會被預測出來 (尤其是 Private Key)。

但真正的 RNG 需要靠硬體支援,而且產生速度很慢,一般都會使用 PRNG (Pseudorandom number generator) 產生。也就是「看起來」很亂的亂數產生器。

PRNG 通常是指在統計學上通過許多測試,像是在多種測試都是 Discrete uniform distribution,不需要防止有惡意人,可以從產生出的 PRNG 的值而推導出後續結果的用途。

在「Predicting Random Numbers in Ethereum Smart Contracts」這篇裡面,作者列出了一堆實做 Ethereum Smart Contracts 卻誤用 PRNG 的行為。

文章裡提到的問題都是因為 PRNG 拿著可被預測的資訊當作 entropy source (e.g. seed),而且提出來的範例都是拿 block 本身或衍生的資訊 (像是 block 的 hash) 來用:

  • PRNGs using block variables as a source of entropy
  • PRNGs based on a blockhash of some past block
  • PRNGs based on a blockhash of a past block combined with a seed deemed private
  • PRNGs prone to front-running

然後列了大量的程式碼當例子,建議有需要接觸的人看過一次,或是有時間的人都值得看這些負面範例... XDDD

不過作者在文章裡面也給了一堆有問題的方法,像是從外部網站取得亂數之類的 XDDD

正確的方法是使用 CSPRNG (Cryptographically secure pseudorandom number generator),這是專門設計給密碼學用的 PRNG。

CSPRNG 有幾種方法可以取得:

  • 在大多數的程式語言內都有對應的 library 可以用,另外在比較近代的瀏覽器內 (如果問 IE 的話,是 11+),可以透過 RandomSource.getRandomValues() 得到。
  • 如果打算自己搞底層而需要直接取得 CSPRNG 的產出,在 Unix-like 的環境下可以透過 /dev/urandom 取得,在 Microsoft Windows 下則可以透過 CryptGenRandom 取得。

別學作者那邊奇怪方法啊 XDDD

在 Ubuntu 16.04 上面隨機改變無線網卡的 MAC address

在「Randomize your WiFi MAC address on Ubuntu 16.04」這邊看到作者在介紹如何在 Ubuntu 上藉由改變無線網卡的 MAC address 保護自己的隱私:

Your device’s MAC address can be used to track you across the WiFi networks you connect to. That data can be shared and sold, and often identifies you as an individual. It’s possible to limit this tracking by using pseudo-random MAC addresses.

成果像是這樣:

主要應該是給 Ubuntu 的筆電使用者用...

Etsy 介紹的 Cache Smearing

Etsy 的 engineering blog 上提到了他們怎麼設計 cache 機制:「How Etsy caches: hashing, Ketama, and cache smearing」。

使用 consistent hash 已經是基本款了,文章裡花了一些篇幅介紹為什麼要用 consistent hash。

後半段則是有了 consistent hash 後會遇到的問題,也就是講 hot key 怎麼處理:有些資料非常熱 (常常被存取),就算用 consistent hash 也還是有可能搞爆單一機器。

他們做了幾件事情,第一件事情是設計 cache smearing 機制,把單一資料加上 random key,使得不同的 key 會打散到不同的機器上:

Let’s take an example of a hot key popular_user_data. This key is read often (since the user is popular) and is hashed to pool member 3. Cache smearing appends a random number in a small range (say, [0, 8)) to the key before each read or write. For instance, successive reads might look up popular_user_data3, popular_user_data1, and popular_user_data6. Because the keys are different, they will be hashed to different hosts. One or more may still be on pool member 3, but not all of them will be, sharing the load among the pool.

第二件事情則是監控哪些 key 比較熱門:

We’ve seen this problem many times over the years, using mctop, memkeys, and our distributed tracing tools to track down hot keys.

第三件事情是維護 hot key 的清單 (不是每個 key 都會上 cache smearing):

We manually add cache smearing to only our hottest keys, leaving keys that are less read-heavy efficiently stored on a single host.

是個當規模大到單一 hot key 會讓單台伺服器撐不住時的 workaround...

The DUHK Attack:因為亂數產生器的問題而造成的安全漏洞

Bruce Schneier 那邊看到的:「Attack on Old ANSI Random Number Generator」,攻擊的網站在「The DUHK Attack」,論文在「Practical state recovery attacks against legacy RNG implementations (PDF)」。

攻擊的對象是 ANSI X9.31 Random Number Generator:

DUHK (Don't Use Hard-coded Keys) is a vulnerability that affects devices using the ANSI X9.31 Random Number Generator (RNG) in conjunction with a hard-coded seed key.

然後攻擊的對象是 FortinetFortiOS

Traffic from any VPN using FortiOS 4.3.0 to FortiOS 4.3.18 can be decrypted by a passive network adversary who can observe the encrypted handshake traffic.

如果照說明的只到 4.3.18,那麼去年 11 月更新的 4.3.19 (參考「FortiOS 4.3.19 Release Notes」) 應該是修正了?不過裡面沒翻到類似的資料,是剛好把 RNG 換掉了嗎?

Amazon 的多變數最佳化

在「An efficient bandit algorithm for real-time multivariate optimization」這邊提到了 Amazon 不是走傳統的 A/B testing,而是同時進行多變數的最佳化:

Consider the problem of trying to find a near-optimal version of a promotional message such as this one, which has 5 variable parts and 48 different combinations in total.

在這樣的測試數量下,作者預估需要 66 天才能夠得到有效的結果,而這也表示當變數更多的時候問題就更大了:

Based on the Amazon success rate and traffic size, the authors calculated it would take 66 days to conduct a 48 treatment randomized experiment. Often this isn’t practical.

也就是開頭提到的,如何一個禮拜就提昇 21% conversion rate:

Aka, “How Amazon improved conversion by 21% in a single week!”

作者也提到了這個方法其實打臉了他先前提到的另外一篇論文,在 2014 年提出測試應該要盡可能簡單 XDDD:

Yesterday we saw the hard-won wisdom on display in ‘seven myths‘ recommending that experiments be kept simple and only test one thing at a time, otherwise interpreting the results can get really complicated.

只能說狀況愈來愈複雜,導致需要新方法解決問題。而且這些電商會遇到在測試時不同的 factor 之間有可能會有相依性 (也就是說這些 factor 不是 i.i.d.),你用本來的方式反而會測不出來。

Swap 對 InnoDB 的影響

Percona 的老大拿 5.7 版做實驗,確認 swap 對 InnoDB 的影響:「The Impact of Swapping on MySQL Performance」。

測試的機器是 32GB RAM,作業系統 (以及 swap) 裝在已經有點年紀的 Intel 520 SSD 上,而 MySQL 則是裝在 Intel 750 NVMe 上。透過對 innodb_buffer_pool 的調整來看情況。

可以看到設為 24GB (記憶體 75% 的量) 時很穩定的在 44K QPS 與 3.5ms (95%):

This gives us about 44K QPS. The 95% query response time (reported by sysbench) is about 3.5ms.

而當設成 32GB 的時候開始可以觀察到 swap i/o,掉到 20K QPS 與 9ms (95%):

We can see that performance stabilizes after a bit at around 20K QPS, with some 380MB/sec disk IO and 125MB/sec swap IO. The 95% query response time has grown to around 9ms.

當拉到 48GB 的時候就更掉更多,6K QPS 與 35ms (95%):

Now we have around 6K QPS. Disk IO has dropped to 250MB/sec, and swap IO is up to 190MB/sec. The 95% query response time is around 35ms.

作者發現掉的比率沒有想像中大:

When I started, I expected severe performance drop even with very minor swapping. I surprised myself by getting swap activity to more than 100MB/sec, with performance “only” halved.

這邊測試用的是 SSD,如果是傳統用磁頭的硬碟,對 random access 應該會很敏感而掉更多:

This assumes your swap space is on an SSD, of course! SSDs handle random IO (which is what paging activity usually is) much better than HDDs.

基本上還是要避免碰到 swap 啦,另外 comment 的地方剛好有提到前陣子在猜測的 best practice,測試時的 vm.swappiness 是設成 1,這應該是作者的 best practice:

Swappiness was set to 1 in this case. I was not expecting this to cause significant impact as swapping is caused by genuine (intended) missconfiguration with more memory required than available.

InnoDB 的 buffer pool preload 功能

Percona 的人討論了 InnoDB 提供的 buffer pool preload 功能:「Using the InnoDB Buffer Pool Pre-Load Feature in MySQL 5.7」。

就如同他所講的,因為硬體設備的進步 (主要是 SSD 的興起),而導致 preload 的需求已經沒以前重要了:

Frankly, time has reduced the need for this feature. Five years ago, we would typically store databases on spinning disks. These disks often took quite a long time to warm up with normal database workloads, which could lead to many hours of poor performance after a restart. With the rise of SSDs, warm up happens faster and reduces the penalty from not having data in the buffer pool.

由於 SSD 的 random read 很快,反而可以直接推上線讓他邊跑服務邊 warm up。不過相對的,傳統硬碟的 InnoDB database 還是可以規劃需求,畢竟 random read 還是痛點...

Libgcrypt 與 GnuPG 的安全性問題

在「Security fixes for Libgcrypt and GnuPG 1.4 [CVE-2016-6316]」這邊看到這個歷史悠久的 bug:

Felix Dörre and Vladimir Klebanov from the Karlsruhe Institute of Technology found a bug in the mixing functions of Libgcrypt's random number generator: An attacker who obtains 4640 bits from the RNG can trivially predict the next 160 bits of output. This bug exists since 1998 in all GnuPG and Libgcrypt versions.

就這樣的行為,對於自己用的機器應該是還好... 不過得到 4640 bits 後就可以預測接下來的 160 bits,這個 RNG 有點囧 @_@

官方目前是認為應該還好:

A first analysis on the impact of this bug in GnuPG shows that existing RSA keys are not weakened. For DSA and Elgamal keys it is also unlikely that the private key can be predicted from other public information. This needs more research and I would suggest _not to_ overhasty revoke keys.

不過如果你有絕對的安全需求的話還是可以考慮 revoke 再重新生一把...

Archives