How InnoDB redo log size affects performance

「Benchmark(et)ing with InnoDB redo log size」 discusses how the size of the InnoDB redo log (that is, innodb_log_file_size and innodb_log_files_in_group) affects performance.

The key point comes right at the start: on newer versions of MySQL, a larger redo log gives better performance (on average) in almost every case:

tl;dr - conclusions specific to my test

  1. A larger redo log improves throughput
  2. A larger redo log helps more with slower storage than with faster storage because page writeback is more of a bottleneck with slower storage and a larger redo log reduces writeback.
  3. A larger redo log can help more when the working set is cached because there are no stalls from storage reads and storage writes are more likely to be a bottleneck.
  4. InnoDB in MySQL 5.7.17 is much faster than 5.6.35 in all cases except IO-bound + fast SSD

You can see that the improvement in average performance is significant, whether you enlarge the redo log or upgrade to 5.7.

But the author also ran into some odd behavior: even though average throughput improves a lot, performance degrades quite badly as more data is loaded; the charts on the original page show this in detail.

The results above show average throughput and that hides a lot of interesting behavior. We expect throughput over time to not suffer from variance -- for both InnoDB and for MyRocks. For many of the results below there is a lot of variance (jitter).

So for now it is probably enough to just enlarge the redo log (write performance will improve, at least), rather than treating this as a reason to upgrade MySQL.
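As a rough illustration of "just enlarge it", this is what the relevant my.cnf settings look like; the 2G × 4 numbers are placeholders for this sketch, not a recommendation from the article, and both variables are only read at startup, so changing them requires a clean restart:

[mysqld]
# Total redo log capacity = innodb_log_file_size * innodb_log_files_in_group
innodb_log_file_size      = 2G   # illustrative value; tune for your write load
innodb_log_files_in_group = 4    # MySQL's default is 2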

Amazon DynamoDB Accelerator (DAX)

A new architecture from DynamoDB that handles the caching for you on the service side: 「Amazon DynamoDB Accelerator (DAX) – In-Memory Caching for Read-Intensive Workloads」.

DAX is compatible with the existing DynamoDB API:

DAX is a fully managed caching service that sits (logically) in front of your DynamoDB tables. It operates in write-through mode, and is API-compatible with DynamoDB.
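A small sketch of what "API-compatible" means in practice, assuming the Python SDK: the application keeps making ordinary DynamoDB calls, and only the client is swapped out. The table name, region, and the DAX client package referenced in the comments are assumptions for illustration, not taken from the announcement:

import boto3

# Plain DynamoDB: reads and writes go straight to the table.
dynamodb = boto3.resource("dynamodb", region_name="ap-northeast-1")
table = dynamodb.Table("my-table")  # hypothetical table with partition key "id"
table.put_item(Item={"id": "42", "payload": "hello"})
print(table.get_item(Key={"id": "42"})["Item"])

# With DAX the calls above stay exactly the same; only the client changes.
# Sketch, assuming the separate amazon-dax-client package:
#   from amazondax import AmazonDaxClient
#   dax = AmazonDaxClient.resource(endpoint_url="<your DAX cluster endpoint>")
#   table = dax.Table("my-table")  # reads are served from the cache once it is warm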

Because of the cache, the architecture is eventually consistent:

Responses are returned from the cache in microseconds, making DAX a great fit for eventually-consistent read-intensive workloads.

The clusters are built from r3-series machines and are capped at ten nodes (which leaves me with a big question mark):

Each DAX cluster can contain 1 to 10 nodes; you can add nodes in order to increase overall read throughput. The cache size (also known as the working set) is based on the node size (dax.r3.large to dax.r3.8xlarge) that you choose when you create the cluster. Clusters run within a VPC, with nodes spread across Availability Zones.

I'm not really sure what the benefit is (compared to running memcached or some similar cache layer yourself); maybe it will click in a few days... :o

Cloudbleed: Cloudflare's security incident

Cloudflare has published the full timeline and the scope of impact: 「Incident report on memory leak caused by Cloudflare parser bug」.

It started on 2/18, when Google's Tavis Ormandy reached out on Twitter looking for a Cloudflare contact.

The corresponding entry on Google's Project Zero tracker: 「cloudflare: Cloudflare Reverse Proxies are Dumping Uninitialized Memory」.

The root cause was a bug that occasionally sent out data that should never have been sent, potentially including sensitive information:

It turned out that in some unusual circumstances, which I’ll detail below, our edge servers were running past the end of a buffer and returning memory that contained private information such as HTTP cookies, authentication tokens, HTTP POST bodies, and other sensitive data.

SSL private keys were not affected, mainly because SSL termination is isolated:

For the avoidance of doubt, Cloudflare customer SSL private keys were not leaked. Cloudflare has always terminated SSL connections through an isolated instance of NGINX that was not affected by this bug.

And because some of this sensitive data had even been indexed by Google and other search engines, it was serious enough that Cloudflare not only had to fix the bug, but also had to work with the search engines to get the cached data removed:

Because of the seriousness of such a bug, a cross-functional team from software engineering, infosec and operations formed in San Francisco and London to fully understand the underlying cause, to understand the effect of the memory leakage, and to work with Google and other search engines to remove any cached HTTP responses.

The bug's impact window started on 2016/09/22:

2016-09-22 Automatic HTTPS Rewrites enabled
2017-01-30 Server-Side Excludes migrated to new parser
2017-02-13 Email Obfuscation partially migrated to new parser
2017-02-18 Google reports problem to Cloudflare and leak is stopped

Working backwards from the traffic between 2/13 and 2/18, roughly 0.00003% of requests could have triggered the leak:

The greatest period of impact was from February 13 and February 18 with around 1 in every 3,300,000 HTTP requests through Cloudflare potentially resulting in memory leakage (that’s about 0.00003% of requests).
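A quick back-of-the-envelope check of that figure (my own arithmetic, not from the report):

# 1 in every 3,300,000 requests, expressed as a percentage
print(1 / 3300000 * 100)  # ~3.03e-05, i.e. about 0.00003% of requests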

I have to say Tavis Ormandy is hardcore: without source code or any help from Cloudflare, he produced reproducible steps on his own:

I worked with cloudflare over the weekend to help clean up where I could. I've verified that the original reproduction steps I sent cloudflare no longer work.

The full timeline after the report came in:

2017-02-18 0011 Tweet from Tavis Ormandy asking for Cloudflare contact information
2017-02-18 0032 Cloudflare receives details of bug from Google
2017-02-18 0040 Cross functional team assembles in San Francisco
2017-02-18 0119 Email Obfuscation disabled worldwide
2017-02-18 0122 London team joins
2017-02-18 0424 Automatic HTTPS Rewrites disabled worldwide
2017-02-18 0722 Patch implementing kill switch for cf-html parser deployed worldwide
2017-02-20 2159 SAFE_CHAR fix deployed globally
2017-02-21 1803 Automatic HTTPS Rewrites, Server-Side Excludes and Email Obfuscation re-enabled worldwide

In addition, 「List of Sites possibly affected by Cloudflare's #Cloudbleed HTTPS Traffic Leak」 collects the larger sites that may be affected (smaller sites are not listed).

Amazon EC2 launches the I3 instance family

Amazon EC2 has launched instances with NVMe SSDs, the I3 family: 「Now Available – I3 Instances for Demanding, I/O Intensive Applications」.

Looking at Tokyo-region prices, r4.16xlarge and i3.16xlarge both have 64 vCPUs and 488GB of RAM. There are only two differences:

  • First, r4 is rated at 195 ECU while i3 gets 200 ECU, so i3 is a bit faster.
  • Second, i3 adds 8 × 1900GB NVMe SSDs.

Yet the prices differ only a little ($5.12/hr vs. $5.856/hr), so if your workload can actually make use of the SSDs, i3 is quite a good deal compared to r4.*...
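To put a number on "only a little" (my own quick arithmetic based on the Tokyo prices above):

r4, i3 = 5.12, 5.856        # on-demand $/hr for r4.16xlarge and i3.16xlarge
print(i3 - r4)              # 0.736 $/hr extra
print((i3 / r4 - 1) * 100)  # ~14.4% premium for 8 x 1900GB of local NVMe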

Linode adds a $5/month plan, DigitalOcean adds load balancers

Linode and DigitalOcean, two of the better-known VPS providers, both announced new features: 「High-Memory Instances and $5 Linodes」 and 「Load Balancers: Simplifying High Availability」.

Linode also upgraded the disk space on the existing $10/month plan:

And finally, the existing Linode 2GB ($10/mo) plan is receiving a free storage upgrade from 24GiB to 30GiB.

The outbound network speed limit, previously capped at 125Mbps, has been raised to 1000Mbps (the old numbers can be found on the previous version of the page):

And finally finally, we’ve also increased the outbound network speed limit on all plans to be at minimum 1000 Mbits. Existing Linodes will need to reboot to pick up the new value, that’s it!

EC2's r4 instances are now available...

Amazon EC2's r4.* instances are finally available: 「Amazon EC2 R4 instances are now available in new regions」.

Amazon EC2 R4 instances are now available in the following regions: Asia Pacific (Tokyo), Asia Pacific (Singapore), South America (São Paulo), Asia Pacific (Seoul), Asia Pacific (Mumbai), Canada (Central), and EU (London).

r4.16xlarge (488GB) fills the hole between r3.8xlarge (244GB) and x1.16xlarge (976GB); before this you had to use p2.8xlarge (488GB), which is not available in every region, and the GPUs go to waste if you don't need them...

Memory improvements in Google Chrome 55

Google Chrome 55 came out a little while ago, and the most anticipated change is the memory improvement; someone has put together the numbers: 「Chrome 55 uses ~30% less memory than Chrome 54」.

In the author's tests with weather.com and reddit.com, memory usage dropped by 26% on the former and 30% on the latter, both quite noticeable. I clearly felt the improvement myself after upgrading to 55; in particular, reloading pages after restarting Chrome is much faster now...

These improvements mainly come from the rework of the V8 engine described in 「Fall cleaning: Optimizing V8 memory consumption」; my guess is that the speedup I feel comes from the lower memory usage improving the CPU cache hit rate...

Mark Callaghan on the recent MySQL marketing season...

This post from Mark Callaghan doesn't cover much technical ground; it is mainly about the two big recent MySQL conferences, Oracle's Oracle Open World and Percona's Percona Live Amsterdam 2016, and he uses the snarky term "benchmarketing" XDDD: 「Peak benchmarketing season for MySQL」.

There are some fun bits in it:

My joke is that each of these makes a different group happy: performance -> marketing, usability -> developers, manageability -> operations, availability -> end users, efficiency -> management.

He also mentions that MyRocks (built on RocksDB) can still beat InnoDB when the working set fits in memory:

One last disclaimer. If you care about read-mostly/in-memory workloads then InnoDB is probably an excellent choice. MyRocks can still be faster than InnoDB for in-memory workloads. That is more likely when the bottleneck for InnoDB is page write-back performance. So write-heavy/in-memory can still be a winner for MyRocks.

Now that's interesting; I should find some time to test it...

Python 3.6's dict improvements

Python 3's dict is getting a major change: 「[Python-Dev] Python 3.6 dict becomes compact and gets a private version; and keywords become ordered」.

On 3.5:

Python 3.5.1 (default, Jun 20 2016, 14:48:22)
>>> def func(**kw): print(kw.keys())
...
>>> func(a=1, b=2, c=3, d=4, e=5)
dict_keys(['c', 'd', 'e', 'b', 'a'])   # random order

Compared with the still-in-development 3.6:

Python 3.6.0a4+ (default:d43f819caea7, Sep  8 2016, 13:05:34)
>>> def func(**kw): print(kw.keys())
...
>>> func(a=1, b=2, c=3, d=4, e=5)
dict_keys(['a', 'b', 'c', 'd', 'e'])   # expected order

「Compact and ordered dict」 shows the reduction in memory usage:

It seems like the memory usage is between 20% and 25% smaller. Great job!

Memory usage, Python 3.5 => Python 3.6 on Linux x86_64:

./python -c 'import sys; print(sys.getsizeof({str(i):i for i in range(10)}))'

* 10 items: 480 B => 384 B (-20%)
* 100 items: 6240 B => 4720 B (-24%)
* 1000 items: 49248 B => 36984 B (-25%)

Note: the size is the size of the container itself, not of keys nor values.
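The idea behind the saving, sketched in pure Python (a deliberately simplified model of the compact layout, not CPython's actual implementation): instead of one big sparse table of (hash, key, value) slots, the dict keeps a small sparse index array that points into a dense, insertion-ordered entries array. That is also why the keyword arguments come out ordered above.

class CompactDict:
    """Toy model of CPython 3.6's compact dict layout (illustration only)."""

    def __init__(self, size=8):
        self.indices = [None] * size   # sparse array of small integers
        self.entries = []              # dense (hash, key, value) tuples, in insertion order

    def __setitem__(self, key, value):
        h = hash(key)
        i = h % len(self.indices)
        while self.indices[i] is not None:      # linear probing (CPython probes differently)
            idx = self.indices[i]
            if self.entries[idx][1] == key:     # existing key: overwrite in place
                self.entries[idx] = (h, key, value)
                return
            i = (i + 1) % len(self.indices)
        self.indices[i] = len(self.entries)     # the "one additional indirection"
        self.entries.append((h, key, value))

    def __getitem__(self, key):
        i = hash(key) % len(self.indices)
        while self.indices[i] is not None:
            h, k, v = self.entries[self.indices[i]]
            if k == key:
                return v
            i = (i + 1) % len(self.indices)
        raise KeyError(key)

    def keys(self):
        return [k for _, k, _ in self.entries]  # insertion order comes for free

d = CompactDict()
for k in "abcde":
    d[k] = ord(k)
print(d.keys())   # ['a', 'b', 'c', 'd', 'e']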

Performance, however, seems slightly worse:

3% slowdown in microbench is not surprising.
Compact dict introduces one additional indirection.

Instead, I've added freelist for most compact PyDictKeys.
So I think overall performance is almost same to before compact dict.

Someone also pointed out that the test should compare 3.5 plus the patch (isolating the dict change) rather than comparing against 3.6 directly:

There are a lot of other changes in interpreter core between 3.5 and 3.6 (such as new bytecode and optimized function calls). Could you compare the performance between the version just before adding new dict implementation and the version just after this?

It looks like this work is still in progress...