Home » 2015 » June (Page 3)

Etsy 用 SSD 的故事

EtsyLaurie Denness 對於 Etsy 使用各種品牌 SSD 的情況給出了他的經歷:「SSDs: A gift and a curse」。

重點在於開頭說的:

SSD firmware is buggy

可以看到當 SSD 配上 RAID controller 的時候,常常會需要找問題... (而且很難找)

Intel 的評價很不錯:

Okay, bad start, we’ve actually had no issues with Intel. This seems to be common across other companies we’ve spoken to.

OCZ 倒了,被 Toshiba 收購,而且 S.M.A.R.T. 資訊很差,很難預測什麼時候會掛掉 (有助於提前替換):

However, they had poor SMART info (none) so predicting failures was hard.

HP 是個大黑盒:

Unfortunately, HP have proprietary RAID controllers, and they don’t support SMART. Or rather, they refuse to talk to non-HP drives using off the shelf technology, they have their own methods.

Samsung 的評價不錯,C/P 值很高,而且有 S.M.A.R.T.:

Samsung saved the day and picked up from OCZ with a ludicrously cheap 960GB offering, the 840 EVO. A consumer drive, so very limited warranty, but for the price (~$400-500) you got great IOPS and they were reliable. They had better SMART info, and seemed to play nicely with our hardware.

不過 BB6Q 版的韌體搞爆了效能,雖然最後修好了:「Samsung Releases Firmware Update to Fix the SSD 840 EVO Read Performance Bug」。

LiteOn 則是掛在 GC 上 (RAID 裡同時掛掉兩顆以上):

The SSDs were having extended garbage collection periods, exacerbated by a smaller amount of SSDs with higher IO, in RAID6. This caused the controller to kick the drive out of the array… and unfortunately due to the write levelling across the drives, at least two of them were garbage collecting at the same time, destroying the array integrity.

不過後來 Dell 與 LiteOn 分別就 RAID controller 與 SSD 本身都跳下去修正,最後還是解決了:

Dell and LiteOn together identified and fixed weaknesses in their RAID controller, the backplane and the SSD firmware.

算是經驗分享,在 SSD 硬碟成熟的過程中間必經的道路 XD

超慢的 Facebook...

今天的 Facebook 慢到爆炸,看起來像是台灣機房的問題,這邊先提供 workaround:

把 DNS 改用 8.8.8.8,看起來 DNS 會解去洛杉磯的機房,速度就會正常多了。

接下來列出看到的結果,靜態的頁面 (拿 /robots.txt 測試) 沒問題:

Server Software:        
Server Hostname:        www.facebook.com
Server Port:            443
SSL/TLS Protocol:       TLSv1.2,ECDHE-ECDSA-AES128-GCM-SHA256,256,128

Document Path:          /robots.txt
Document Length:        4159 bytes

Concurrency Level:      1
Time taken for tests:   0.792 seconds
Complete requests:      20
Failed requests:        0
Total transferred:      99000 bytes
HTML transferred:       83180 bytes
Requests per second:    25.24 [#/sec] (mean)
Time per request:       39.612 [ms] (mean)
Time per request:       39.612 [ms] (mean, across all concurrent requests)
Transfer rate:          122.03 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:       28   30   0.6     30      31
Processing:     9   10   0.3     10      10
Waiting:        9    9   0.2      9      10
Total:         38   39   0.7     39      41

Percentage of the requests served within a certain time (ms)
  50%     39
  66%     40
  75%     40
  80%     40
  90%     40
  95%     41
  98%     41
  99%     41
 100%     41 (longest request)

但動態的頁面就慢到炸:

Server Software:        
Server Hostname:        www.facebook.com
Server Port:            443
SSL/TLS Protocol:       TLSv1.2,ECDHE-ECDSA-AES128-GCM-SHA256,256,128

Document Path:          /
Document Length:        0 bytes

Concurrency Level:      1
Time taken for tests:   20.294 seconds
Complete requests:      20
Failed requests:        0
Non-2xx responses:      20
Total transferred:      6700 bytes
HTML transferred:       0 bytes
Requests per second:    0.99 [#/sec] (mean)
Time per request:       1014.690 [ms] (mean)
Time per request:       1014.690 [ms] (mean, across all concurrent requests)
Transfer rate:          0.32 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:       28   30   0.9     29      32
Processing:   151  985 1144.5    567    4939
Waiting:      151  985 1144.5    567    4938
Total:        180 1014 1145.1    597    4971
WARNING: The median and mean for the initial connection time are not within a normal deviation
        These results are probably not that reliable.

Percentage of the requests served within a certain time (ms)
  50%    597
  66%   1226
  75%   1279
  80%   1667
  90%   2267
  95%   4971
  98%   4971
  99%   4971
 100%   4971 (longest request)

Facebook 推出 Facebook Lite

Facebook 推出 Facebook Lite (Android):「Announcing Facebook Lite」。

與完整版的差異在於對網路的使用量會更低,以及使用的資源會更少:

Today we’re introducing Facebook Lite, a new version of Facebook for Android that uses less data and works well across all network conditions.

Facebook Lite is less than 1MB so it is fast to install and quick to load. It includes Facebook’s core experiences like News Feed, status updates, photos, notifications and more.

之前就先有先推出了,不過現在開放給更多地區使用:

Starting today, we are rolling out Facebook Lite in countries across Asia, and over the coming weeks Facebook Lite will be available in parts of Latin America, Africa and Europe.

Let's Encrypt 建立 Root Certificate 與 Intermediate Certificate

Let's Encrypt 的 Root Certificate 與 Intermediate Certificate 建出來了:「Let's Encrypt Root and Intermediate Certificates」。

Intermediate Certificate 除了會讓自己的 Root Certificate 簽名外,也會讓 IdenTrust 的 DST Root CA X3 簽 (目前各大瀏覽器與 SSL library 都有支援)。

目前是 RSA key,之後會生出 ECDSA key:

All ISRG keys are currently RSA keys. We are planning to generate ECDSA keys later this year.

Galera Cluster (Percona XtraDB Cluster) 對寫入熱點的處理方式

Percona 的「Optimizing Percona XtraDB Cluster for write hotspots」這篇說明了在 Galera Cluster 上發生多個 transaction 衝突時的處理方式。

不過解法是早就已經知道的方法,也就是把寫入都丟到同一台上處理:

So what are your options with PXC when the workload has write hotspots? The most obvious one is to write on a single node: then you will have the same locking model as with a standalone InnoDB server. Performance will not be as good as with a standalone server as you will have to pay for synchronous replication, but you will avoid the very expensive write conflicts.

講解的文章,對之後 troubleshooting 與 performance tuning 有幫助...

對 GitHub 的 Public Key 分析

Hacker News Daily 上看到有人針對 GitHub 上的 Public Key 分析:「Auditing GitHub users’ SSH key quality」。

這個分析主要用的是 GitHub 的 .keys 功能取得:

A little known feature of GitHub is the ability to look at the public SSH keys that other users have set to be authorised on their account (for example https://github.com/torvalds.keys)

作者從去年 2014 年 12 月 27 日跑到今年 2015 年 1 月 9 日,共取得了 1376262 把 public key,然後就針對這些 key 分析。

「The Debian bug bites again」這一段講的是 CVE-2008-0166 (可以參考「該換 ssh keys 了」這篇的說明),發現即使是七年後的現在,也還是有問題。

另外還有 256 bits 與 512 bits 的 RSA key,作者自己建立了一把測試,用 i5-2400 只要 25 分鐘就可以解出 256 bits 的 private key:

This risk isn’t only real if someone had gathered together top of the line mathematicians or supercomputers worth of power, the 256 bit key I factored was factored on a i5-2400 in 25 mins.

NSA 應該都摸透了...

作者提供的時間線:

[27th December 2014] Key Crawl Started
[28th Feb 2015] Disclosed low bits issue to a peer at GitHub
[1st March 2015] Discovered the weak Debian Keys (and disclosed)
[5th May 2015] Debian keys revoked, emails sent out
[1st June 2015] Weak and low quality keys revoked

HTTP/1.1 時代的 Best Practice 變成 HTTP/2 的問題

Velocity 2015 上的「HTTP/2 is here, let's optimize!」提到了很多關於 HTTP/1.1 時代所採用的 Best Practice (或者說,workaround) 變成了 HTTP/2 的問題。

這張表整理了各種技巧在 HTTP/1.1 與 HTTP/2 的差異:

在 HTTP/2 因為有了 multiplexing 機制,用了 Apply domain sharding 後反而增加 DNS query 以及開新的連線所需要的 handshake 時間。

而 Concatenate resources 也算是 workaround 的一種,不同等級的合併會有不同的 trade-off。全站合併 assets 可以讓常逛的使用者下載的量降到最低,但會讓第一次讀取的使用者花比較多時間下載。如果是單頁合併 assets,則剛好反過來:第一次讀取的使用者較快,但常逛的使用者會下載重複的內容。

最後的 Inline resources,作者則是提出利用 HTTP/2 提供的 server push 機制來改善,不過沒看懂...

有些 workaround 總算可以拋開了...

Archives