Netflix 的 BetterTLS,推廣 CA 的 Name Constraints

Netflix 因為想用 Name Constraints,所以決定自己跳出來推廣了:「BetterTLS - A Name Constraints test suite for HTTPS clients」。

就是在 CA 上可以綁定條件,只允許哪些 domain 可以被發放:

網站在「BetterTLS: Name Constraints」這邊可以看。

利用 Side-channel 資訊判斷被 HTTPS 保護的 Netflix 影片資訊

在「Netflix found to leak information on HTTPS-protected videos」這篇看到了研究員透過 VBR 所透露出的 side channel 資訊,成功的取得了被 HTTPS 保護的 Netflix 影片資訊。這對於美國的 ISP 是個大利多 (加上之前通過的法案),但對於個人隱私則是嚴重的打擊。

這項研究的準確率非常高:

To support our analysis, we created a fingerprint database comprised of 42,027 Netflix videos. Given this collection of fingerprints, we show that our system can differentiate between videos with greater than 99.99% accuracy. Moreover, when tested against 200 random 20-minute video streams, our system identified 99.5% of the videos with the majority of the identifications occurring less than two and a half minutes into the video stream.

而且他們居然是用這樣的單機分析:

null

苦啊...

所以 Netflix 也往 Google Cloud Platform 嘗試了...

看到「Netflix Security Monkey on Google Cloud Platform」這則消息,看起來 Netflix 也往 Google Cloud Platform 在嘗試了。

Netflix Security Monkey 本來是對 AWS 分析,現在則是開始支援 GCP (雖然還在 beta):

Security Monkey monitors policy changes and alerts on insecure configurations in an AWS account. While Security Monkey’s main purpose is security, it also proves a useful tool for tracking down potential problems as it is essentially a change tracking system.

YouTube 開始與內容商合作,推出 YouTube TV 服務

YouTube 推出 YouTube TV 服務:「Finally, live TV made for you」。

直接打了幾個大頻道,尤其是運動類的頻道:

Live TV streaming from ABC, CBS, FOX, NBC, ESPN, regional sports networks and dozens of popular cable networks.

比較完整的頻道資訊可以參考這張:

可以錄影:

A cloud DVR, with no storage limits.

跨電腦、平板與手機:

A service that works great on all your screens.

價錢也只有目前常態性的一半 (相較於買比較完整的 cable 方案),USD$35/month,而且不用簽長期合約:

Half the cost of cable with zero commitments. A YouTube TV membership is only $35 a month and there are no commitments—you can cancel anytime.

這的確有殺傷性了... 加上 YouTube 的需求所累積的頻寬,應該會比起 Netflix 當時遇到的問題少?

服務當掉後的偵測

CNN 這篇「Netflix goes down. Twitter blows up」提到了昨天 Netflix 當了好幾個小時的情況。裡面提到了 Down Detector 這個服務:

Downdetector -- which monitors outage complaints online -- reported more than 13,000 posts from users all over the world Saturday afternoon.

到 Down Detector 網站上看,這個服務有一部份是從 social network 上挖資料:

Downdetector collects status reports from a series of sources. Through a realtime analysis of this data, our system is able to automatically determine outages and service interruptions at a very early stage. One of the sources that we analyse are reports on Twitter.

甚至還可以挖出是全域性的還是區域性的 outage...

Netflix 的 Open Connect 架構

在「研究人员绘制Netflix CDN服务器位置图」這邊看到在討論 NetflixOpen Connect 架構,也就是 Netflix 自己的 CDN 架構。引用的論文是「Open Connect Everywhere: A Glimpse at the Internet Ecosystem through the Lens of the Netflix CDN」這篇。

翻了一下論文,看起來還是頗傳統的方法,從一堆伺服器去 query 找出來:

Indeed, Netflix server domain names are too specific, i.e., including details on the physical location of the server. This makes it unlikely that redirection takes place based on these domain names. Additionally, we ran a distributed DNS lookup campaign from Planetlab nodes to verify our assumption that Netflix DNS names always resolve to a single IP address, independently of the client location.

另外也有用窮舉法去查:

We implemented a crawler, which generated DNS names following the pattern outlined above and attempted to resolve those generated domains names to IP addresses. The crawler was fed with lists of known countries, airport codes and ISPs. We curated these lists manually, consulting publicly available lists of countries and airport codes. For the ISPs, we conducted an extensive search on the Internet, including but not limited to the Netflix ISP Speed Index, Wikipedia, various ISP rankings or recommendation lists and discussions in web forums.

所以總共列了 243 個國家地區,3468 個機場編號 (IATA) 與 1728 個 ISP 去搜尋:

In total, our curated lists contained 243 countries, 3,468 airport codes and 1,728 ISP names. In total we tested 16 different network connection types.

不過也有發現到錯字:

The DNS names actually include a typo, the Carrasco International Airport in Uruguay is abbreviated MDV instead of MVD. We used the ISP names found to verify that what was really meant is the airport in Uruguay.

然後找出了:

不過事實上因為 Netflix 已經規劃全面走 HTTPS:「Netflix 對 sendfile() 在 TLS 情況下的加速」、「Protecting Netflix Viewing Privacy at Scale」,所以在 Certificate Transparency%.nflxvideo.net 會有完整的列表,看起來遠超過這篇論文所給的數字:

gslin@home [~/tmp] [14:55/W5] grep isp.nflxvideo nflxvideo.html | sort | uniq | wc
   1311    1311   65827
gslin@home [~/tmp] [14:55/W5] grep ix.nflxvideo nflxvideo.html | sort | uniq | wc
     77      77    3230

來寫信給論文作者提看看好了...

Netflix 開發的 Delayed Queue

原來這個叫做 Delayed Queue,難怪之前用其他關鍵字都找不到什麼資料... (就不講其他關鍵字了 XD)

Netflix 發表了他們自己所開發的 Delayed Queue:「Distributed delay queues based on Dynomite」。

本來的架構是用 Cassandra + Zookeeper 來做:

Traditionally, we have been using a Cassandra based queue recipe along with Zookeeper for distributed locks, since Cassandra is the de facto storage engine at Netflix.

但可以馬上想到不少問題,就如同 Netflix 提到的:

Using Cassandra for queue like data structure is a known anti-pattern, also using a global lock on queue while polling, limits the amount of concurrency on the consumer side as the lock ensures only one consumer can poll from the queue at a time.

所以就改放到 Netflix 另外開發的 Dynamite 上:

Dynomite, inspired by Dynamo whitepaper, is a thin, distributed dynamo layer for different storage engines and protocols. Currently these include Redis and Memcached. Dynomite supports multi-datacenter replication and is designed for high availability.

後端是 RedisMemcached 的系統,可以對抗整個機房從 internet 上消失的狀態。

在設計上則是「保證會跑一次」,也就是有可能會有多次的情況,用 Dyno Queues 系統的人必需要考慮進去:

4. At-least-once delivery semantics

雖然整篇講的頗輕鬆,但實際看起來還是很厚重... 暫時還是不會用吧 :o

Netflix 對 sendfile() 在 TLS 情況下的加速

Netflix 對於寫了一篇關於隱私保護的技術細節:「Protecting Netflix Viewing Privacy at Scale」。

其中講到 2012 年的 Netflix Open Connect 中的 Open Connect Appliance (OCA,放伺服器到 ISP 機房的計畫) 只有單台伺服器 8Gbps,到現在 2016 可以達到 90Gbps:

As we mentioned in a recent company blog post, since the beginning of the Open Connect program we have significantly increased the efficiency of our OCAs - from delivering 8 Gbps of throughput from a single server in 2012 to over 90 Gbps from a single server in 2016.

早期的 Netflix 走 sendfile() 將影片丟出去,這在 kernel space 處理,所以很有效率:

當影片本身改走 HTTPS (TLS) 時,其中一個遇到的效能問題是導致 sendfile() 無法使用,而必須在 userland space 加密後改走回傳統的 write() 架構,這對於效能影響很大:

所以他們就讓 kernel 支援 AES 系列加密 (包括 AES-GCM 與 AES-CBC),效能的提昇大約是 30%:

Our changes in both the BoringSSL and ISA-L test situations significantly increased both CPU utilization and bandwidth over baseline - increasing performance by up to 30%, depending on the OCA hardware version.

文章開頭也有提到選 AES-GCM 與 AES-CBC 的一些來龍去脈,主要是 AES-GCM 的安全強度比較好,另外考慮到舊的 client 不支援 AES-GCM 時會使用 AES-CBC:

We evaluated available and applicable ciphers and decided to primarily use the Advanced Encryption Standard (AES) cipher in Galois/Counter Mode (GCM), available starting in TLS 1.2. We chose AES-CGM over the Cipher Block Chaining (CBC) method, which comes at a higher computational cost. The AES-GCM cipher algorithm encrypts and authenticates the message simultaneously - as opposed to AES-CBC, which requires an additional pass over the data to generate keyed-hash message authentication code (HMAC). CBC can still be used as a fallback for clients that cannot support the preferred method.

另外 OCA 機器本身也都夠新,支援 AES-NI 指令集,效能上不是太大的問題:

All revisions of Open Connect Appliances also have Intel CPUs that support AES-NI, the extension to the x86 instruction set designed to improve encryption and decryption performance. We needed to determine the best implementation of AES-GCM with the AES-NI instruction set, so we investigated alternatives to OpenSSL, including BoringSSL and the Intel Intelligent Storage Acceleration Library (ISA-L).

不過在「Netflix Open Connect Appliance Deployment Guide」(26 July 2016 版) 這份文件裡看起來還是用多條 10Gbps 透過 LACP 接上去:

You must be able to provision 2-4 x 10 Gbps ethernet ports in a LACP LAG per OCA. The exact quantity depends on the OCA type.

可能是下一版準備要上 40Gbps 或 100Gbps 的準備...?

Netflix 把金流相關的系統轉移到 AWS 上跑 MySQL 的故事...

這次要提的是「Netflix Billing Migration to AWS」、「Netflix Billing Migration to AWS - Part II」與「Netflix Billing Migration to AWS - Part III」這三篇。

Netflix 先前的金流相關系統跑的是 Oracle 的資料庫:

然後換成 MySQL

系統上是採用 DRBD,然後底層是 5 個 4TB 的 EBS 組成的 RAID 0,跑 LVM

High performance with respect to reads and writes was achieved by using RAID0 with EBS provisioned IOPS volumes. To get more throughput per volume, 5 volumes of 4TB each were used, instead of 1 big volume. This was to facilitate faster snapshots and restores.

LVM to manage two Logical Volume’s (DB and DRBD Metadata) within single Volume Group.

可以看到裡面用的都是很經過時間考驗的技術,像是 DRBD、標準的 Replication 架構...

Netflix 評估影片品質的方法

Netflix 在發了一篇很長的文章,說明怎麼評估 video quality:「Toward A Practical Perceptual Video Quality Metric」,文章雖然有點長,但其實還蠻好懂的...

講的白話一點,Netflix 想要做各種壓縮方式的改善,但在超大的量的情況下 (scale) 缺乏自動化打分數的機制:

All of the challenging work described above hinges on one fundamental premise: that we can accurately and efficiently measure the perceptual quality of a video stream at scale.

如果先不考慮 scale 問題,影片的評估方式有人工處理以及常見的計算方法 (像是 MSEPSNRSSIM):

Traditionally, in video codec development and research, two methods have been extensively used to evaluate video quality: 1) Visual subjective testing and 2) Calculation of simple metrics such as PSNR, or more recently, SSIM.

前者因為牽涉到人工,所以不 scale,而後者跟「人的觀感」還是不夠正相關:

Without doubt, manual visual inspection is operationally and economically infeasible for the throughput of our production, A/B test monitoring and encoding research experiments.

Although researchers and engineers in the field are well-aware that PSNR does not consistently reflect human perception, it remains the de facto standard for codec comparisons and codec standardization work.

Netflix 的作法其實很簡單:(但是每一步都很仔細)

  • 首先先把影片依照手上有的 metadata 歸類,然後再挑出代表性的剪輯,並且產生不同 bitrate 的檔案。
  • 用人工對這些剪輯評分。
  • 用機器產生各種既有計算方法的分數 (PSNR、SSIM、...)。
  • 用數學方法把人工的與機器算的分數建立 model。
  • 然後對於未知的影片先寄算出既有方法的分數 (PSNR、SSIM、...),然後套用 model 推估人的觀感。

沒什麼特別發明出來的演算法,只是苦工 XDDD