Amazon Kinesis Streams 的 Video 版本:Amazon Kinesis Video Streams

這次 AWS 推出的 Amazon Kinesis Video Streams 在技術上看起來跟 Amazon Media Services 有不少重疊 (參考先前提到的文章「AWS Media Services 推出一卡車與影音相關的服務...」),但產品面上區隔開的服務:「Amazon Kinesis Video Streams – Serverless Video Ingestion and Storage for Vision-Enabled Apps」。

開頭介紹就有提到適合用在各種 IoT 裝置,用在一直有影像資料產生的設備上:

Cell phones, security cameras, baby monitors, drones, webcams, dashboard cameras, and even satellites can all generate high-intensity, high-quality video streams. Homes, offices, factories, cities, streets, and highways are now host to massive numbers of cameras.

像這張圖的所介紹的流程,以及可以保留天數的設計:

底層用了不少與 Amazon Media Services 相同的技術,但是包裝成不同的產品...

AWS Media Services 推出一卡車與影音相關的服務...

AWS 推出了一連串 AWS Elemental MediaOOXX 一連串影音相關的服務:「AWS Media Services – Process, Store, and Monetize Cloud-Based Video」。

但不是所有的服務都是相同的區域... 公告分別在:

不過這邊還是引用 Jeff Barr 文章裡的說明,可以看到從很源頭的 transencoding 到 DRM,以及 Live 格式,到後續的檔案儲存及後製 (像是上廣告) 都有:

AWS Elemental MediaConvert – File-based transcoding for OTT, broadcast, or archiving, with support for a long list of formats and codecs. Features include multi-channel audio, graphic overlays, closed captioning, and several DRM options.

AWS Elemental MediaLive – Live encoding to deliver video streams in real time to both televisions and multiscreen devices. Allows you to deploy highly reliable live channels in minutes, with full control over encoding parameters. It supports ad insertion, multi-channel audio, graphic overlays, and closed captioning.

AWS Elemental MediaPackage – Video origination and just-in-time packaging. Starting from a single input, produces output for multiple devices representing a long list of current and legacy formats. Supports multiple monetization models, time-shifted live streaming, ad insertion, DRM, and blackout management.

AWS Elemental MediaStore – Media-optimized storage that enables high performance and low latency applications such as live streaming, while taking advantage of the scale and durability of Amazon Simple Storage Service (S3).

AWS Elemental MediaTailor – Monetization service that supports ad serving and server-side ad insertion, a broad range of devices, transcoding, and accurate reporting of server-side and client-side ad insertion.

引個前同事的 tweet,先不說 Amazon SWF 的情況 (畢竟 Amazon SWF 還可以找到其他用途),倒是 Amazon Elastic Transcoder 很明顯要被淘汰掉了:

這種整個大包的東西是 AWS re:Invent 才有的能量,平常比較少看到...

Stream 對 .io 的感冒

Stream 的人寫了一篇「Why Stream Stopped Using .IO Domain Names for Production Traffic」表達他們對 .io 的感冒...

主要是因為 9/20 爛掉的情況不太妙。第一個是 .io 爛掉了兩個小時 (以月來算 SLA 就等於直接掉了 0.2% uptime,變成不到 99.8%),第二個是爛掉時 server 傳回的不是 SERVFAIL,而是 NXDOMAIN

The outage lasted for almost 2 hours, during which 1/5th of DNS queries for any .getstream.io record would fail.

他們的解法是改到 .com 上,畢竟影響的時候應該會修得比較快。另外文章裡也有延伸提到 Amazon Route 53 爛掉時要怎麼辦,如果他們真的決定要解決的話,應該是會拿出像「StackOverflow 對於多 DNS 商的同步方式...」或是「GitHub 也自己搞了一套管理多家 DNS 的程式...」的搞法吧。

不過這的確是當初選 .io 沒預料到的...

在 MOPCON 2017 的 Unconference「MySQL to NoSQL & Search Engine」

把投影片傳到 Speaker Deck 上了:「MySQL to NoSQL & Search Engine」。

這是在介紹 noplay/python-mysql-replication 這個軟體,我在示範時用的 python script 有增加 blocking 參數讓他保持一直讀取 MySQL replication stream:

from pymysqlreplication import BinLogStreamReader

mysql_settings = {'host': '127.0.0.1', 'port': 3306, 'user': 'root', 'passwd': ''}

stream = BinLogStreamReader(connection_settings = mysql_settings, server_id=100, blocking=True)

for binlogevent in stream:
    binlogevent.dump()

stream.close()

利用這樣的工具可以做很多事情,像是當 post 表格更新時自動更新 search engine,並且清空 memcached 內的資料。這可以避免使用 library 時有可能會漏掉忘記做 (因為有些程式不用 library 處理),可靠度比較高。

另外一方面 replication protocol 本身就有考慮重連的問題,重新接上時是可以從上一次處理完的資料繼續處理 (只要不要隔太久),這讓寫應用的人不需要用太複雜的方式確保他不會漏掉。

利用 Side-channel 資訊判斷被 HTTPS 保護的 Netflix 影片資訊

在「Netflix found to leak information on HTTPS-protected videos」這篇看到了研究員透過 VBR 所透露出的 side channel 資訊,成功的取得了被 HTTPS 保護的 Netflix 影片資訊。這對於美國的 ISP 是個大利多 (加上之前通過的法案),但對於個人隱私則是嚴重的打擊。

這項研究的準確率非常高:

To support our analysis, we created a fingerprint database comprised of 42,027 Netflix videos. Given this collection of fingerprints, we show that our system can differentiate between videos with greater than 99.99% accuracy. Moreover, when tested against 200 random 20-minute video streams, our system identified 99.5% of the videos with the majority of the identifications occurring less than two and a half minutes into the video stream.

而且他們居然是用這樣的單機分析:

null

苦啊...

收 Wikimedia (包括維基百科) 的 Recent Changes

所以有新的 streaming protocol 取代本來的 RCStream:「Get live updates to Wikimedia projects with EventStreams」。

這次新的 protocol 是走標準協定:

EventStreams is built on the w3c standard Server Sent Events (SSE). SSE is simply a streaming HTTP connection with event data in a particular text format. Client libraries, usually called EventSource, assist with building responsive tools, but because SSE is really just HTTP, you can use any HTTP client (even curl!) to consume it.

直接用瀏覽器打開也可以看到一直冒出來新的訊息...

Facebook 備份 MySQL 資料並且確認正確性的方法

Facebook 再多花了一些篇幅數對於 MySQL 資料備份以及確認正確性的方法:「Continuous MySQL backup validation: Restoring backups」。

首先是 Continuous Restore Tier (CRT) 這塊,可以看到他們在這塊很仰賴 HDFS 當作備份的第一層基地,包括了 Full logical backups (用 mysqldump)、Differential (diff) backups 以及 Binary log (binlog) backups (stream 進 HDFS)。

另外上了 GTID,對於後續的處理會比較方便:

All of our database servers also use global transaction IDs (GTIDs), which gives us another layer of control when replaying transactions from binlog backups.

在 CRT 這塊可以看到其實是拿現成的工具堆起來,不同單位會因為規模而有不同的作法。真正的重點反而在 ORC Restore Coordinator (ORC) 這塊,可以看到 Facebook 開發了大量的程式將回復這件事情自動化處理:

在收到回復的需求後,可以看到 Peon 會從 HDFS 拉資料出來,並且用 binlog replay 回去:

Peons contain all relevant logic for retrieving backups from HDFS, loading them into their local MySQL instance, and rolling them forward to a certain point in time by replaying binlogs. Each restore job a peon works on goes through these five stages[.]

也是因為 Facebook 對 MySQL 的用量大到需要自動化這些事情,才有這些東西...

OpenSSL 1.1.0

看到「OpenSSL 1.1.0 released」這篇得知大家期待已久的 OpenSSL 1.1.0 出了,在 1.1.0 的重要新功能中,對 ChaCha20 + Poly1305 的支援算是大家等很久的:

  • Support for ChaCha20 and Poly1305 added to libcrypto and libssl

由於 RC4 已經被證明不安全,OpenSSL 內變成沒有堪用的 stream cipher,這邊總算要補上來了...

另外兩個也頗有趣的:

  • Support for scrypt algorithm
  • Support for X25519

多了些東西...

Netflix 評估影片品質的方法

Netflix 在發了一篇很長的文章,說明怎麼評估 video quality:「Toward A Practical Perceptual Video Quality Metric」,文章雖然有點長,但其實還蠻好懂的...

講的白話一點,Netflix 想要做各種壓縮方式的改善,但在超大的量的情況下 (scale) 缺乏自動化打分數的機制:

All of the challenging work described above hinges on one fundamental premise: that we can accurately and efficiently measure the perceptual quality of a video stream at scale.

如果先不考慮 scale 問題,影片的評估方式有人工處理以及常見的計算方法 (像是 MSEPSNRSSIM):

Traditionally, in video codec development and research, two methods have been extensively used to evaluate video quality: 1) Visual subjective testing and 2) Calculation of simple metrics such as PSNR, or more recently, SSIM.

前者因為牽涉到人工,所以不 scale,而後者跟「人的觀感」還是不夠正相關:

Without doubt, manual visual inspection is operationally and economically infeasible for the throughput of our production, A/B test monitoring and encoding research experiments.

Although researchers and engineers in the field are well-aware that PSNR does not consistently reflect human perception, it remains the de facto standard for codec comparisons and codec standardization work.

Netflix 的作法其實很簡單:(但是每一步都很仔細)

  • 首先先把影片依照手上有的 metadata 歸類,然後再挑出代表性的剪輯,並且產生不同 bitrate 的檔案。
  • 用人工對這些剪輯評分。
  • 用機器產生各種既有計算方法的分數 (PSNR、SSIM、...)。
  • 用數學方法把人工的與機器算的分數建立 model。
  • 然後對於未知的影片先寄算出既有方法的分數 (PSNR、SSIM、...),然後套用 model 推估人的觀感。

沒什麼特別發明出來的演算法,只是苦工 XDDD

BGPmon 推出 BGP Stream 警告異常的 BGP 流量劫持

也是兩個禮拜前的新聞,在「OpenDNS BGP Stream Twitter Feed」這邊提到了 BGPmon 將會推出 BGP Stream 服務,將偵測到的 BGP 異常變化發到 Twitter 上。

其中 BGPmon 在幾個月前被 OpenDNS 併購 (2015 年 3 月),而 Cisco 則在上上個月底併購了 OpenDNS (2015 年 6 月)。而在過幾天的 DefCon 23 上將會透露更多細節。

前陣子 Hacking Team 洩漏的資料中就用到了 BGP hijack 來取回控制權:

That nugget that emerged from the 400 Gb of stolen Hacking Team data posted online where Italian law enforcement used Hacking Team’s Remote Control System monitoring software to regain control over a number IP addresses it was watching that were already infected with Hacking Team software by hijacking BGP routes in order to redirect traffic and regain control over a target’s machines.

除了示警外,另外一方面 BGP 上的簽名技術也愈來愈重要了,只是不知道最終會怎麼做...