AWS 推出 CloudWatch Metric Streams

AWS 推出了 CloudWatch Metric Streams,把 CloudWatch Metric 的資料往 Kinesis Data Firehose 裡面丟:「CloudWatch Metric Streams – Send AWS Metrics to Partners and to Your Apps in Real Time」。

其中一個賣點是即時性比用 API 去拉好很多:

In order to make it easier for AWS Partners and others to gain access to CloudWatch metrics faster and at scale, we are launching CloudWatch Metric Streams. Instead of polling (which can result in 5 to 10 minutes of latency), metrics are delivered to a Kinesis Data Firehose stream.

格式上可以是 JSON 或是 Open Telemetry

When you set up a stream you choose between the binary Open Telemetry 0.7 format, and the human-readable JSON format.

另外一個賣點是價位,每千次 $0.003:

Pricing – You pay $0.003 for every 1000 metric updates, and for any charges associated with the Kinesis Data Firehose. To learn more, check out the pricing page.

另外算一下 Kinesis Data Firehose 的價錢,是以資料量的大小計費,不過最小計價單位是 5KB (一筆應該是不會到),單價是 $0.029/GB (us-east-1) 或是 $0.037/GB (ap-southeast-1),算了一下跟 CloudWatch Metrics Streams 比起來只是零頭...

之前如果要自己拉出來的話是透過 API call 抓,每 1000 次是 USD$0.01,這個方法相較起來便宜不少,不過數量多的時候還是一筆費用 (而且有不少 metrics 是一分鐘更新一次)。

如果只是要備份起來或是跑分析的話,也許先前用 API 拉的作法可能還是比較好?一個小時拉一次對於備份與分析應該都很夠了,而 alarm 的機制還是掛在 CloudWatch 上。

這次產品的定位看起來是要把 ecosystem 做起來:

We designed this feature with the goal of making it easier & more efficient for AWS Partners including Datadog, Dynatrace, New Relic, Splunk, and Sumo Logic to get access to metrics so that the partners can build even better tools.

Amazon CloudWatch 支援縮放與拖拉調整時間區間

Amazon CloudWatch 的操作上支援 Zoom 與 Pan 了:「Amazon CloudWatch now supports two new chart visualization options in metrics and dashboards」。

Zoom 是改變時間的粒度:

You can use the CloudWatch console to graph metric data generated by AWS services and your applications. Now, you can zoom into a shorter time period such as one minute or five minutes while viewing the metric graph at a longer interval.

Pan 則是維持一樣的粒度,但改變開始與結束的時間:

Once zoomed, you can also pan the metric graph across your selected interval, but at a zoomed detail level.

變得更容易操作,而不用滑鼠移過去修改...

AWS 的 General Purpose SSD (gp2) 可以看到 burst I/O 的 credit 數字了

AWS 宣佈把 gp2 的 I/O burst credit 數字給量化了:「New – Burst Balance Metric for EC2’s General Purpose SSD (gp2) Volumes」。

gp2 最小的也可以衝到 3000 IOPS,另外可以累積 5.4M credits:

Each volume can accumulate up to 5.4 million credits, and they can be spent at up to 3,000 per second per volume.

算了一下,1GB 的空間一個小時可以累積 10,800 IOPS,如果切 10GB 的系統碟,大約 50 個小時就會滿。如果 100GB 的話就是 5 個小時了,其實對於真的超級大量持續 I/O 的應用還是要考慮用 Provisioned IOPS SSD (io1)。

不過明顯的好處是可以建立 alarm,當機器的 burst I/O credit 快用完的時候可以叫一叫,這樣讓人可以評估下一步:

另外也可以藉由這個數字來評估是要加大空間以換取 IOPS,或是換到有保障的 Provisioned IOPS SSD (io1)。

Netflix 評估影片品質的方法

Netflix 在發了一篇很長的文章,說明怎麼評估 video quality:「Toward A Practical Perceptual Video Quality Metric」,文章雖然有點長,但其實還蠻好懂的...

講的白話一點,Netflix 想要做各種壓縮方式的改善,但在超大的量的情況下 (scale) 缺乏自動化打分數的機制:

All of the challenging work described above hinges on one fundamental premise: that we can accurately and efficiently measure the perceptual quality of a video stream at scale.

如果先不考慮 scale 問題,影片的評估方式有人工處理以及常見的計算方法 (像是 MSEPSNRSSIM):

Traditionally, in video codec development and research, two methods have been extensively used to evaluate video quality: 1) Visual subjective testing and 2) Calculation of simple metrics such as PSNR, or more recently, SSIM.

前者因為牽涉到人工,所以不 scale,而後者跟「人的觀感」還是不夠正相關:

Without doubt, manual visual inspection is operationally and economically infeasible for the throughput of our production, A/B test monitoring and encoding research experiments.

Although researchers and engineers in the field are well-aware that PSNR does not consistently reflect human perception, it remains the de facto standard for codec comparisons and codec standardization work.

Netflix 的作法其實很簡單:(但是每一步都很仔細)

  • 首先先把影片依照手上有的 metadata 歸類,然後再挑出代表性的剪輯,並且產生不同 bitrate 的檔案。
  • 用人工對這些剪輯評分。
  • 用機器產生各種既有計算方法的分數 (PSNR、SSIM、...)。
  • 用數學方法把人工的與機器算的分數建立 model。
  • 然後對於未知的影片先寄算出既有方法的分數 (PSNR、SSIM、...),然後套用 model 推估人的觀感。

沒什麼特別發明出來的演算法,只是苦工 XDDD

維基媒體的效能資訊

在「Wikimedia Performance Metrics」這邊看到維基媒體 (Wikimedia) 的效能資訊網頁:「Metrics — Wikimedia Performance」,這邊是以前端效能為主。在「Grafana」這邊則包括了更多的資訊。

在「Performance」這邊有關於效能頁面的說明,看起來還在開發...