Netflix 評估影片品質的方法

Netflix 在發了一篇很長的文章,說明怎麼評估 video quality:「Toward A Practical Perceptual Video Quality Metric」,文章雖然有點長,但其實還蠻好懂的...

講的白話一點,Netflix 想要做各種壓縮方式的改善,但在超大的量的情況下 (scale) 缺乏自動化打分數的機制:

All of the challenging work described above hinges on one fundamental premise: that we can accurately and efficiently measure the perceptual quality of a video stream at scale.

如果先不考慮 scale 問題,影片的評估方式有人工處理以及常見的計算方法 (像是 MSEPSNRSSIM):

Traditionally, in video codec development and research, two methods have been extensively used to evaluate video quality: 1) Visual subjective testing and 2) Calculation of simple metrics such as PSNR, or more recently, SSIM.

前者因為牽涉到人工,所以不 scale,而後者跟「人的觀感」還是不夠正相關:

Without doubt, manual visual inspection is operationally and economically infeasible for the throughput of our production, A/B test monitoring and encoding research experiments.

Although researchers and engineers in the field are well-aware that PSNR does not consistently reflect human perception, it remains the de facto standard for codec comparisons and codec standardization work.

Netflix 的作法其實很簡單:(但是每一步都很仔細)

  • 首先先把影片依照手上有的 metadata 歸類,然後再挑出代表性的剪輯,並且產生不同 bitrate 的檔案。
  • 用人工對這些剪輯評分。
  • 用機器產生各種既有計算方法的分數 (PSNR、SSIM、...)。
  • 用數學方法把人工的與機器算的分數建立 model。
  • 然後對於未知的影片先寄算出既有方法的分數 (PSNR、SSIM、...),然後套用 model 推估人的觀感。

沒什麼特別發明出來的演算法,只是苦工 XDDD