Yet another service for removing TV commercials...

I came across the article "Plex’s DVR now lets you skip the commercials… by removing them for you", which covers Plex rolling out a feature that removes TV commercials...

The Wikipedia entry "Plex (software)" explains it more clearly. Plex consists of two main components, a media server and a player:

  • The Plex Media Server desktop application runs on Windows, macOS and Linux-compatibles including some types of NAS devices. The 'server' desktop application organizes video, audio and photos from your collections and from online services, enabling the players to access and stream the contents.
  • The media players. There are official clients available for mobile devices, smart TVs, and streaming boxes, a web app and Plex Home Theater (no longer maintained), as well as many third-party alternatives.

The feature being rolled out this time strips the commercials out of recordings outright:

Plex confirmed it’s rolling out a new feature that will allow cord cutters to skip the commercials in the TV programs recorded using its software, making the company’s lower-cost solution to streaming live TV more compelling. Unlike other commercial-skip options, Plex’s option will remove commercials from recordings automatically.

This rings a bell... TiVo had a similar feature back in the day, though the article points out that TiVo offers skipping rather than outright removal:

The new feature works by locating the commercials in your recorded media. It then actually removes them before the media is stored in your library. That sounds like it could be even better than TiVo’s commercial skipping option, for example, because you don’t have to press a button to skip the ads — they’re being pulled out for you, proactively.

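Mostly, though, this is how I got to know Plex... If you grew up glued to the TV you would probably find it useful XD I don't seem to watch much Taiwanese TV these days...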

fastText, open-sourced by Facebook

A text classification tool that keeps accuracy at roughly the same level while being many orders of magnitude faster: "FAIR open-sources fastText".

The announcement spells out how fastText's speed compares with other methods:

Our experiments show that fastText is often on par with deep learning classifiers in terms of accuracy, and many orders of magnitude faster for training and evaluation.

Besides the open-source release, there is also a paper, "Enriching Word Vectors with Subword Information". Reading the abstract, I noticed it mentions skip-gram:

In this paper, we propose a new approach based on the skip-gram model, where each word is represented as a bag of character n-grams.
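To make "a bag of character n-grams" concrete, here is a minimal sketch of how a word could be split up. It assumes the scheme described in the paper: the word is wrapped in `<` and `>` boundary markers, all n-grams within a length range are taken, and the whole marked word is kept as one extra entry. The function name `char_ngrams` and its defaults are illustrative and are not fastText's actual API.

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Return the character n-grams of a word, plus the whole marked word.

    Illustrative helper only (not part of the fastText library).
    """
    marked = f"<{word}>"  # boundary markers distinguish prefixes and suffixes
    grams = [
        marked[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(marked) - n + 1)
    ]
    if marked not in grams:
        grams.append(marked)  # the full word as its own special sequence
    return grams


print(char_ngrams("where", 3, 3))
# ['<wh', 'whe', 'her', 'ere', 're>', '<where>']
```

In the model, a word's vector is then built from the vectors of these pieces, which is what lets rare or unseen words share information with words that have similar spelling.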

Then, while looking things up, I found I had already written a post called "Skip-gram" XDDD

Skip-gram

Digging through material led me all the way to skip-gram: "A Closer Look at Skip-gram Modelling (PDF)". I'm not sure whether the paper is from 2005 (per "CiteSeerX — Citation Query A Closer look at Skip-gram modeling") or 2006 (per "CiteSeerX — A Closer Look at Skip-gram Modelling"), but Google Scholar lists it as 2006...

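The actual definition of skip-gram is simple: it just means allowing a few words to be skipped... Following the definition in the original paper, this sentence: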

Insurgents killed in ongoing fighting.

As bi-grams, it breaks down into: {insurgents killed, killed in, in ongoing, ongoing fighting}.

As 2-skip-bi-grams, it breaks down into: {insurgents killed, insurgents in, insurgents ongoing, killed in, killed ongoing, killed fighting, in ongoing, in fighting, ongoing fighting}.

As tri-grams: {insurgents killed in, killed in ongoing, in ongoing fighting}.

As 2-skip-tri-grams: {insurgents killed in, insurgents killed ongoing, insurgents killed fighting, insurgents in ongoing, insurgents in fighting, insurgents ongoing fighting, killed in ongoing, killed in fighting, killed ongoing fighting, in ongoing fighting}.
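The sets above can be reproduced with a short sketch. It assumes that "k-skip" means at most k tokens skipped in total across the whole n-gram, which is consistent with the examples listed; the function name `skip_grams` and the pre-tokenized input are illustrative choices, not something taken from the paper.

```python
from itertools import combinations


def skip_grams(tokens, n, k):
    """Return all k-skip-n-grams of tokens as tuples, in order of appearance.

    Assumption: at most k tokens are skipped in total within one n-gram.
    """
    grams = []
    for i in range(len(tokens) - n + 1):
        # The last token of a gram starting at i can sit at most
        # (n - 1) + k positions later, so only that window is searched.
        window = tokens[i + 1 : i + n + k]
        for rest in combinations(window, n - 1):
            grams.append((tokens[i],) + rest)
    return grams


tokens = ["insurgents", "killed", "in", "ongoing", "fighting"]

print(skip_grams(tokens, 2, 0))       # the 4 plain bi-grams
print(len(skip_grams(tokens, 2, 2)))  # 9
print(len(skip_grams(tokens, 3, 2)))  # 10
```

With k = 0 the function reduces to ordinary n-grams, so the plain bi-gram and tri-gram cases need no separate code path.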

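This makes it possible to find an article by searching for "台大" even when the article only ever writes "台灣大學" (National Taiwan University, for which "台大" is the common abbreviation): treating each character as a token, "台大" is a 1-skip-bi-gram of "台灣大學". That covers some of the problems that synonym handling is meant to solve.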
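A rough sketch of that lookup, treating each character as a token and indexing documents by their character-level skip-bi-grams. The two documents and the helper names `char_skip_bigrams` and `build_index` are made up for illustration.

```python
from collections import defaultdict


def char_skip_bigrams(text, k):
    """Character pairs from text with at most k characters skipped between them."""
    pairs = set()
    for i, first in enumerate(text):
        for second in text[i + 1 : i + 2 + k]:
            pairs.add(first + second)
    return pairs


def build_index(docs, k):
    """Map each skip-bi-gram to the ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for gram in char_skip_bigrams(text, k):
            index[gram].add(doc_id)
    return index


# Hypothetical documents: only "a" mentions 台灣大學.
docs = {"a": "台灣大學的課程", "b": "清華大學的課程"}
index = build_index(docs, k=1)

print(index.get("台大", set()))  # {'a'}
```

The trade-off is that the same index also contains pairs such as "灣學" that nobody would search for, so it grows quickly as k increases.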

The paper analyzes coverage, but I don't know what objective measure "coverage" refers to here; I'll look into what it actually means later...