Amazon Redshift 會自動在背景重新排序資料以增加效能

Amazon Redshift 的新功能,會自動在背景重新排序資料以增加效能:「Amazon Redshift introduces Automatic Table Sort, an automated alternative to Vacuum Sort」。

版本要到更新到 1.0.11118,然後預設就會打開:

This feature is available in Redshift 1.0.11118 and later.

Automatic table sort is now enabled by default on Redshift tables where a sort key is specified.


Redshift runs the sorting in the background and re-organizes the data in tables to maintain sort order and provide optimal performance. This operation does not interrupt query processing and reduces the compute resources required by operating only on frequently accessed blocks of data. It prioritizes which blocks of table to sort by analyzing query patterns using machine learning.

算是丟著讓他跑就好的東西,升級上去後可以看一下 CloudWatch 的報告,這邊沒有特別講應該是還好... XD

擋 Live 與 Podcast 內廣告的工具

看到「An adblocker for live radio streams and podcasts. Machine learning meets Shazam.」這個專案,這個把 machine learning 用到「正途」上了啊...

不過畢竟是比較複雜的演算法,會吃不少 CPU 資源:

On a regular laptop CPU and with the Python time-frequency analyser, computations run at 5-10X for files and at 10-20% usage for live stream.

不過看用法還是偏向 library 性質,如果要大力推廣可能還是需要有其他人包個更好的界面...

用 Machine Learning 改善 Streaming 品質的服務與論文

Hacker News 上看到「Puffer」這個服務,裡面利用了 machine learning algorithm 動態調整 bitrate,以提昇傳輸品質。

測試得到的數據後來被整理起來一起放進論文:「Continual learning improves Internet video streaming」。

在開頭介紹了 Fugu 這個演算法:

We describe Fugu, a continual learning algorithm for bitrate selection in streaming video.

而 Puffer 就是實驗網站:

We evaluate Fugu with Puffer, a public website we built that streams live TV using Fugu and existing algorithms. Over a nine-day period in January 2019, Puffer streamed 8,131 hours of video to 3,719 unique users.


Stream live TV in your browser. There's no charge. You can watch U.S. TV stations affiliated with the NBC, CBS, ABC, PBS, FOX, and Univision networks.

可以看到 Fugu 的結果很好,比起其他提出的方案是全面性的改善:

這邊是用 WebSocket 測試,並且配合了不同的 TCP congestion algorithm,沒有太考慮額外的計算成本...

AI 版的星海爭霸二將直接透過歐洲區的 匿名與人類對戰

前幾天 Blizzard 公佈的消息,DeepMind 的星海爭霸二 AI (AlphaStar) 將會透過 Blizzard 的 歐洲區伺服器跟人類對戰:「DeepMind Research on Ladder」。

Experimental versions of DeepMind’s StarCraft II agent, AlphaStar, will soon play a small number of games on the competitive ladder in Europe as part of ongoing research into AI.


If you would like the chance to help DeepMind with its research by matching against AlphaStar, you can opt in by clicking the “opt-in” button on the in-game popup window. You can alter your opt-in selection at any time by using the “DeepMind opt-in” button on the 1v1 Versus menu.

但你仍然不會知道對手是人還是 AI,而且如同一般對戰情況,這會影響到你的戰績:

For scientific test purposes, DeepMind will be benchmarking AlphaStar’s performance by playing anonymously during a series of blind trial matches. This means the StarCraft community will not know which matches AlphaStar is playing, to help ensure that all games are played under the same conditions. AlphaStar plays with built-in restrictions that the DeepMind team has defined in consultation with pro players. A win or a loss against AlphaStar will affect your MMR as normal.


加州從今年七月開始,禁止 AI 偽裝成人類 (前幾天也有一些新聞在報導):「A California law now means chatbots have to disclose they’re not human」,對應的法條在「Bill Text - SB-1001 Bots: disclosure」這邊可以看到:

17941. (a) It shall be unlawful for any person to use a bot to communicate or interact with another person in California online, with the intent to mislead the other person about its artificial identity for the purpose of knowingly deceiving the person about the content of the communication in order to incentivize a purchase or sale of goods or services in a commercial transaction or to influence a vote in an election. A person using a bot shall not be liable under this section if the person discloses that it is a bot.

(b) The disclosure required by this section shall be clear, conspicuous, and reasonably designed to inform persons with whom the bot communicates or interacts that it is a bot.

而加州是 Blizzard Entertainment 的總部...

法條上面對「online platform」有設計排除條款,不過如果只算星海二的人數,有可能不到這個豁免限制... 所以得避開而改用歐洲區來測試?

(c) “Online platform” means any public-facing Internet Web site, Web application, or digital application, including a social network or publication, that has 10,000,000 or more unique monthly United States visitors or users for a majority of months during the preceding 12 months.

(c) This chapter does not impose a duty on service providers of online platforms, including, but not limited to, Web hosting and Internet service providers.

美國軍方應該是超級關注這個議題,相較於 AlphaGo 或是 AlphaZero 是資訊完全透明的遊戲,這次要踏入非對稱資訊的遊戲。

如果在這個領域上有成果的話,可以預期未來的戰爭 (yeah 實體戰爭) 會開始大量採用 AI 了...


Hacker News 上看到的消息,是關於「使用類神經網路產生新聞」(也就是透過程式大量產生假新聞),這次的結果包括了「產生」與「偵測」兩個面向:「Grover – A State-of-the-Art Defense Against Neural Fake News (」。

實驗的網站在「Grover - A State-of-the-Art Defense against Neural Fake News」這邊,另外也有論文「Defending Against Neural Fake News」可以讀。

幾個月前,OpenAI 利用類神經網路,研發出「自動寫新聞」的程式,當時他們宣稱因為效果太好,決定不完整公開成果:「Better Language Models and Their Implications」,中文的報導可以參考 iThome 這篇:「AI文字產生技術引發假新聞爭議,OpenAI決定只公開部份技術成果」。

而現在 The Allen Institute for Artificial Intelligence 則是成功重製了 OpenAI 的成果,取名叫 Grover,發現訓練出來的模型除了可以拿來寫新聞外,也可以拿來偵測文章是不是機器產生的,而且就他們自己測試,辨識成功率還蠻高的:

To study and detect neural fake news, we built a model named Grover. Our study presents a surprising result: the best way to detect neural fake news is to use a model that is also a generator. The generator is most familiar with its own habits, quirks, and traits, as well as those from similar AI models, especially those trained on similar data, i.e. publicly available news. Our model, Grover, is a generator that can easily spot its own generated fake news articles, as well as those generated by other AIs. In a challenging setting with limited access to neural fake news articles, Grover obtains over 92% accuracy at telling apart human-written from machine-written news. Please read our publication for more information.

不過看起來 source code 與 model 還是沒放出來,但看起來遲早會有對應的 open source clone...



2013 年時在「Automatic Translation Without Dictionaries」這邊看到關於機器翻譯時的自我學習方式,裡面提到了「How Google Converted Language Translation Into a Problem of Vector Space Mathematics」這篇報導,而裡面提到的論文則是 Google 發表在 arXiv 上的「Exploiting Similarities among Languages for Machine Translation」這篇。

最近看到「The Illustrated Word2vec」這篇,把五年多前的記錄交叉拉出來看... 這個算式算是給了大家基本的想法,透過公式來解釋文字的意義:

拉出這樣的關係後,就有機會學習新的詞彙... 進而用在其他語言的翻譯上。

用 NN 演算法重製 Full HD 版的 Star Trek: DS9

看到「Remastering Star Trek: Deep Space Nine With Machine Learning」這篇,裡面用了類神經網路演算法,將本來只有 480p (SD) 的 Star Trek: DS9 升到 1080p (Full HD) 的版本,而且看起來效果還不錯...

意外的看到有人拿 Star Trek 的材料來玩... 依照作者的說明,DS9 一直沒有 Full HD 版的其中一個原因反而是因為「數位化」了。使用類比膠卷的母帶可以透過更高規格的重新掃描而得到高畫質版本,但 DS9 的母帶似乎已經是數位版了,所以反而造成無法透過重新掃描的方式取得 Full HD 版本:

While you can rescan analog film at a higher resolution, video is digital and can't be rescanned. This makes it much costlier to remaster this TV show, which is one of the reasons why it hasn't happened.

現有的 upscale 技術主要都還是以圖片為主,所以作者本來以為對於動態畫面的處理會遇到問題,但蠻意外的超出預期,從影片可以看出來:

看起來之後的 remaster 版本有可能可以靠這個方法先做初步,然後再讓人進去修?

LinkedIn 用機器學習提供雇主可能的職缺對象

先前看到「Learning Hiring Preferences: The AI Behind LinkedIn Jobs」這篇,LinkedIn 用機器學習提供雇主可能的對象。

依照官方的說法,這次提到的改進是透過雇主的行為調整推薦。當雇主對某個人有興趣的時候,LinkedIn 就會調整演算法去配合雇主有興趣的條件:

Based on how you interact with candidates, our algorithm learns your preferences and delivers increasingly relevant candidates across the Jobs product. If you’re consistently interested in candidates who are, say, accountants with leadership skills, or project managers who are adept at social media, we’ll send you more of those. And this all happens online in real time so that your feedback is taken instantly into account.

透過模擬 20% 的加成:

This new algorithm, which is used throughout the Jobs platform, performs nearly 20% better than the previous version in generating recommendations when we simulate our members' past hiring activity.

在 social network 這種演算法其實就是同溫層 (Echo chamber、Filter bubble),在 LinkedIn 這樣的行為不知道會不會牽扯到 Discrimination 的議題...

AWS 的推薦演算法服務:Amazon Personalize

AWS 把推薦演算法包成服務拿來來賣,叫做 Amazon Personalize:「Amazon Personalize – Real-Time Personalization and Recommendation for Everyone」。



可以看出來這個使用者對 2,29,32,47,50 這些 movieId 在不同的時間點都給了 3.5 分的評分。

然後經過一連串的 API 操作 (有些參數可以調整,但主要是叫 AWS 運算,並且建立 real-time 的服務),就可以看到推薦哪些其他的 item 了:

$ aws personalize-rec get-recommendations --campaign-arn $CAMPAIGN_ARN --user-id $USER_ID --query "itemList[*].itemId"
["1210", "260", "2571", "110", "296", "1193", ...]

而從 Pricing 的頁面可以看到支援 real-time data 與 batch data:

You are charged per GB of data uploaded to Amazon Personalize. This includes real-time data streamed to Amazon Personalize and batch data uploaded via Amazon S3.


AWS 新推出的 Amazon Elastic Inference:GPU 出租方案

AWS 推出了 Amazon Elastic Inference,可以讓你選擇 GPU 的量掛進 EC2 instance:「Amazon Elastic Inference – GPU-Powered Deep Learning Inference Acceleration」。

第一眼看到的時候在想這不是之前出過了嗎... 後來搜尋發現應該是針對圖形運算與 machine learning 的應用拆開使用不同的硬體?

所以在前陣子 AWS 公告將 Amazon EC2 Elastic GPUs 改名為 Amazon Elastic Graphics:「Amazon EC2 Elastic GPUs is now Amazon Elastic Graphics」。

舊的 Amazon EC2 Elastic GPUs (Amazon Elastic Graphics) 應該是針對圖形應用設計,而新的 Amazon Elastic Inference 則是針對 machine learning 設計。