假新聞產生器與偵測器

Hacker News 上看到的消息,是關於「使用類神經網路產生新聞」(也就是透過程式大量產生假新聞),這次的結果包括了「產生」與「偵測」兩個面向:「Grover – A State-of-the-Art Defense Against Neural Fake News (allenai.org)」。

實驗的網站在「Grover - A State-of-the-Art Defense against Neural Fake News」這邊,另外也有論文「Defending Against Neural Fake News」可以讀。

幾個月前,OpenAI 利用類神經網路,研發出「自動寫新聞」的程式,當時他們宣稱因為效果太好,決定不完整公開成果:「Better Language Models and Their Implications」,中文的報導可以參考 iThome 這篇:「AI文字產生技術引發假新聞爭議,OpenAI決定只公開部份技術成果」。

而現在 The Allen Institute for Artificial Intelligence 則是成功重製了 OpenAI 的成果,取名叫 Grover,發現訓練出來的模型除了可以拿來寫新聞外,也可以拿來偵測文章是不是機器產生的,而且就他們自己測試,辨識成功率還蠻高的:

To study and detect neural fake news, we built a model named Grover. Our study presents a surprising result: the best way to detect neural fake news is to use a model that is also a generator. The generator is most familiar with its own habits, quirks, and traits, as well as those from similar AI models, especially those trained on similar data, i.e. publicly available news. Our model, Grover, is a generator that can easily spot its own generated fake news articles, as well as those generated by other AIs. In a challenging setting with limited access to neural fake news articles, Grover obtains over 92% accuracy at telling apart human-written from machine-written news. Please read our publication for more information.

不過看起來 source code 與 model 還是沒放出來,但看起來遲早會有對應的 open source clone...

我想到在攻殼電視動畫裡面的情報管制戰,雖然電視動畫裡沒有講得很詳細,但感覺這類工具就是其中一環...

Amazon S3 淘汰 Path-style 存取方式的新計畫

先前在「Amazon S3 要拿掉 Path-style 存取方式」提到 Amazon S3 淘汰 Path-style 存取方式的計畫,經過幾天後有改變了。

Jeff Barr 發表了一篇「Amazon S3 Path Deprecation Plan – The Rest of the Story」,裡面提到本來的計畫是 Path-style model 只支援到 2020/09/30,被大幅修改為只有在 2020/09/30 後建立的 bucket 才會禁止使用 Path-style:

In response to feedback on the original deprecation plan that we announced last week, we are making an important change. Here’s the executive summary:

Original Plan – Support for the path-style model ends on September 30, 2020.

Revised Plan – Support for the path-style model continues for buckets created on or before September 30, 2020. Buckets created after that date must be referenced using the virtual-hosted model.

這樣大幅降低本來會預期的衝擊,但 S3 團隊希望償還的技術債又得繼續下去了... 也許再過個幾年後才會再被提出來?

Cloudflare 推出 Spectrum:65535 個 TCP Port 都可以轉的 Proxy...

Cloudflare 推出了 Spectrum,文章標題提到的 65533 應該是指 80 & 443 以外其他的 port:「Introducing Spectrum: Extending Cloudflare To 65,533 More Ports」。

然後因為 TCP proxy 不像 HTTP proxy 與 WebSocket proxy 可以靠 Host header 資訊判斷,在 TCP proxy 需要獨占 IP address 使用 (i.e. 一個 IP address 只能給一個客戶用),而因為 IPv4 address 不夠的關係,這個功能只開放給 Enterprise 客戶用:

Today we are introducing Spectrum, which brings Cloudflare’s security and acceleration to the whole spectrum of TCP ports and protocols for our Enterprise customers.

雖然現在限定在 Enterprise 客戶,但 Cloudflare 還是希望看看有沒有其他想法,目前提出來的選項包括了開放 IPv6 address 給所有人用,或是變成獨立付費項目:

Why just Enterprise? While HTTP can use the Host header to identify services, TCP relies on each service having a unique IP address in order to identify it. Since IPv4 addresses are endangered, it’s quite expensive for us to delegate an IP per application and we needed to limit use. We’re actively thinking about ways to bring Spectrum to everyone. One idea is to offer IPv6-only Spectrum to non-Enterprise customers. Another idea is let anyone use Spectrum but pay for the IPv4 address. We’re not sure yet, but if you prefer one to the other, feel free to comment and let us know.

類似的產品應該是 clean pipe 類的服務,但一般 clean pipe 是透過 routing 重導清洗流量,而非像 Cloudflare 這樣設計... 不知道後續會有什麼樣的變化。

Raspberry Pi 3 的新版本 Model B+

Raspberry Pi 3 推出了 Model B+ 的新版本:「Raspberry Pi 3 Model B+ on sale now at $35」。

除了 CPU 速度稍微快一些以外,另外支援了 802.11ac/5Ghz 的無線網路 (官方宣稱可以跑到 ~102Mbps,相較於先前在 2.4Ghz 只能跑到 ~35Mbps),以及更快的有線網路 (官方宣稱可以跑到 ~315Mbps,相較於先前的 ~95Mbps)。

然後是支援 PXE

Raspberry Pi 3B was our first product to support PXE Ethernet boot. Testing it in the wild shook out a number of compatibility issues with particular switches and traffic environments. Gordon has rolled up fixes for all known issues into the BCM2837B0 boot ROM, and PXE boot is now enabled by default.

以及支援 PoE 直接推動整台機器:

We use a magjack that supports Power over Ethernet (PoE), and bring the relevant signals to a new 4-pin header. We will shortly launch a PoE HAT which can generate the 5V necessary to power the Raspberry Pi from the 48V PoE supply.

或是吃更多電 XDDD

Note that Raspberry Pi 3B+ does consume substantially more power than its predecessor. We strongly encourage you to use a high-quality 2.5A power supply, such as the official Raspberry Pi Universal Power Supply.

所以看到這張圖時就不意外了 XDDD (風扇!)

只是風扇的細節要再找一下,在產品頁上好像沒看到...

Update:風扇那張圖的產品頁看起來在「Raspberry Pi PoE HAT」這頁 (參考下面的 comment)。

Amazon Aurora 的 Serverless 與 Multi-master

Amazon Aurora 推出了兩包玩意,第一包是 Serverless,讓需要人介入的情況更少:「In The Works – Amazon Aurora Serverless」。

在 Serverless 的第一個重點是支援以秒計費:

Today we are launching a preview (sign up now) of Amazon Aurora Serverless. Designed for workloads that are highly variable and subject to rapid change, this new configuration allows you to pay for the database resources you use, on a second-by-second basis.

然後是極為快速的 auto-scaling:

The endpoint is a simple proxy that routes your queries to a rapidly scaled fleet of database resources. This allows your connections to remain intact even as scaling operations take place behind the scenes. Scaling is rapid, with new resources coming online within 5 seconds

這兩個組合起來,讓使用端可以除了在 Amazon EC2 上可以快速 scale 外,後端的資料庫也能 scale 了...

第二個是 Multi-master 架構:「Sign Up for the Preview of Amazon Aurora Multi-Master」。

Amazon Aurora Multi-Master allows you to create multiple read/write master instances across multiple Availability Zones. This enables applications to read and write data to multiple database instances in a cluster, just as you can read across Read Replicas today.

(話說我一直都誤以為 Aurora 是 R/W master...)

Anyway,這個功能不知道怎麼疊上去的... 不笑得會不會有嚴重的 distributed lock issue,反而推薦大家平常都寫到同一台 (像是 PXC 就會這樣)。

AWS PrivateLink + SaaS 的用法

原來 AWS 搞 PrivateLink 不只要整合自己的服務,還包括非 AWS 的服務:「AWS PrivateLink Update – VPC Endpoints for Your Own Applications & Services」。

簡略的來說,以往的 SaaS 服務大多都是提供 Public IP 讓客戶端使用,對於服務的使用方與提供方來說,當兩者都在 AWS 同一個 region 時,在處理 security group 設定不太方便,所以通常就不會設定... 另外還要注意可以從外部透過 access token 存取服務 (像是有員工離職,但 access token 未必會換掉)。

這次推出的 PrivateLink + SaaS 的組合提供了另外一個選擇,可以把服務藏在內部,安全性比以前好很多:

Today we are building upon the initial launch and extending the PrivateLink model, allowing you to set up and use VPC Endpoints to access your own services and those made available by others.

不過這個機制綁 AWS 綁的更深了...

對 Open Data 的攻擊手段

前陣子看到的「Membership Inference Attacks against Machine Learning Models」,裡面試著做到的攻擊手法:

[G]iven a data record and black-box access to a model, determine if the record was in the model's training dataset.

也就是拿到一組 Open Data 的存取權限,然後發展一套方法判斷某筆資料是否在裡面。而驗證攻擊的手法當然就是直接攻擊看效果:

We empirically evaluate our inference techniques on classification models trained by commercial "machine learning as a service" providers such as Google and Amazon. Using realistic datasets and classification tasks, including a hospital discharge dataset whose membership is sensitive from the privacy perspective, we show that these models can be vulnerable to membership inference attacks. We then investigate the factors that influence this leakage and evaluate mitigation strategies.

透過 NN 攻擊 NN,而目前的解法也不太好處理,但有做總是會讓精確度降低。論文裡提到了四種讓難度增加的方法:

  • Restrict the prediction vector to top k classes.
  • Coarsen precision of the prediction vector.
  • Increase entropy of the prediction vector.
  • Use regularization.

另外一個值得看的資料是 2006 年發生的「AOL search data leak」,當年資料被放出來後有真實的使用者被找出來,也是很轟動啊...

Apple 的 App Store 的訂閱制度更新

先前在「蘋果 App Store 收費模式的改變」這邊提到的改變,這幾天細節公開了:「Subscriptions - App Store - Apple Developer」。

最主要的改變在於超過一年的費用從原來的 30% 降低到 15%:

Within a subscriber’s first year of an auto-renewable subscription, you receive the traditional 70% of the subscription price at each billing cycle, minus applicable taxes. After a subscriber accumulates one year of paid service, your revenue increases to 85% of the subscription price, minus applicable taxes.

不知道這對市場生態會帶來怎麼樣的影響...

Netflix 評估影片品質的方法

Netflix 在發了一篇很長的文章,說明怎麼評估 video quality:「Toward A Practical Perceptual Video Quality Metric」,文章雖然有點長,但其實還蠻好懂的...

講的白話一點,Netflix 想要做各種壓縮方式的改善,但在超大的量的情況下 (scale) 缺乏自動化打分數的機制:

All of the challenging work described above hinges on one fundamental premise: that we can accurately and efficiently measure the perceptual quality of a video stream at scale.

如果先不考慮 scale 問題,影片的評估方式有人工處理以及常見的計算方法 (像是 MSEPSNRSSIM):

Traditionally, in video codec development and research, two methods have been extensively used to evaluate video quality: 1) Visual subjective testing and 2) Calculation of simple metrics such as PSNR, or more recently, SSIM.

前者因為牽涉到人工,所以不 scale,而後者跟「人的觀感」還是不夠正相關:

Without doubt, manual visual inspection is operationally and economically infeasible for the throughput of our production, A/B test monitoring and encoding research experiments.

Although researchers and engineers in the field are well-aware that PSNR does not consistently reflect human perception, it remains the de facto standard for codec comparisons and codec standardization work.

Netflix 的作法其實很簡單:(但是每一步都很仔細)

  • 首先先把影片依照手上有的 metadata 歸類,然後再挑出代表性的剪輯,並且產生不同 bitrate 的檔案。
  • 用人工對這些剪輯評分。
  • 用機器產生各種既有計算方法的分數 (PSNR、SSIM、...)。
  • 用數學方法把人工的與機器算的分數建立 model。
  • 然後對於未知的影片先寄算出既有方法的分數 (PSNR、SSIM、...),然後套用 model 推估人的觀感。

沒什麼特別發明出來的演算法,只是苦工 XDDD

新版 EC2 的 Reserved Instance

前幾天 AWS 宣佈 EC2Reserved Instance 改變販賣方式:「Simplifying the EC2 Reserved Instance Model」。

本來的賣法是收兩種錢,一個是一次性的費用,另外一個是每個小時的費用。在舊的架構下,有不同種類的一次性費用,對上不同的小時單價。

新的架構變成收三種錢,一個是一次性的費用,一個是每個月的基本費 (電信資費中,類似於「基本費不可抵通話費」的觀念),最後是每個小時的費用。

看 AWS 文章的說明有點難懂,直接去 EC2 的價目表上面看就知道了。

至於這樣有沒有簡化,我覺得是見仁見智啦... 這次的 RI model 的改變最主要是降低對現金流的要求,付出一點利息就可以把一次性費用打散到十二個月 (或是三十六個月)。

對於 startup 應該還蠻有吸引力的改變。