GitHub Copilot 產生出來程式的安全性問題

看到「Encoding data for POST requests」這篇大家才回頭注意到 GitHub Copilot 首頁的範例本身就有安全漏洞:

async function isPositive(text: string): Promise<boolean> {
  const response = await fetch(`http://text-processing.com/api/sentiment/`, {
    method: "POST",
    body: `text=${text}`,
    headers: {
      "Content-Type": "application/x-www-form-urlencoded",
    },
  });
  const json = await response.json();
  return json.label === "pos";
}

其中 text=${text} 是一個 injection 類的漏洞,首頁的範例應該是被挑過的,但仍然出現了這個嚴重的問題,從這邊可以看出 GitHubOpenAI 在這條線上的問題...

GitHub 與 OpenAI 合作推出的 GitHub Copilot

Hacker News 首頁上的第一名看到 GitHubOpenAI 合作推出了 GitHub Copilot,對應的討論可以在「GitHub Copilot: your AI pair programmer (copilot.github.com)」這邊看到。

GitHub Copilot 會猜測你接下來會想要寫的「完整片段」,像是這樣:

不過 Hacker News 上面的討論有參與 alpha 測試的人的評價,大概 1/10 機率會猜對,即使如此,他還是給了很多有用的資訊 (像是函式與變數的名稱):

fzaninotto

I've been using the alpha for the past 2 weeks, and I'm blown away. Copilot guesses the exact code I want to write about one in ten times, and the rest of the time it suggests something rather good, or completely off. But when it guesses right, it feels like it's reading my mind.

It's really like pair programming, even though I'm coding alone. I have a better understanding of my own code, and I tend to give better names and descriptions to my methods. I write better code, documentation, and tests.

Copilot has made me a better programmer. No kidding. This is a huge achievement. Kudos to the GitHub Copilot team!

然後也有人笑稱總算找到理由寫 comment 了:

pfraze

They finally did it. They finally found a way to make me write comments

反過來的另外一個大問題就是 copyright,這點在目前的問答集沒看到... 在 Hacker News 裡面的討論有提到這點,但目前沒有完整的定論。

目前只支援 VSCode,以後也許會有機會透過 LSP 支援其他的編輯器?

另外我想到 Kite 這個 machine learning 的 auto complete 工具,沒有那麼強大但也還不錯?

假新聞產生器與偵測器

Hacker News 上看到的消息,是關於「使用類神經網路產生新聞」(也就是透過程式大量產生假新聞),這次的結果包括了「產生」與「偵測」兩個面向:「Grover – A State-of-the-Art Defense Against Neural Fake News (allenai.org)」。

實驗的網站在「Grover - A State-of-the-Art Defense against Neural Fake News」這邊,另外也有論文「Defending Against Neural Fake News」可以讀。

幾個月前,OpenAI 利用類神經網路,研發出「自動寫新聞」的程式,當時他們宣稱因為效果太好,決定不完整公開成果:「Better Language Models and Their Implications」,中文的報導可以參考 iThome 這篇:「AI文字產生技術引發假新聞爭議,OpenAI決定只公開部份技術成果」。

而現在 The Allen Institute for Artificial Intelligence 則是成功重製了 OpenAI 的成果,取名叫 Grover,發現訓練出來的模型除了可以拿來寫新聞外,也可以拿來偵測文章是不是機器產生的,而且就他們自己測試,辨識成功率還蠻高的:

To study and detect neural fake news, we built a model named Grover. Our study presents a surprising result: the best way to detect neural fake news is to use a model that is also a generator. The generator is most familiar with its own habits, quirks, and traits, as well as those from similar AI models, especially those trained on similar data, i.e. publicly available news. Our model, Grover, is a generator that can easily spot its own generated fake news articles, as well as those generated by other AIs. In a challenging setting with limited access to neural fake news articles, Grover obtains over 92% accuracy at telling apart human-written from machine-written news. Please read our publication for more information.

不過看起來 source code 與 model 還是沒放出來,但看起來遲早會有對應的 open source clone...

我想到在攻殼電視動畫裡面的情報管制戰,雖然電視動畫裡沒有講得很詳細,但感覺這類工具就是其中一環...