玩最近 Facebook Research (Meta) 放出來的 LLaMA

很多地方應該都有提到 Facebook Research (Meta) 放出來的 LLaMA 了,對應的論文是「LLaMA: Open and Efficient Foundation Language Models」這篇,但這邊論文提到的 open 並不是一般常見的 open 定義,而只是常見的行銷詞彙而已,實際上只是 free for charging with constraints。

另外要注意 LLaMA 是個 LLM 而已,跟 ChatGPT 不算是同樣性質的東西,能對比應該是 GPT-3 (或是 GPT-3.5)。

主要是 ChatGPT 多了 SLRL 的步驟,而產出來的東西更接近商業化產品要的結果。

LLaMA 的特點在於效能不錯,可以用 LLaMA-13B 打贏 GPT-3 (175B),另外這次訓練出來最大的 LLaMA-65B 則可以站上第一梯隊 (與 DeepMindChinchilla-70BGoogle ResearchPaLM-540B):

LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, and LLaMA-65B is competitive with the best models, Chinchilla-70B and PaLM-540B.

但跟以前差異最大的是,這次 Facebook Research 決定把訓練完後的 model 檔案放出來,所以就有了後續很多的進展:

We release all our models to the research community.

首先一開始 Facebook Research 要求使用者填表單才提供下載 (2/24 的時候),但三月初的時候 GitHub 上有人直接把 BitTorrentmagnet 連結附上去,送了一個 pull request:「Save bandwidth by using a torrent to distribute more efficiently #73」,所以你就有「方法」可以取得 model 檔案,但還是可以注意一下使用限制:

To maintain integrity and prevent misuse, we are releasing our model under a noncommercial license focused on research use cases. Access to the model will be granted on a case-by-case basis to academic researchers; those affiliated with organizations in government, civil society, and academia; and industry research laboratories around the world. People interested in applying for access can find the link to the application in our research paper.

除了可以透過 BitTorrent 下載外,comment 裡面也有 IPFS 的連結可以下載。

有需要下載這包檔案的人要注意檔案很大,大約 240GB,其中 65B model 佔了 128GB 左右。

被放出來以後就開始有很多人在上面包起來用,其中目前比較完整的應該是「Running LLaMA 7B and 13B on a 64GB M2 MacBook Pro with llama.cpp」這篇提到的方法,裡面提到的 ggerganov/llama.cpp 專案同時支援了 WindowsmacOSLinux,而且可以用 CPU 跑,速度也不慢。

試著用 llama.cpp 跑,65B 的 model 在家裡桌機有 64GB RAM 的情況下是可以應硬扛的,跑降到 4bits 的 model 大約吃 41GB RAM 左右。

比較驚訝的是 efficient 這個部份,拿 CPU 跑 65B 版本是跑得動的。

隨便丟個「文字接龍」進去 LLaMA-65B 讓他接,可以看到接出來的東西還可以:

main: prompt: 'The main difference between javascript and java are '
main: number of tokens in prompt = 10
[...]

The main difference between javascript and java are 1)Java is a compiled language while JavaScript
is not. So that the performance of Javascript code can be lower than Java Code, but they run on different platforms: browser vs server or local machine;
2)'javac' to compile .java file to bytecode(for JVM), and 'jar'(to make executable jar files for client systems).
3)Java uses more memory (RAM) that javascript. So the Javascript code is generally small than Java
codes.(A web application written in JavaScript may run faster on a PC with 64 MB of RAM compared to one running using applets.) While an advantage offered by J

如果是 LLaMA-7B 的話會快很多,但回答就有點微妙了:

main: prompt: 'The main difference between javascript and java are '
main: number of tokens in prompt = 10
[...]

The main difference between javascript and java are 1) Javascript is interpreted, while Java compiles to native machine code. (Compiled languages have a performance advantage over Interpreted ones
.)
2) The language standardization process for JavaScript has not been completed at the time of this writing which leaves it open to being hacked or changed by those who care enough about your website to do so... In Java, everything is set in stone. (Or at least as much code that can be shipped on a CD-ROM disk)
Sorry if my response was not clear - but you're right! I think that "2" above should really have said: "(

訓練所花的資源的部份,可以從論文裡面看到,如果是 2048 張 A100 的話大約要跑五個月 (照這個語氣,實際上大概不是這個數字):

Finally, we estimate that we used 2048 A100-80GB for a period of approximately 5 months to develop our models.

另外也有列出 GPU hours 可以參考:

Kagi 開發的 Universal Summarizer

在「Universal Summarizer (kagi.com)」這邊看到的新服務,可以給出某個網址的 summary。服務的本體則是在「Kagi - Universal Summarizer」這邊,從網址可以猜測是 Kagi 的實驗項目。

像是 CNN 的「Biden’s dramatic warning to China」這篇,他抓出對應的 summary 看起來沒什麼問題:

President Joe Biden delivered a dramatic warning to China in his State of the Union address, vowing to protect America's sovereignty if China threatens it. He specifically named President Xi Jinping, saying "Name me a world leader who'd change places with Xi Jinping. Name me one!" This marked a stark escalation in the US-Chinese relationship, which has been strained by a balloon surveillance program and other issues. Biden also addressed Russia, calling their invasion of Ukraine a test for America and the world. His speech highlighted the unified opposition to China in US politics, with House Speaker Kevin McCarthy having convened a bipartisan House committee to examine the perceived threat from the Chinese Communist Party. Biden's comments also served as an important milestone in the increasingly tumultuous competition between the US and China, as the US shifts to talking about establishing guardrails for the relationship and protecting the Western-led rules-based international system.

這個功能可以用 GPT-3.5 或是 ChatGPT 串,但不確定 Kagi 是串上去還自己搞?我猜有蠻大的機會是串的...

然後他的網址設計因為是 url 傳遞的方式,可以包裝成 bookmarklet 形式放在快速列上面用:「Kagi Universal Summarizer」。

另外在 HN 的討論裡面看到很厲害的用法,把 https://bellard.org/quickjs/pi_bigdecimal.js 這段 javascript code 的 url 丟進去,然後居然出現了程式碼的說明,居然還正確判斷出是 Chudnovsky algorithm

This code uses the QuickJS bigdecimal type to calculate the value of pi to a given precision. It does this by using the Chudnovsky algorithm, which is a series of calculations that can be used to approximate pi. The code is written in Javascript and uses BigInt and BigDecimal to perform the calculations. It is interesting to note that the code also takes into account the possibility of bad rounding for the last digits, and adds extra digits to reduce the probability of this happening.

既然包了一個 bookmarklet,最近應該會常常拿出來用...

用 AI 模型判斷是否為 AI 產生的文字

OpenAI 放出了新的 model,可以用來判斷是否為 AI 產生的文字:「New AI classifier for indicating AI-written text」。

但目前的成效其實還是不太行,只以英文的成效來看,true positive 只有 26%,而 false positive 是 9%:

In our evaluations on a “challenge set” of English texts, our classifier correctly identifies 26% of AI-written text (true positives) as “likely AI-written,” while incorrectly labeling human-written text as AI-written 9% of the time (false positives).

另外也有提到弱點,像是比較短的內容機很難辨認:

The classifier is very unreliable on short texts (below 1,000 characters). Even longer texts are sometimes incorrectly labeled by the classifier.

然後就是有正確答案的內容也很難辨認,因為正確答案幾乎都是一樣的:

Text that is very predictable cannot be reliably identified. For example, it is impossible to predict whether a list of the first 1,000 prime numbers was written by AI or humans, because the correct answer is always the same.

另外題到了技術上的限制,現在的方法比較像是「辨認是不是從某些 corpus 訓練出來的 model,所產生的文字」,而非通用性的 AI 文字偵測:

Classifiers based on neural networks are known to be poorly calibrated outside of their training data. For inputs that are very different from text in our training set, the classifier is sometimes extremely confident in a wrong prediction.

看起來是還不到可以用的程度...

OpenAI 推出 ChatGPT Plus

OpenAI 提出了 ChatGPT 的付費方案:「Introducing ChatGPT Plus」。

目前只開美國:

ChatGPT Plus is available to customers in the United States, and we will begin the process of inviting people from our waitlist over the coming weeks. We plan to expand access and support to additional countries and regions soon.

公告的價錢是 US$20/mo,基本上就是保證使用權。這跟之前有傳言 US$42/mo 叫 Professional 的方案低了不少:「ChatGPT users report $42 a month pricing for ‘pro’ access but no official announcement yet」:

The new subscription plan, ChatGPT Plus, will be available for $20/month, and subscribers will receive a number of benefits:

  • General access to ChatGPT, even during peak times
  • Faster response times
  • Priority access to new features and improvements

應該是會訂起來用,光是現在 free tier 就已經找到一些常用的模式,可以省下不少時間...

直接用 prompt 產生音樂的 Riffusion

很紅的 Stable Diffusion 是寫一串文字 (prompt) 然後產生圖片,而 Riffusion 則是寫一串文字產生音樂。

其中 prompt 轉成音樂其實還在可以預期的範圍 (i.e. 遲早會出現),但專案的頁面上解釋了 Riffusion 是基於 Stable Fusion 的作品,而且是利用 Stable Fusion 產生出時頻譜 (spectrogram):

Well, we fine-tuned the model to generate images of spectrograms, like this:

也就是像這樣的圖:

Hacker News 上討論時的討論頁可以看看,作者有參與一些討論:「Riffusion – Stable Diffusion fine-tuned to generate music (riffusion.com)」。

其中有人提到這個作法超出想像,因為輸出的圖片只要幾個 pixel 差一點點就會產生出很不同的聲音:

This really is unreasonably effective. Spectrograms are a lot less forgiving of minor errors than a painting. Move a brush stroke up or down a few pixels, you probably won't notice. Move a spectral element up or down a bit and you have a completely different sound. I don't understand how this can possibly be precise enough to generate anything close to a cohesive output.

Absolutely blows my mind.

然後其中一位作者回覆到,他也是做下去後才很意外發現居然可行:

Author here: We were blown away too. This project started with a question in our minds about whether it was even possible for the stable diffusion model architecture to output something with the level of fidelity needed for the resulting audio to sound reasonable.

實際上聽了產生出來的音樂,是真的還 OK 的音樂... 大家都完全沒想到可以這樣搞,然後在 Hacker News 上的 upvote 數量爆炸高 XD

可以自己調整的黑白照片上色服務

Hacker News Daily 上看到 Palette 這個服務,作者在 Hacker News 上有提到你可以提供一些句子調整顏色:「Show HN: I made a new AI colorizer (palette.fm)」。

Hi HN, I’m Emil, the maker behind Palette. I’ve been tinkering with AI and colorization for about five years. This is my latest colorization model. It’s a text-based AI colorizer, so you can edit the colorizations with natural language. To make it easier to use, I also automatically create captions and generate filters.

作者有把一些作品貼在 Reddit 上面,可以參考 https://www.reddit.com/user/emilwallner/?sort=top 這邊,看起來已經有一陣子了...

Amazon EC2 的 Trn1 正式開放使用

AWS 自家研發晶片的 trn1.* 上線了:「Amazon EC2 Trn1 Instances for High-Performance Model Training are Now Available」。

先前三家雲端的廠商只有 Google Cloud PlatformTPU 可以 train & evaluate,現在 AWS 推出 AWS Trainium,補上 train 這塊的產品。其中官方宣稱可以比 GPU 架構少 50% 的計算成本:

Trainium-based EC2 Trn1 instances solve this challenge by delivering faster time-to-train while offering up to 50% cost-to-train savings over comparable GPU-based instances.

然後 PyTorchTensorFlow 都有支援:

The Neuron plugin natively integrates with popular ML frameworks, such as PyTorch and TensorFlow.

另外用 neuron-ls 可以看到 Neuron 裝置的資訊,不過沒看懂為什麼要 mask 掉 private ip 的資訊:

大型的 cluster 會使用 Amazon FSx for Lustre 整合提供服務:

For large-scale model training, Trn1 instances integrate with Amazon FSx for Lustre high-performance storage and are deployed in EC2 UltraClusters. EC2 UltraClusters are hyperscale clusters interconnected with a non-blocking petabit-scale network.

但第一波開放的區域有點少,只有萬年美東一區 us-east-1 與美西二區 us-west-2

You can launch Trn1 instances today in the AWS US East (N. Virginia) and US West (Oregon) Regions as On-Demand, Reserved, and Spot Instances or as part of a Savings Plan.

us-east-1trn1.2xlarge 的價錢是 US$1.34375/hr,但沒有實際跑過比較好像沒辦法評估到底行不行...

但總算是擺出個產品對打看看,畢竟要夠大才能去訂製這些東西。

這兩個禮拜爆紅的 Stable Diffusion

Stable DiffusionStability AI 訓練出來的 model,跟之前提到的 DALL-E 最大的差異就是產生出的圖的限制少很多:

Unlike competing models like DALL-E, Stable Diffusion is open source and does not artificially limit the images it produces, though the license prohibits certain harmful use cases.

這也造就了這兩個禮拜整個 Stable Diffusion 的各種應用急速成長。

Simon Willison 的「Stable Diffusion is a really big deal」這篇來當作總覽還不錯。

除了授權使用上的限制以外,在技術上的限制也比較少 (有很大一部分會歸功於社群的各種 porting),包括了:

除了先前大家已經熟悉的 txt2img 功能以外,Stable Diffusion 另外提供了 img2img 的能力,也就是先給一張圖,然後再給對應的句子要求 Stable Diffusion 去改這張圖,所以就會有像是把這張圖:

加上「A distant futuristic city full of tall buildings inside a huge transparent glass dome, In the middle of a barren desert full of large dunes, Sun rays, Artstation, Dark sky full of stars with a shiny sun, Massive scale, Fog, Highly detailed, Cinematic, Colorful」的句子後,提供了這張圖:

以及這張圖:

這樣可玩性又多了不少...

用 DALL·E 2 的圖當作網誌文章的圖片

Hacker News 上看到「I replaced all our blog thumbnails using DALL·E 2 (deephaven.io)」這個點子,原文在「I replaced all our blog thumbnails using DALL·E 2 for $45: here’s what I learned」這邊。

網誌文章如果包含好的圖片時,曝光度與互動都會比較多。所以作者就想到用 OpenAIDALL·E 2 來搞事了:給個描述,請 DALL·E 2 生成圖片。

文章裡面有很多產生出來的圖都蠻有趣的,像是「a cute blue colored gopher with blue fur programming on multiple monitors displaying many spreadsheets, digital art」這個描述生出來的圖:

不過不算便宜,他花了 US$45 生成大約一百篇文章的圖:

I spent the weekend and $45 in OpenAi credits generating new thumbnails that better represent the content of all 100+ posts from our blog.

如果用先前「玩玩文字轉圖片的 min(DALL·E)」這邊提到的方法自己搞不知道可不可行?

確認 Deepfake Video Call 的方式

Hacker News 首頁上看到「To uncover a deepfake video call, ask the caller to turn sideways (metaphysic.ai)」這篇,講怎麼在 video call 的時候辨認是不是 deepfake 的人。原文在「To Uncover a Deepfake Video Call, Ask the Caller to Turn Sideways」這邊可以讀到。

文章裡面介紹一個很簡單的方式,就是轉頭拍側面,會看到明顯的缺陷:

這張 GIF 會更明顯:

Hacker News 上的討論也有人提到這是目前的技術限制,所以這個方法算是有效的。但以現在各類 machine learning 成長的速度來看,可能沒多久後就會有新技術突破這個方法了。