llama.cpp 的載入速度加速

Hacker News 上看到「Llama.cpp 30B runs with only 6GB of RAM now (github.com/ggerganov)」這個消息,原 pull request 在「Make loading weights 10-100x faster #613」這邊。

這個 PR 的作者 Justine Tunney 在 PR 上有提到他改變 model 檔案格式,以便改用 mmap(),大幅降低了需要預先讀取的時間 (因為變成 lazy-loading style),而且這也讓系統可以利用 cache page,避免了 double buffering 的問題:

This was accomplished by changing the file format so we can mmap() weights directly into memory without having to read() or copy them thereby ensuring the kernel can make its file cache pages directly accessible to our inference processes; and secondly, that the file cache pages are much less likely to get evicted (which would force loads to hit disk) because they're no longer competing with memory pages that were needlessly created by gigabytes of standard i/o.

這讓我想到在資料庫領域中,PostgreSQL 也會用 mmap() 操作,有點類似的概念。

另外 Justine Tunney 在這邊的 comment 有提到一個意外觀察到的現象,他發現實際在計算的時候用到的 model 內容意外的少:他用一個簡單的 prompt 測試,發現 20GB 的 30B model 檔案在他的 Intel 機器上實際只用到了 1.6GB 左右:

If I run 30B on my Intel machine:


As we can see, 400k page faults happen, which means only 1.6 gigabytes ((411522 * 4096) / (1024 * 1024)) of the 20 gigabyte weights file actually needed to be used.

這點他還在懷疑是不是他的修改有 bug,但目前他覺得不太像,也看不太出來:

Now, since my change is so new, it's possible my theory is wrong and this is just a bug. I don't actually understand the inner workings of LLaMA 30B well enough to know why it's sparse. Maybe we made some kind of rare mistake where llama.cpp is somehow evaluating 30B as though it were the 7B model. Anything's possible, however I don't think it's likely. I was pretty careful in writing this change, to compare the deterministic output of the LLaMA model, before and after the Git commit occurred. I haven't however actually found the time to reconcile the output of LLaMA C++ with something like PyTorch. It'd be great if someone could help with that, and possibly help us know why, from more a data science (rather than systems engineering perspective) why 30B is sparse.

如果不是 bug 的話,這其實冒出了一個很有趣的訊號,表示這些 model 是有可能再瘦身的?

Alpaca.cpp 有 13B 與 30B 的 model 可以玩了

在「Stanford Alpaca 與 Alpaca.cpp」這邊有提到 Alpaca 有放出 7B 的 model 可以用,後來 community 也照著同樣的方式訓練出對應的 13B 與 30B 的 model 了,在 Alpaca.cpp 專案的 README.md 有寫怎麼取得。

看起來比較大的限制是記憶體,13B 與 30B 的記憶體建議是 >10GB RAM 與 >32GB RAM,不知道剛好 32GB RAM 跑不跑得動,先拿 32GB RAM 的機器跑看看,如果真的不行的話家裡桌機是 64GB RAM,應該是不會有大問題...

不過 13B 的 model 檔案位置是放在 IPFS 上,透過這幾個 gateway 下載起來都有點慢,有需要的人用 BitTorrent 可能會快一點?

Update:測了一下 13B 的版本:

> Explain FAANG.
FAANG refers to five of America's largest technology companies - Facebook, Apple Inc., Amazon, Netflix and Google (Alphabet). These are some of the most valuable brands in today’s marketplace with a combined value that exceeded $3 trillion as at 2018.

以及 30B 的版本:

> Explain FAANG.
FAANG stands for Facebook, Amazon, Apple, Netflix and Google - five of the most powerful technology companies in today's world. These tech giants have become increasingly influential over recent years due to their dominance in various markets such as social media platforms (Facebook), e-commerce websites (Amazon) or streaming services (Netflix).

Fake GitHub Star 的生意

昨天在 Hacker News 首頁上看到「Tracking the Fake GitHub Star Black Market (dagster.io)」這篇,原文在「Tracking the Fake GitHub Star Black Market with Dagster, dbt and BigQuery」這邊。

作者群想要偵測 GitHub 上面 fake star 的行為,所以就跑去找黑市買,然後找到了兩家,Baddhi Shop (1000 個 $64) 與 GitHub24 (每個 €0.85,大約是 $0.91),價錢差異很大,「品質」差異也很大:貴的 star 在一個月後還是存在,而便宜的看起來有一些有被 GitHub 偵測到而清除掉:

A month later, all 100 GitHub24 stars still stood, but only three-quarters of the fake Baddhi Shop stars remained. We suspect the rest were purged by GitHub’s integrity teams.

接下來就是想要系統化分析,切入點是 GH Archive 這個服務,可以直接下載 GitHub 全站上的 public evnets 資料:

GH Archive is a project to record the public GitHub timeline, archive it, and make it easily accessible for further analysis.

想要偵測兩種不同的 fake account,第一種是 obvious fake account,定義成這樣:

  • Created in 2022 or later
  • Followers <=1
  • Following <= 1
  • Public gists == 0
  • Public repos <=4
  • Email, hireable, bio, blog, and twitter username are empty
  • Star date == account creation date == account updated date

從定義就可以看出來完全就是灌水帳號,開出來就沒在動的。從 screenshot 可以看出這種帳號長的都一樣:

另外一種則是透過演算法去分析,這邊拿 unsupervised clustering 類的演算法分析出來的結果,可以看到抓到很多:

最近 NN 類的 machine learning 演算法太多,看到這些傳統的 machine learning 演算法還是覺得頗新鮮的...

Stanford Alpaca 與 Alpaca.cpp

先前提到的「玩最近 Facebook Research (Meta) 放出來的 LLaMA」,在 2/24 公佈後,過沒多久 Stanford CRFM 團隊就丟出 fine-tuned 過的版本:「Alpaca: A Strong, Replicable Instruction-Following Model」,對應的討論在「Alpaca: A strong open-source instruction-following model (stanford.edu)」。

這個 tuning 是基於 LLaMA-7B:

We are releasing our findings about an instruction-following language model, dubbed Alpaca, which is fine-tuned from Meta’s LLaMA 7B model. We train the Alpaca model on 52K instruction-following demonstrations generated in the style of self-instruct using text-davinci-003. On the self-instruct evaluation set, Alpaca shows many behaviors similar to OpenAI’s text-davinci-003, but is also surprisingly small and easy/cheap to reproduce.

後來就再被包成一個專案 (改自 llama.cpp),讓大家可以很方便的跑:「Alpaca.cpp」。

剛好可以比較,用 llama.cpp 跑的話是:

$ ./main -m ./models/7B/ggml-model-q4_0.bin -t 12 -n 128 -p 'Explain FAANG. '

Explain FAANG. 🙂
The acronym “FAAN” is used for the five tech giants (Facebook, Apple, Amazon and Netflix) that are
 taking over our lives:
15 November 2018 / by Amaraksha S [end of text]

而用 alpaca.cpp 跑的話回答品質好不少:

> Explain FAANG.
FAANG (or FANNG) refers to five technology companies, Facebook Inc., Apple Inc., Amazon.com Inc., Netflix Inc and Alphabet's Google division that have become some of the most influential in recent years due to their rapid growth rates as well as their market capitalization values: 
Facebook (FB), $596 billion; Apple (AAPL) ,$1,084.72 billion; Amazon (AMZN), $394.5 billion ; Netflix( NFLX ), $154.8 billion and Google's parent company Alphabet Inc., ($GOOG,$GOOGL).
These companies have become some of the most influential in recent years due to their rapid growth rates as well as their market capitalization values:  Facebook (FB),$307.15 billion; Apple(AAPL) , $924.86 billion ; Amazon (AMZN ), $783.

可以看到 instruction-following 後的差異,很像是 GPT-3.5 API 與 ChatGPT API 的輸出結果差異。

而這只是 LLaMA-7B 的成果,好像可以期待有人對 LLaMA-65B 做些事情?

玩最近 Facebook Research (Meta) 放出來的 LLaMA

很多地方應該都有提到 Facebook Research (Meta) 放出來的 LLaMA 了,對應的論文是「LLaMA: Open and Efficient Foundation Language Models」這篇,但這邊論文提到的 open 並不是一般常見的 open 定義,而只是常見的行銷詞彙而已,實際上只是 free for charging with constraints。

另外要注意 LLaMA 是個 LLM 而已,跟 ChatGPT 不算是同樣性質的東西,能對比應該是 GPT-3 (或是 GPT-3.5)。

主要是 ChatGPT 多了 SLRL 的步驟,而產出來的東西更接近商業化產品要的結果。

LLaMA 的特點在於效能不錯,可以用 LLaMA-13B 打贏 GPT-3 (175B),另外這次訓練出來最大的 LLaMA-65B 則可以站上第一梯隊 (與 DeepMindChinchilla-70BGoogle ResearchPaLM-540B):

LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, and LLaMA-65B is competitive with the best models, Chinchilla-70B and PaLM-540B.

但跟以前差異最大的是,這次 Facebook Research 決定把訓練完後的 model 檔案放出來,所以就有了後續很多的進展:

We release all our models to the research community.

首先一開始 Facebook Research 要求使用者填表單才提供下載 (2/24 的時候),但三月初的時候 GitHub 上有人直接把 BitTorrentmagnet 連結附上去,送了一個 pull request:「Save bandwidth by using a torrent to distribute more efficiently #73」,所以你就有「方法」可以取得 model 檔案,但還是可以注意一下使用限制:

To maintain integrity and prevent misuse, we are releasing our model under a noncommercial license focused on research use cases. Access to the model will be granted on a case-by-case basis to academic researchers; those affiliated with organizations in government, civil society, and academia; and industry research laboratories around the world. People interested in applying for access can find the link to the application in our research paper.

除了可以透過 BitTorrent 下載外,comment 裡面也有 IPFS 的連結可以下載。

有需要下載這包檔案的人要注意檔案很大,大約 240GB,其中 65B model 佔了 128GB 左右。

被放出來以後就開始有很多人在上面包起來用,其中目前比較完整的應該是「Running LLaMA 7B and 13B on a 64GB M2 MacBook Pro with llama.cpp」這篇提到的方法,裡面提到的 ggerganov/llama.cpp 專案同時支援了 WindowsmacOSLinux,而且可以用 CPU 跑,速度也不慢。

試著用 llama.cpp 跑,65B 的 model 在家裡桌機有 64GB RAM 的情況下是可以應硬扛的,跑降到 4bits 的 model 大約吃 41GB RAM 左右。

比較驚訝的是 efficient 這個部份,拿 CPU 跑 65B 版本是跑得動的。

隨便丟個「文字接龍」進去 LLaMA-65B 讓他接,可以看到接出來的東西還可以:

main: prompt: 'The main difference between javascript and java are '
main: number of tokens in prompt = 10

The main difference between javascript and java are 1)Java is a compiled language while JavaScript
is not. So that the performance of Javascript code can be lower than Java Code, but they run on different platforms: browser vs server or local machine;
2)'javac' to compile .java file to bytecode(for JVM), and 'jar'(to make executable jar files for client systems).
3)Java uses more memory (RAM) that javascript. So the Javascript code is generally small than Java
codes.(A web application written in JavaScript may run faster on a PC with 64 MB of RAM compared to one running using applets.) While an advantage offered by J

如果是 LLaMA-7B 的話會快很多,但回答就有點微妙了:

main: prompt: 'The main difference between javascript and java are '
main: number of tokens in prompt = 10

The main difference between javascript and java are 1) Javascript is interpreted, while Java compiles to native machine code. (Compiled languages have a performance advantage over Interpreted ones
2) The language standardization process for JavaScript has not been completed at the time of this writing which leaves it open to being hacked or changed by those who care enough about your website to do so... In Java, everything is set in stone. (Or at least as much code that can be shipped on a CD-ROM disk)
Sorry if my response was not clear - but you're right! I think that "2" above should really have said: "(

訓練所花的資源的部份,可以從論文裡面看到,如果是 2048 張 A100 的話大約要跑五個月 (照這個語氣,實際上大概不是這個數字):

Finally, we estimate that we used 2048 A100-80GB for a period of approximately 5 months to develop our models.

另外也有列出 GPU hours 可以參考:

Kagi 開發的 Universal Summarizer

在「Universal Summarizer (kagi.com)」這邊看到的新服務,可以給出某個網址的 summary。服務的本體則是在「Kagi - Universal Summarizer」這邊,從網址可以猜測是 Kagi 的實驗項目。

像是 CNN 的「Biden’s dramatic warning to China」這篇,他抓出對應的 summary 看起來沒什麼問題:

President Joe Biden delivered a dramatic warning to China in his State of the Union address, vowing to protect America's sovereignty if China threatens it. He specifically named President Xi Jinping, saying "Name me a world leader who'd change places with Xi Jinping. Name me one!" This marked a stark escalation in the US-Chinese relationship, which has been strained by a balloon surveillance program and other issues. Biden also addressed Russia, calling their invasion of Ukraine a test for America and the world. His speech highlighted the unified opposition to China in US politics, with House Speaker Kevin McCarthy having convened a bipartisan House committee to examine the perceived threat from the Chinese Communist Party. Biden's comments also served as an important milestone in the increasingly tumultuous competition between the US and China, as the US shifts to talking about establishing guardrails for the relationship and protecting the Western-led rules-based international system.

這個功能可以用 GPT-3.5 或是 ChatGPT 串,但不確定 Kagi 是串上去還自己搞?我猜有蠻大的機會是串的...

然後他的網址設計因為是 url 傳遞的方式,可以包裝成 bookmarklet 形式放在快速列上面用:「Kagi Universal Summarizer」。

另外在 HN 的討論裡面看到很厲害的用法,把 https://bellard.org/quickjs/pi_bigdecimal.js 這段 javascript code 的 url 丟進去,然後居然出現了程式碼的說明,居然還正確判斷出是 Chudnovsky algorithm

This code uses the QuickJS bigdecimal type to calculate the value of pi to a given precision. It does this by using the Chudnovsky algorithm, which is a series of calculations that can be used to approximate pi. The code is written in Javascript and uses BigInt and BigDecimal to perform the calculations. It is interesting to note that the code also takes into account the possibility of bad rounding for the last digits, and adds extra digits to reduce the probability of this happening.

既然包了一個 bookmarklet,最近應該會常常拿出來用...

用 AI 模型判斷是否為 AI 產生的文字

OpenAI 放出了新的 model,可以用來判斷是否為 AI 產生的文字:「New AI classifier for indicating AI-written text」。

但目前的成效其實還是不太行,只以英文的成效來看,true positive 只有 26%,而 false positive 是 9%:

In our evaluations on a “challenge set” of English texts, our classifier correctly identifies 26% of AI-written text (true positives) as “likely AI-written,” while incorrectly labeling human-written text as AI-written 9% of the time (false positives).


The classifier is very unreliable on short texts (below 1,000 characters). Even longer texts are sometimes incorrectly labeled by the classifier.


Text that is very predictable cannot be reliably identified. For example, it is impossible to predict whether a list of the first 1,000 prime numbers was written by AI or humans, because the correct answer is always the same.

另外題到了技術上的限制,現在的方法比較像是「辨認是不是從某些 corpus 訓練出來的 model,所產生的文字」,而非通用性的 AI 文字偵測:

Classifiers based on neural networks are known to be poorly calibrated outside of their training data. For inputs that are very different from text in our training set, the classifier is sometimes extremely confident in a wrong prediction.


OpenAI 推出 ChatGPT Plus

OpenAI 提出了 ChatGPT 的付費方案:「Introducing ChatGPT Plus」。


ChatGPT Plus is available to customers in the United States, and we will begin the process of inviting people from our waitlist over the coming weeks. We plan to expand access and support to additional countries and regions soon.

公告的價錢是 US$20/mo,基本上就是保證使用權。這跟之前有傳言 US$42/mo 叫 Professional 的方案低了不少:「ChatGPT users report $42 a month pricing for ‘pro’ access but no official announcement yet」:

The new subscription plan, ChatGPT Plus, will be available for $20/month, and subscribers will receive a number of benefits:

  • General access to ChatGPT, even during peak times
  • Faster response times
  • Priority access to new features and improvements

應該是會訂起來用,光是現在 free tier 就已經找到一些常用的模式,可以省下不少時間...

直接用 prompt 產生音樂的 Riffusion

很紅的 Stable Diffusion 是寫一串文字 (prompt) 然後產生圖片,而 Riffusion 則是寫一串文字產生音樂。

其中 prompt 轉成音樂其實還在可以預期的範圍 (i.e. 遲早會出現),但專案的頁面上解釋了 Riffusion 是基於 Stable Fusion 的作品,而且是利用 Stable Fusion 產生出時頻譜 (spectrogram):

Well, we fine-tuned the model to generate images of spectrograms, like this:


Hacker News 上討論時的討論頁可以看看,作者有參與一些討論:「Riffusion – Stable Diffusion fine-tuned to generate music (riffusion.com)」。

其中有人提到這個作法超出想像,因為輸出的圖片只要幾個 pixel 差一點點就會產生出很不同的聲音:

This really is unreasonably effective. Spectrograms are a lot less forgiving of minor errors than a painting. Move a brush stroke up or down a few pixels, you probably won't notice. Move a spectral element up or down a bit and you have a completely different sound. I don't understand how this can possibly be precise enough to generate anything close to a cohesive output.

Absolutely blows my mind.


Author here: We were blown away too. This project started with a question in our minds about whether it was even possible for the stable diffusion model architecture to output something with the level of fidelity needed for the resulting audio to sound reasonable.

實際上聽了產生出來的音樂,是真的還 OK 的音樂... 大家都完全沒想到可以這樣搞,然後在 Hacker News 上的 upvote 數量爆炸高 XD


Hacker News Daily 上看到 Palette 這個服務,作者在 Hacker News 上有提到你可以提供一些句子調整顏色:「Show HN: I made a new AI colorizer (palette.fm)」。

Hi HN, I’m Emil, the maker behind Palette. I’ve been tinkering with AI and colorization for about five years. This is my latest colorization model. It’s a text-based AI colorizer, so you can edit the colorizations with natural language. To make it easier to use, I also automatically create captions and generate filters.

作者有把一些作品貼在 Reddit 上面,可以參考 https://www.reddit.com/user/emilwallner/?sort=top 這邊,看起來已經有一陣子了...