Saw 「Redditor creates working anime QR codes using Stable Diffusion」 on Hacker News Daily; the original Reddit thread is 「ControlNet for QR Code」.
The QR codes generated through ControlNet go far beyond what you'd normally imagine a customized QR code could look like... no explanation needed, just look at the images:
Saw Georgi Gerganov's plan to start a company on the Hacker News front page: 「GGML – AI at the Edge (ggml.ai)」; the official site is 「GGML - AI at the edge」.
As Georgi Gerganov mentions, projects like llama.cpp started out as his side projects and unexpectedly took off:
I've started a company: https://t.co/jFknDoasSy
From a fun side project just a few months ago, ggml has now become a useful library and framework for machine learning with a great open-source community
— Georgi Gerganov (@ggerganov) June 6, 2023
He also mentions that Nat Friedman and Daniel Gross helped out:
I'm incredibly grateful to @natfriedman and @danielgross for the support & funding and also for helping me get inspired even more in this project
There is still a long way ahead with many ideas to try and cool things to do. Hope you will join and help us create something useful!
— Georgi Gerganov (@ggerganov) June 6, 2023
The official site mentions that this is pre-seed funding:
ggml.ai is a company founded by Georgi Gerganov to support the development of ggml. Nat Friedman and Daniel Gross provided the pre-seed funding.
Looking back now, llama.cpp originally took off mainly because it could run LLaMA 7B on a CPU, and running it on a CPU isn't actually that slow.
That later attracted a lot of people to help out, which brought quite a few optimizations (like the one covered in 「llama.cpp 的載入速度加速」, where mmap is used to cut load time and let multiple processes reuse the cache), and then GPU support followed...
But I'm not sure what his long-term plan is now that he has started a company...?
The "open" people talk about around LLMs isn't open in the open-source-license sense; it's closer to just "free to use", usually with restrictions attached.
But even loosening the bar to "free to use", LLaMA 65B had been leading the pack for more than three months since it was released in February (or rather, leaked), until last week when the news came that Falcon 40B had overtaken it:
LLaMa is dethroned 👑 A brand new LLM is topping the Open Leaderboard: Falcon 40B 🛩
*interesting* specs:
- tuned for efficient inference
- licence similar to Unity allowing commercial use
- strong performances
- high-quality dataset also releasedCheck the authors' thread 👇 https://t.co/vojobBXFQT pic.twitter.com/BuOLnHebhU
— Thomas Wolf (@Thom_Wolf) May 26, 2023
In the benchmarks on the 「Open LLM Leaderboard」 you can see it leads on everything except TruthfulQA (0-shot), and the overall average is also in the lead:
Scrolling further down, the 7B version also performs well, and it should be tunable further later on.
More importantly, I just saw the news that the model's license has been changed to Apache License 2.0, so a real alternative to LLaMA is finally taking shape:
The license of the Falcon 40B model has just been changed to… Apache-2 which means that this model is now free for any usage including commercial use (and same for the 7B) 🎉 https://t.co/LZcmejPdf5
— Thomas Wolf (@Thom_Wolf) May 31, 2023
Also, from what I can tell this model was built on AWS SageMaker; I looked up the Technology Innovation Institute, and it is clearly an organization with money to spend:
Falcon-40B was trained on AWS SageMaker, on 384 A100 40GB GPUs in P4d instances.
The Technology Innovation Institute (TII) is an Abu Dhabi government funded research institution that operates in the areas of artificial intelligence, quantum computing, autonomous robotics, cryptography, advanced materials, digital science,[4] directed energy and secure systems. The institute is a part of the Abu Dhabi Government’s Advanced Technology Research Council (ATRC).
Someone on Hacker News has already got it running, and with the instruction-tuned (InstructGPT-style) version at that: 「Falcon 40B LLM (which beats Llama) now Apache 2.0 (twitter.com/thom_wolf)」; reportedly the 4-bit quantized version can run on a 40GB A100 or on two 24GB 3090/4090 cards.
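As a very rough sketch of what that kind of 4-bit setup could look like with Hugging Face transformers plus bitsandbytes (the checkpoint name and flags here are my own assumptions, not taken from the thread):

```python
# Hypothetical sketch: load Falcon-40B-Instruct with 4-bit quantization via
# bitsandbytes, letting accelerate spread layers across the available GPUs.
# Requires: pip install transformers accelerate bitsandbytes
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "tiiuae/falcon-40b-instruct"  # assumed checkpoint name on Hugging Face

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    load_in_4bit=True,        # 4-bit quantization through bitsandbytes
    device_map="auto",        # shard across one A100 or two 3090/4090s
    trust_remote_code=True,   # Falcon ships its own modeling code
)

inputs = tokenizer("Explain FAANG.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```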
Also, the ggml folks will probably get moving on this in the next few days; we can let this one fly a little longer...
An article seen on Hacker News Daily about various numbers for LLMs (mostly costs): 「Numbers every LLM developer should know (github.com/ray-project)」; the original article is 「Numbers every LLM Developer should know」.
The very first one is pretty important: if you're using an API billed by the token, telling it to keep things short saves quite a bit of money XD
40-90: Amount saved by appending “Be Concise” to your prompt
The second one gives you a feel for converting between words and tokens, though this presumably applies to English:
1.3:1 -- Average tokens per word
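If you want to sanity-check that ratio on your own text, something like tiktoken (OpenAI's tokenizer library) makes it trivial; a small sketch:

```python
# Count tokens vs. words for a piece of text with OpenAI's tiktoken tokenizer.
import tiktoken

text = "Numbers every LLM developer should know."
enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

tokens = enc.encode(text)
words = text.split()
print(f"{len(words)} words -> {len(tokens)} tokens "
      f"(~{len(tokens) / len(words):.2f} tokens per word)")
```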
There are plenty more numbers further down, all meant to give you a rough sense. Once you've read through them, keep the cheatsheet around:
A demo seen on Simon Willison's blog that runs an LLM directly in the browser via WebGPU: 「Web LLM runs the vicuna-7b Large Language Model entirely in your browser, and it's very impressive」; the project is at 「Web LLM」 and you can play with it directly.
Mind the browser support, though: Chrome needs 113+, but the current stable is still 112; on Firefox I tried enabling WebGPU with dom.webgpu.enabled in about:config, but it still wouldn't run after restarting the browser? (It might also be a Linux environment issue.)
Update: it's probably the Linux environment; on Linux it doesn't work for me even on the dev channel (114).
Now that WebGPU is here, I wonder if sites will need to start blocking GPU mining...
A discussion seen on Ask Hacker News Weekly where someone asked which LLMs can currently be used commercially: 「Ask HN: Open source LLM for commercial use?」.
Someone mentioned that Google's Flan family is probably the strongest option right now; the models can be downloaded from Hugging Face:
I've seen this question asked repeatedly in many LLaMa threads, currently the best models that are truly open are the released models from the Flan family by Google, which includes Flan-T5[0] and Flan-UL2[1]. According to its paper, Flan-UL2 performs slightly better than Flan-T5-XXL.
They're roughly at GPT-3 level, still some distance from GPT-3.5 or the ChatGPT models derived from it. But if you fine-tune them for a specific scenario they should still be usable:
These models perform slightly better than GPT-3 under some tasks[2], but they're still far from achieving the results from GPT-3.5 and GPT-4. This becomes evident when you try to use them in the real world; they're not "good enough" for general use cases, unlike ChatGPT models. However, if you can restrict your use case to one particular domain, you can achieve pretty good results by further fine-tuning these models.
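For reference, pulling a Flan-T5 checkpoint from Hugging Face looks roughly like this (a minimal sketch; I'm using the smaller flan-t5-large here as an example so it runs on modest hardware, swap in flan-t5-xxl if you have the memory):

```python
# Minimal sketch: run a Flan-T5 checkpoint from Hugging Face with transformers.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "google/flan-t5-large"  # example checkpoint; flan-t5-xxl is the big one

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

prompt = "Translate to German: Where is the train station?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```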
Another reply mentions some other models:
The ones I saw mentioned so far were Flan, Cerebras, GPT-J, and RWKV.
Not yet mentioned:
* Pythia https://github.com/EleutherAI/pythia
* GLM-130B https://github.com/THUDM/GLM-130B - see also ChatGLM-6B https://github.com/THUDM/ChatGLM-6B
* GPT-NeoX-20B https://huggingface.co/EleutherAI/gpt-neox-20b
* GeoV-9B https://github.com/geov-ai/geov
* BLOOM https://huggingface.co/bigscience/bloom and BLOOMZ https://huggingface.co/bigscience/bloomz
Looks like this is the list to dig through if you need one of these...
A few days ago I ran into 「Five years of GPT progress」 somewhere; it's a retrospective that lays out the last five years of GPT/LLM development. It naturally covers the technical improvements (parameter counts, differences in neural network layer architectures), links to the original papers and data throughout, and the author also describes some of the context at the time, which helps a lot in sorting out the historical thread.
It starts from OpenAI's GPT, GPT-2, and GPT-3, then talks about the new era that GPT-3 ushered in.
Next comes the era when everyone else jumped in: Jurassic-1 (AI21 Labs), Megatron-Turing NLG (Nvidia), Gopher (DeepMind), Chinchilla (DeepMind), and PaLM (Google AI).
Then LLaMA (Facebook), the first LLM with large enough parameters and good enough performance that was released for everyone to play with.
Finally it circles back to OpenAI's GPT-4.
Laid out this way it reads much more clearly, but keep in mind that the progress wasn't linear; these efforts overlapped and influenced each other (there were plenty of other papers feeding into one another in between).
Saw 「Llama.cpp 30B runs with only 6GB of RAM now (github.com/ggerganov)」 on Hacker News; the original pull request is 「Make loading weights 10-100x faster #613」.
Justine Tunney, the author of the PR, mentions changing the model file format so that mmap() can be used, which drastically cuts the amount of up-front reading (loading becomes lazy) and also lets the system use the page cache, avoiding the double-buffering problem:
This was accomplished by changing the file format so we can mmap() weights directly into memory without having to read() or copy them thereby ensuring the kernel can make its file cache pages directly accessible to our inference processes; and secondly, that the file cache pages are much less likely to get evicted (which would force loads to hit disk) because they're no longer competing with memory pages that were needlessly created by gigabytes of standard i/o.
This reminds me that in the database world PostgreSQL also works with mmap(); it's a similar idea.
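The underlying idea is the same as memory-mapping any big binary file; a toy Python sketch of the effect (this is not llama.cpp's actual format or code, just an illustration that only the touched pages get read and that the page cache is shared between processes):

```python
# Toy illustration of mmap-style lazy loading: the file is mapped into the
# address space and pages are only pulled from disk (or served from the
# shared OS page cache) when they are actually touched.
import numpy as np

# Stand-in for a multi-gigabyte weights file; here we just create a small one.
weights_path = "weights.bin"
np.arange(1_000_000, dtype=np.float32).tofile(weights_path)

# np.memmap maps the file instead of read()ing it into a private buffer, so
# "loading" is nearly instant, and the pages live in the page cache shared
# with any other process that maps the same file.
weights = np.memmap(weights_path, dtype=np.float32, mode="r")

# Only the pages backing this slice get faulted in.
print(weights[:8].sum())
```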
Also, in this comment Justine Tunney mentions an observation she stumbled into: surprisingly little of the model is actually touched during computation. Testing with a simple prompt, she found that only about 1.6GB of the 20GB 30B model file was actually used on her Intel machine:
If I run 30B on my Intel machine:
[...]
As we can see, 400k page faults happen, which means only 1.6 gigabytes ((411522 * 4096) / (1024 * 1024)) of the 20 gigabyte weights file actually needed to be used.
She is still wondering whether this is a bug in her change, but so far she doesn't think so, and can't spot one either:
Now, since my change is so new, it's possible my theory is wrong and this is just a bug. I don't actually understand the inner workings of LLaMA 30B well enough to know why it's sparse. Maybe we made some kind of rare mistake where llama.cpp is somehow evaluating 30B as though it were the 7B model. Anything's possible, however I don't think it's likely. I was pretty careful in writing this change, to compare the deterministic output of the LLaMA model, before and after the Git commit occurred. I haven't however actually found the time to reconcile the output of LLaMA C++ with something like PyTorch. It'd be great if someone could help with that, and possibly help us know why, from more a data science (rather than systems engineering perspective) why 30B is sparse.
If it's not a bug, this is actually a very interesting signal: it suggests these models might be slimmed down further?
In 「Stanford Alpaca 與 Alpaca.cpp」 I mentioned that Alpaca released a 7B model you can use; the community has since trained corresponding 13B and 30B models the same way, and the README.md of the Alpaca.cpp project explains how to get them.
The bigger constraint looks to be memory: the recommendations for 13B and 30B are >10GB RAM and >32GB RAM. Not sure whether exactly 32GB RAM is enough; I'll try a 32GB machine first, and if that really doesn't work my desktop at home has 64GB RAM, so it shouldn't be a big problem...
One note: the 13B model file is hosted on IPFS, and downloading it through these gateways is a bit slow; if you need it, BitTorrent might be faster?
Update: I tried the 13B version:
> Explain FAANG.
FAANG refers to five of America's largest technology companies - Facebook, Apple Inc., Amazon, Netflix and Google (Alphabet). These are some of the most valuable brands in today’s marketplace with a combined value that exceeded $3 trillion as at 2018.
And the 30B version:
> Explain FAANG.
FAANG stands for Facebook, Amazon, Apple, Netflix and Google - five of the most powerful technology companies in today's world. These tech giants have become increasingly influential over recent years due to their dominance in various markets such as social media platforms (Facebook), e-commerce websites (Amazon) or streaming services (Netflix).
Yesterday I saw 「Tracking the Fake GitHub Star Black Market (dagster.io)」 on the Hacker News front page; the original article is 「Tracking the Fake GitHub Star Black Market with Dagster, dbt and BigQuery」.
The authors wanted to detect fake-star activity on GitHub, so they went to the black market and bought some, finding two vendors: Baddhi Shop (1,000 stars for $64) and GitHub24 (€0.85 each, roughly $0.91). The price gap is big, and so is the gap in "quality": the expensive stars were still there a month later, while some of the cheap ones seem to have been detected and purged by GitHub:
A month later, all 100 GitHub24 stars still stood, but only three-quarters of the fake Baddhi Shop stars remained. We suspect the rest were purged by GitHub’s integrity teams.
The next step is to analyze this systematically; the entry point is GH Archive, a service that lets you download data on all public events across GitHub:
GH Archive is a project to record the public GitHub timeline, archive it, and make it easily accessible for further analysis.
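GH Archive publishes one gzipped JSON file of events per hour, so pulling star activity (WatchEvent) out of it looks roughly like this (a small sketch; the date here is just an example):

```python
# Fetch one hour of GH Archive data and count stars (WatchEvent) per repo.
import gzip
import json
from collections import Counter
from urllib.request import urlopen

# One file per hour: https://data.gharchive.org/YYYY-MM-DD-H.json.gz
url = "https://data.gharchive.org/2023-01-01-15.json.gz"

stars = Counter()
with urlopen(url) as resp, gzip.open(resp, mode="rt", encoding="utf-8") as fh:
    for line in fh:  # one JSON event per line
        event = json.loads(line)
        if event["type"] == "WatchEvent":  # WatchEvent == someone starred a repo
            stars[event["repo"]["name"]] += 1

print(stars.most_common(10))
```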
They want to detect two kinds of fake accounts. The first is the obvious fake account, defined like this:
You can tell from the definition that these are pure padding accounts, never touched again after creation. The screenshot shows that these accounts all look the same:
The other kind is found by running algorithms over the data; here they use unsupervised-clustering-style algorithms, and the results show they catch quite a lot:
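Their actual pipeline runs on Dagster, dbt and BigQuery, but the clustering idea itself is generic; a hand-wavy scikit-learn sketch (the features below are my own guesses, not the article's feature set):

```python
# Generic illustration of clustering suspicious accounts by profile features.
# The features are made up for illustration; the article's real pipeline runs
# on BigQuery with dbt/Dagster, not on a local array like this.
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

# Each row: [account_age_days, followers, public_repos, stars_given_per_day]
accounts = np.array([
    [2000, 150, 40, 0.1],   # looks like a normal, long-lived account
    [1500,  80, 12, 0.2],
    [   3,   0,  0, 50.0],  # created days ago, stars dozens of repos per day
    [   2,   0,  0, 45.0],
    [   4,   0,  1, 60.0],
])

X = StandardScaler().fit_transform(accounts)
labels = DBSCAN(eps=1.0, min_samples=2).fit_predict(X)
print(labels)  # accounts in the same dense cluster get the same label
```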
With NN-based machine learning algorithms everywhere these days, seeing these classical machine learning algorithms in action still feels pretty refreshing...