Node.js 20

Saw that Node.js 20 is out; the official announcement: "Node.js 20 is now available!".

The Permission Model mentioned in it looks a bit questionable, design-wise? Something like this ought to be built around an allowlist, but the current implementation looks like a blocklist...

The end of the announcement also notes that Node.js 14 shuts down this month, while Node.js 16 is being pulled forward to this September to line up with the OpenSSL 1.1.1 EoL (see the official "OpenSSL 1.1.1 End of Life" announcement from a little while back):

Also of note is that Node.js 14 will go End-of-Life in April 2023, so we advise you to start planning to upgrade to Node.js 18 (LTS) or Node.js 20 (soon to be LTS).

Please, consider that Node.js 16 (LTS) will go End-of-Life in September 2023, which was brought forward from April 2024 to coincide with the end of support of OpenSSL 1.1.1.

Looking it up, Node.js 18 runs until the end of April 2025, and Node.js 20 until the end of April 2026...

Web LLM, running on WebGPU

A trick seen over at Simon Willison's: a demo that runs an LLM directly in the browser via WebGPU: "Web LLM runs the vicuna-7b Large Language Model entirely in your browser, and it's very impressive". The project itself is at "Web LLM", and you can play with it directly.

Do note the browser support, though: Chrome needs 113+, but the current stable is still 112. On Firefox I tried enabling WebGPU via dom.webgpu.enabled in about:config, but it still wouldn't run after restarting the browser? (It might also be the Linux environment.)

Update: it's probably the Linux environment; on Linux the dev channel (114) doesn't work for me either.

Come to think of it, now that WebGPU exists, is it time to start blocking GPU mining...

LLMs that can currently be used commercially

A discussion seen on Ask Hacker News Weekly, where someone asked which LLMs can currently be used commercially: "Ask HN: Open source LLM for commercial use?".

Someone mentioned that Google's Flan family is probably the strongest contender right now? It can be downloaded from Hugging Face:

I've seen this question asked repeatedly in many LLaMa threads, currently the best models that are truly open are the released models from the Flan family by Google, which includes Flan-T5[0] and Flan-UL2[1]. According to its paper, Flan-UL2 performs slightly better than Flan-T5-XXL.

They're roughly at GPT-3 level, still a fair distance from GPT-3.5 or the ChatGPT models derived from it. But if you tune them for a specific scenario they should still be usable:

These models perform slightly better than GPT-3 under some tasks[2], but they're still far from achieving the results from GPT-3.5 and GPT-4. This becomes evident when you try to use them in the real world; they're not "good enough" for general use cases, unlike ChatGPT models. However, if you can restrict your use case to one particular domain, you can achieve pretty good results by further fine-tuning these models.

Another reply mentioned some other models:

The ones I saw mentioned so far were Flan, Cerebras, GPT-J, and RWKV.

Not yet mentioned:

* Pythia https://github.com/EleutherAI/pythia

* GLM-130B https://github.com/THUDM/GLM-130B - see also ChatGLM-6B https://github.com/THUDM/ChatGLM-6B

* GPT-NeoX-20B https://huggingface.co/EleutherAI/gpt-neox-20b

* GeoV-9B https://github.com/geov-ai/geov

* BLOOM https://huggingface.co/bigscience/bloom and BLOOMZ https://huggingface.co/bigscience/bloomz

Looks like if you need one, this list is worth digging through...

The progression of GPT (or of LLMs)

A few days ago, somewhere or other, I ran across "Five years of GPT progress", which lays out the last five years of GPT/LLM development. It's a retrospective-style piece: it covers the technical improvements (things like parameter counts and differences in neural network layer architectures), links to the original papers or data throughout, and the author also describes some of the context at the time, which helps a lot in sorting out the historical thread.

It starts from OpenAI's three works, GPT, GPT-2, and GPT-3, and then covers the new era that GPT-3 ushered in.

Next comes the era when everyone else started joining in: Jurassic-1 (AI21 Labs), Megatron-Turing NLG (Nvidia), Gopher (DeepMind), Chinchilla (DeepMind), and PaLM (Google AI).

Then comes LLaMA (Facebook), the first LLM with parameters large enough and performance good enough that was actually released for everyone to play with.

Finally it circles back to OpenAI's GPT-4.

Organized this way it reads a lot more clearly, but keep in mind the developments aren't a linear chain; they influenced each other and ran interleaved (there were plenty of other papers in between influencing one another).

Some background on why Hacker News was slow a while back

Saw the discussion at "Ask HN: Is Hacker News slow for anyone else?", where dang (the Hacker News moderator) stepped in at 35157344 to explain:

All: our poor server is smoking today* so I've had to reduce the page size of comments. There are 1500+ comments in this thread but if you want to read more than a few dozen you'll need to page through them by clicking the More link at the bottom. I apologize!

Also, if you're cool with read-only access, just log out (edit: or use an incognito tab) and all will be fast again.

* yes, HN still runs on one core, at least the part that serves logged-in requests, and yes this will all get better someday...it kills me that this isn't done yet but one day you will all see

Another notable detail is that Hacker News is written in Arc (Lisp), which doesn't look like it was written with optimization in mind; and since Reddit was also down that day, that really did push people to refresh Hacker News more often...

Ruby brings in yet another JIT implementation: RJIT

Saw "RJIT #7448" on Hacker News Daily, a new JIT implementation for Ruby.

This RJIT replaces the earlier MJIT:

This PR replaces the current implementation of MJIT with a new JIT called "RJIT"

There are a few notable traits. One is that RJIT needs no compiler at either build time or runtime, because RJIT is implemented directly in Ruby (a small usage sketch follows the quoted notes below):

RJIT uses a pure-Ruby assembler to generate native code

  • MJIT requires a C compiler at runtime. YJIT requires a Rust compiler at build time. RJIT doesn't require them.
  • This means that RJIT's warmup could be slower than YJIT, but it's still much faster than MJIT's.
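
Not from the PR itself, just a minimal sketch of what that looks like in practice on a Ruby that has RJIT merged (Ruby 3.3 dev at the time): the JIT is opt-in via the --rjit flag, and no C or Rust toolchain is needed to use it. The RubyVM::RJIT.enabled? call below is an assumption on my part, mirroring the existing RubyVM::YJIT.enabled?:

# Sketch: run with `ruby --rjit rjit_check.rb` on a build that includes RJIT.
# RubyVM::RJIT.enabled? is assumed here, by analogy with RubyVM::YJIT.enabled?.
if defined?(RubyVM::RJIT) && RubyVM::RJIT.enabled?
  puts "RJIT is on: #{RUBY_DESCRIPTION}"
else
  puts "RJIT is off (or not built into this Ruby)"
end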

Also worth noting: RJIT's author k0kubun and YJIT's author Maxime Chevalier-Boisvert are both Shopify employees. You can see how much Ruby performance must hurt at Shopify? They decided to hire people to work on it directly.

Back to the benchmarks RJIT ran: you can see it uses YJIT's benchmark suite, which at this point is hardly surprising.

Compared with the MJIT it replaces, RJIT holds up fine across the Headline suite, trades wins and losses in the Other suite, and loses quite a few items in the Micro suite (relative to the previous two).

Overall this looks like progress; it should show up in the next Ruby release.

Ruby 3.2.0 marks YJIT as a stable feature

Last year I wrote about the performance gains that YJIT brings to Ruby: "YJIT 帶給 Ruby 大量的效能提昇".

With this Ruby 3.2.0 release, YJIT is now listed as a stable feature: "Ruby 3.2.0".

  • YJIT is no longer experimental
    • Has been tested on production workloads for over a year and proven to be quite stable.

Then there's platform support: arm64 support has been added, so the list immediately gained a pile of new machines:

  • YJIT now supports both x86-64 and arm64/aarch64 CPUs on Linux, MacOS, BSD and other UNIX platforms.
    • This release brings support for Apple M1/M2, AWS Graviton, Raspberry Pi 4 and more.
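
As a side note that isn't in the release announcement: YJIT is still opt-in, so a minimal way to turn it on and confirm it is active on a stock Ruby 3.2 build looks like this (using the --yjit flag, the RUBY_YJIT_ENABLE environment variable, and RubyVM::YJIT.enabled?):

# Sketch: run with `ruby --yjit yjit_check.rb` (or set RUBY_YJIT_ENABLE=1).
if defined?(RubyVM::YJIT) && RubyVM::YJIT.enabled?
  puts "YJIT is enabled: #{RUBY_DESCRIPTION}"   # the version string includes +YJIT
else
  puts "YJIT is not enabled"
end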

There's also the class of regexp problems that almost every programming language runs into; in Ruby 3.2.0, memoization is used to cut the cost of certain regexps:

# This match takes 10 sec. in Ruby 3.1, and 0.003 sec. in Ruby 3.2
/^a*b?a*$/ =~ "a" * 50000 + "x"

Another regexp shows a similar effect:

It trades some memory for performance, closing off some of the DoS opportunities. On top of that, a regexp timeout is introduced as a workaround, capping resource consumption when you really do get hit:

The optimization above cannot be applied to some kind of regular expressions, such as those including advanced features (e.g., back-references or look-around), or with a huge fixed number of repetitions. As a fallback measure, a timeout feature for Regexp matches is also introduced.
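
To make the fallback concrete, here's a small sketch of mine (not from the release notes) using the new timeout: a global cap via Regexp.timeout, plus the per-pattern variant that Regexp.new gained in 3.2. Patterns with back-references fall outside the memoization optimization quoted above, so they can still backtrack badly; the timeout turns that into an exception instead of a hang:

# Sketch of Ruby 3.2's regexp timeout (not from the release notes).
Regexp.timeout = 1.0   # global limit, in seconds, for matches without their own timeout

begin
  /^(a*)*\1x$/ =~ "a" * 46   # contains a back-reference, so it may still backtrack exponentially
  puts "finished within the time limit"
rescue Regexp::TimeoutError
  puts "match gave up after 1 second"
end

risky = Regexp.new('^(a*)*\1x$', timeout: 0.5)   # the limit can also be set per pattern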

Mozilla releases Firefox Translations, which translates entirely on the local machine

Mozilla has released "Firefox Translations", an extension for Firefox.

The main selling point is that it's offline, so privacy is preserved:

Firefox Translations provides automated translation of web content. Unlike cloud-based alternatives, translation is done locally, on the client-side, so that the text being translated does not leave your machine.

In the Hacker News discussion "Firefox Translations: Translate websites in your browser without using the cloud (addons.mozilla.org)" you can see people commenting on the quality: not as accurate as the cloud services, but usable:

I've just installed it, and I'm impressed so far. I've only run it against some sample German Wikipedia articles (https://de.wikipedia.org/wiki/Clan_of_Xymox), but it produces surprisingly readable text. I also particularly like the "highlight potential errors" option to flag stuff that even the translation service thinks might be a bit off.

It's not nearly as speedy as Google Translate, but I'll take that happily if it means keeping it local.

From the supported languages listed on the page you can tell the focus is still on languages used in Europe and the Americas, and further down it explains that the project was built up with funding from several other programs:

Firefox Translations was developed with The Bergamot Project Consortium, coordinated by the University of Edinburgh with partners Charles University in Prague, the University of Sheffield, University of Tartu, and Mozilla. This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 825303.

What I'm more curious about is that the page says the CPU needs SSE4.1... which raises two questions. First, can a browser extension run SSE4.1 instructions directly? Second, does that mean Apple's ARM machines aren't supported (they should have similar acceleration instructions); is this x86-only for now?

A CPU that supports SSE4.1 extensions is required for this addon to function properly. If it doesn't, an error will be displayed when the translation is being started.

Even with all the current limitations, it still looks like a very promising project; maybe it will get ported to other browsers at some point?

The ??! symbol in C

A strange bit of trivia seen on Hacker News Daily: "What does the ??!??! operator do in C? (stackoverflow.com)"; the original is on Stack Overflow: "What does the ??!??! operator do in C?".

It's a trigraph, and it has been there since C89. Section 5.2.1.1 of the Rationale for International Standard—Programming Languages—C gives the historical reason for trigraphs:

Trigraph sequences were introduced in C89 as alternate spellings of some characters to allow the implementation of C in character sets which do not provide a sufficient number of non-alphabetic graphics

And implementations are required to support them:

Implementations are required to support these alternate spellings, even if the character set in use is ASCII, in order to allow transportation of code from systems which must use the trigraphs. AMD1 also added digraphs (see §6.4.6 and §MSE.4).

The problem at the time was settling on a character set C could use, which meant dealing with charset compatibility across a lot of different machines:

The C89 Committee faced a serious problem in trying to define a character set for C. Not all of the character sets in general use have the right number of characters, nor do they support the graphical symbols that C users expect to see. For instance, many character sets for languages other than English resemble ASCII except that codes used for graphic characters in ASCII are instead used for alphabetic characters or diacritical marks. C relies upon a richer set of graphic characters than most other programming languages, so the representation of programs in character sets other than ASCII is a greater problem than for most other programming languages.

So the ISO/IEC 646 standard was used (remember that Unicode 1.0.0 didn't appear until 1991):

The solution is an internationally agreed-upon repertoire in terms of which an international representation of C can be defined. ISO has defined such a standard, ISO/IEC 646, which describes an invariant subset of ASCII.

The characters in the ASCII repertoire used by C and absent from the ISO/IEC 646 invariant repertoire are:

[ ] { } \ | ~ ^

The rest is defining ?? as the escape prefix for these sequences; ??! maps to |, so ??!??! is simply ||, the logical OR operator.

It's a historical artifact; there's not much need for it anymore...

The power of Perl's regular expressions: NP-complete

This one leans a bit toward CS theory...

Back in school, formal languages are taught as the trio of grammar, language, and automaton, as in the table on Wikipedia: roughly, Type-3 is RG/RL/FSM, Type-2 is CFG/CFL/PDA, Type-1 is CSG/CSL/LBA, and Type-0 is unrestricted grammars, recursively enumerable languages, and Turing machines.

There you can see that classic regular expressions land in the RG/RL/FSM row.

A few days ago I saw gugod's post "[Perl] 以正規表示式來定義文法規則" (defining grammar rules with regular expressions), which tries to use Perl's regular expressions (perlre) to build a recursive descent parser.

Recursive descent parsers can be treated as handling a subset of CFGs; the language class corresponding to CFGs is CFL, and the corresponding automaton is the PDA.

We already know that, because perlre supports a pile of odd features (like backreferences and recursive patterns), the languages it accepts go beyond RL, but I was curious just how far it can go.
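
As a concrete illustration of that first claim (my own example, not from gugod's post; written in Ruby since its Onigmo engine accepts the same backreference syntax as perlre, and Ruby is the language used for the other snippets here): a single backreference is already enough to accept { "a" * n | n is composite }, a language no finite automaton can recognize, so we are clearly past RL.

# The capture group grabs a candidate divisor of length >= 2, and \1+ requires it
# to repeat at least once more, so the total string length must be composite.
COMPOSITE = /\A(aa+?)\1+\z/

(2..20).each do |n|
  puts "#{n}: #{('a' * n).match?(COMPOSITE) ? 'composite' : 'prime'}"
end

This only shows that perlre-style patterns escape the regular languages; the NP-completeness result below says much more about how far the full feature set goes.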

Poking around with a search engine, I found an analysis of PCRE (a library compatible with Perl's regular expression syntax): "Which languages do Perl-compatible regular expressions recognize?".

Someone there points to the article "The true power of regular expressions", which gives a polynomial-time reduction from 3SAT into PCRE matching, proving that PCRE matching is NP-hard; it's also easy to confirm it's in NP, so the conditions for NP-completeness are met...

I had always assumed PCRE sat at the CFG/CFL/PDA level; I didn't expect it to be this strong. Being NP-complete means that, at least in principle, any problem in NP can be encoded as a PCRE match and run that way... XD