If you use zsh and have trouble running llamafile, try saying sh -c ./llamafile. This is due to a bug that was fixed in zsh 5.9+. The same is the case for Python subprocess, old versions of Fish, etc.
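In practice the workaround is just wrapping the same command, roughly like this (./llamafile here stands for whichever llamafile binary you downloaded):

# may fail under zsh older than 5.9
./llamafile

# workaround from the quote above: let sh do the launching
sh -c ./llamafile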
On Linux, Nvidia cuBLAS GPU support will be compiled on the fly if (1) you have the cc compiler installed, (2) you pass the --n-gpu-layers 35 flag (or whatever value is appropriate) to enable GPU, and (3) the CUDA developer toolkit is installed on your machine and the nvcc compiler is on your path.
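So before launching, a quick sanity check along these lines should cover the three prerequisites from the quote (a rough sketch; 35 is just the example value from the quote, and the right number depends on the model):

# (1) the cc compiler and (3) nvcc from the CUDA toolkit both need to be on PATH
which cc nvcc

# (2) pass the flag to enable GPU offloading
./llamafile --n-gpu-layers 35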
When I ran it, though, you can see that nothing got offloaded to the GPU:
llm_load_tensors: ggml ctx size = 0.11 MB
llm_load_tensors: using CUDA for GPU acceleration
llm_load_tensors: mem required = 4165.47 MB
llm_load_tensors: offloading 0 repeating layers to GPU
llm_load_tensors: offloaded 0/35 layers to GPU
llm_load_tensors: VRAM used: 0.00 MB
After trying a few different approaches, I found that you have to run sh -c "./llamafile --n-gpu-layers 35", i.e. wrap the arguments inside the quoted command as well. With that, the corresponding offload messages show up, and the output is also much faster:
llm_load_tensors: ggml ctx size = 0.11 MB
llm_load_tensors: using CUDA for GPU acceleration
llm_load_tensors: mem required = 70.42 MB
llm_load_tensors: offloading 32 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 35/35 layers to GPU
llm_load_tensors: VRAM used: 4095.05 MB
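If you want to confirm that the GPU is actually being used rather than just trusting the log lines, watching nvidia-smi from another terminal while the model is generating should show VRAM usage in line with the numbers above:

# run in another terminal while llamafile is generating
nvidia-smi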
For a long time Firefox Desktop development has supported both Mercurial and
Git users. This dual SCM requirement places a significant burden on teams which
are already stretched thin in parts. We have made the decision to move Firefox
development to Git.
Firefox Translations provides automated translation of web content. Unlike cloud-based alternatives, translation is done locally, on the client-side, so that the text being translated does not leave your machine.
I've just installed it, and I'm impressed so far. I've only run it against some sample German Wikipedia articles (https://de.wikipedia.org/wiki/Clan_of_Xymox), but it produces surprisingly readable text. I also particularly like the "highlight potential errors" option to flag stuff that even the translation service thinks might be a bit off.
It's not nearly as speedy as Google Translate, but I'll take that happily if it means keeping it local.
Firefox Translations was developed with The Bergamot Project Consortium, coordinated by the University of Edinburgh with partners Charles University in Prague, the University of Sheffield, University of Tartu, and Mozilla. This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 825303.
What I'm more curious about is that the page mentions the CPU needs SSE4.1 support... which raises two questions. First, can a browser extension run SSE4.1 instructions directly? Second, does that mean Apple's ARM architecture isn't supported (it should have a similar set of acceleration instructions)? Is this x86-only for now?
A CPU that supports SSE4.1 extensions is required for this addon to function properly. If it doesn't, an error will be displayed when the translation is being started.
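I don't have an answer for the browser-extension part, but the CPU-capability part is easy to check yourself; on Linux something like this tells you whether the CPU advertises SSE4.1 (just a quick local check, not something from the addon's documentation):

# prints sse4_1 if the CPU supports it
grep -o -m1 'sse4_1' /proc/cpuinfo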
It turns out that Chrome actively throttles requests, including those to cached resources, to reduce I/O contention. This generally improves performance, but will mean that pages with a large number of cached resources will see a slower retrieval time for each resource.