Python 內建的工具

Hacker News Daily 上看到「CLI tools hidden in the Python standard library」這個,在講 Python 內建的工具。

其中 python -m calendar 這組看起來還不錯,測了一下可以用 python -m calendar 2024 顯示所有 2024 的月曆,用 python -m calendar 2024 1 則可以顯示 2024 一月單月的月曆。

這操作起來比先前用的 ncal 好多了,先前用 cal 2024 會出現錯誤,因為只有一個參數時他會當作月份,而兩個參數時要把月份放前面,也就是用 cal 1 2024 才能正確顯示。

所以就把本來的 ncal 移除掉,改用 alias 來處理:「Add alias "cal".」。

其他的大多都有習慣的工具了,像是 base64 可以用 openssl base64 處理;而 json.tool 有 jq 可以用。

Windows 3.1 下的 GPT client

前幾天在 Hacker News 上看到「Show HN: WinGPT – AI assistant for Windows 3.1 (dialup.net)」這篇,原始文章「WinGPT: AI Assistant for Windows 3.1」在介紹 Windows 3.1 下的 GPT client。

雖然反差很大 (一個 20 世紀的 GUI 環境配上最新科技),但從 Hacker News 的討論可以看到,最熱烈的是在 16-bit 環境下實作 TLS 1.3 連線,也就原文裡的這段,提到了他是原生支援 TLS 1.3:

WinGPT connects to the OpenAI API server natively with TLS 1.3, so it doesn't require a proxy on a modern machine to terminate TLS. To see how I did this and some of the challenges, take a look at Modern TLS on 16-bit Windows. (As you'll see on that page, this is not a secure implementation).

不過他不是從零開始解,而是基於 wolfSSL 的實作,因為 wolfSSL 有支援 16-bit compiler,就不需要從零開始:

WolfSSL stood out among the pack as it had explicit 16-bit compiler support while being fully-featured and well-supported.

這是讓人看到會「蛤?」的東西 XD

macOS 要提供 DirectX 介面了

Hacker News 上看到 macOS 要提供 DirectX 介面了:「DirectX 12 Support on macOS (twitter.com/andytizer)」,原推是:

算是降低遊戲引擎維護的成本?讓開發商更有意願實作?不確定會有什麼效果...

Georgi Gerganov 成立公司 GGML

Hacker News 首頁上看到 Georgi Gerganov 成立公司的計畫:「GGML – AI at the Edge (ggml.ai)」,官網在「GGML - AI at the edge」。

如同 Georgi Gerganov 提到的,llama.cpp 這些專案本來是他的 side project,結果意外的紅起來:

另外他提到了 Nat FriedmanDaniel Gross 也幫了一把:

在官網則是有提到是 pre-seed funding:

ggml.ai is a company founded by Georgi Gerganov to support the development of ggml. Nat Friedman and Daniel Gross provided the pre-seed funding.

現在回頭來看,當初 llama.cpp 會紅起來主要是因為 CPU 可以跑 LLaMA 7B,而且用 CPU 跑起來其實也不算慢。

後來吸引了很多人一起幫忙,於是有了不少 optimization (像是「llama.cpp 的載入速度加速」這邊用 mmap 減少需要載入的時間,並且讓多個 process 之間可以重複使用 cache),接下來又有 GPU 的支援...

但不確定他開公司後,長遠的計畫是什麼...?

把 RabbitMQ 換成 PostgreSQL 的那篇文章...

Hacker News 上看到「SQL Maxis: Why We Ditched RabbitMQ and Replaced It with a Postgres Queue (prequel.co)」這篇文章,原文在「SQL Maxis: Why We Ditched RabbitMQ And Replaced It With A Postgres Queue」這邊,裡面在講他們把 RabbitMQ 換成 PostgreSQL 的前因後果。

文章裡面可以吐嘈的點其實蠻多的,而且在 Hacker News 上也有被點出來,像是有人就有提到他們遇到了 bug (或是 feature) 卻不解決 bug,而是決定直接改寫成用 PostgreSQL 來解決,其實很怪:

In summary -- their RabbitMQ consumer library and config is broken in that their consumers are fetching additional messages when they shouldn't. I've never seen this in years of dealing with RabbitMQ. This caused a cascading failure in that consumers were unable to grab messages, rightfully, when only one of the messages was manually ack'ed. Fixing this one fetch issue with their consumer would have fixed the entire problem. Switching to pg probably caused them to rewrite their message fetching code, which probably fixed the underlying issue.

另外一個吐嘈的點是量的部份,如果就這樣的量,用 PostgreSQL 降低使用的 tech stack 應該是個不錯的決定 (但另外一個問題就是,當初為什麼要導入 RabbitMQ...):

>To make all of this run smoothly, we enqueue and dequeue thousands of jobs every day.

If you your needs aren't that expensive, and you don't anticipate growing a ton, then it's probably a smart technical decision to minimize your operational stack. Assuming 10k/jobs a day, thats roughly 7 jobs per minute. Even the most unoptimized database should be able to handle this.

在同一個 thread 下面也有人提到這個量真的很小,甚至直接不講武德提到可以用 Jenkins 解 XD:

Years of being bullshitted have taught me to instantly distrust anyone who is telling me about how many things they do per day. Jobs or customers per day is something to tell you banker, or investors. For tech people it’s per second, per minute, maybe per hour, or self aggrandizement.

A million requests a day sounds really impressive, but it’s 12req/s which is not a lot. I had a project that needed 100 req/s ages ago. That was considered a reasonably complex problem but not world class, and only because C10k was an open problem. Now you could do that with a single 8xlarge. You don’t even need a cluster.

10k tasks a day is 7 per minute. You could do that with Jenkins.

然後意外看到 Simon Willison 提到了一個重點,就是 RabbitMQ 到現在還是不支援 ACID 等級的 job queuing (尤其是 Durability 的部份),也就是希望 MQ 系統回報成功收到的 task 一定會被處理:

The best thing about using PostgreSQL for a queue is that you can benefit from transactions: only queue a job if the related data is 100% guaranteed to have been written to the database, in such a way that it's not possible for the queue entry not to be written.

Brandur wrote a great piece about a related pattern here: https://brandur.org/job-drain

He recommends using a transactional "staging" queue in your database which is then written out to your actual queue by a separate process.

這也是當年為什麼用 MySQL 幹類似的事情,要 ACID 的特性來確保內容不會掉。

這也是目前我覺得唯一還需要用 RDBMS 當 queue backend 的地方,但原文公司的想法就很迷,遇到 library bug 後決定換架構,而不是想辦法解 bug,還很開心的寫一篇文章來宣傳...

npm 裡的 redis 與 ioredis

前幾天在噗浪的偷偷說上看到有人提到 npmtrends 上的 redis (官方的) 與 ioredis:「https://www.plurk.com/p/p6wdc9」。

意外發現以下載量來看,ioredis 已經超越官方的 redis 了:

找了一下差異,看起來的確有些團隊在 loading 很高的情況下會考慮用 ioredis 取代 redis:「Migrating from Node Redis to Ioredis: a slightly bumpy but faster road」。

但沒有特別需求的話應該還是會用官方版本?

AWS 官方推出了自己的 Amazon S3 FUSE 套件

看到「Mountpoint for Amazon S3」這個專案,AWS 自己推出了自己的 Amazon S3 FUSE 套件。Hacker News 上也有一些討論:「Mountpoint – file client for S3 written in Rust, from AWS (github.com/awslabs)」。

Amazon S3 的價錢比其他 AWS 提供的 storage 都便宜不少。以美東第一區 us-east-1 來說,S3 是 $0.023/GB,而 EBS (gp3) 要 $0.08/GB,即使是 EBS (st1) 也要 $0.045/GB。

S3 相較於 EBS 來說,多了 API call 的費用,所以對於不會產生大量 API call 的應用來說 (像是常常會寫很大包的資料到檔案裡),透過 FUSE 操作 Amazon S3 可以讓現有的套裝軟體或是程式直接跑上去。

另外一個常見的應用是讓套裝軟體或是現成的程式可以讀取 S3 的資料。

之前這類應用馬上會想到的專案是 s3fs-fuse,這個專案很久了,大家也都知道多人寫入的部份會是痛點。

這次 AWS 自己出來做的事情有點重工,看起來他想做的事情 s3fs-fuse 都解的差不多了,目前看起來唯一的賣點應該只有 Rust-based,但 s3fs-fuse 主要是 C++,其實也沒差到哪裡:

Mountpoint for Amazon S3 is optimized for read-heavy workloads that need high throughput. It intentionally does not implement the full POSIX specification for file systems.

目前專案還是 alpha release,不確定專案的方向到底是什麼...

華盛頓郵報怎麼把 Mapbox 換成其他 open source 方案

Hacker News 上面看到「How The Post is replacing Mapbox with open source solutions」這篇,講華盛頓郵報怎麼把 Mapbox 換成 open source 方案,對應的討論在「Replacing Mapbox with open source solutions (kschaul.com)」這邊。

維基百科有提到大概兩年前,2020 年底的時候 Mapbox GL JS 從開源授權換成私有授權了 (也可以參考先前寫的「Mapbox GL JS 的授權改變,以及 MapLibre GL 的誕生」這篇):

In December 2020, Mapbox released the second version of their JavaScript library for online display of maps, Mapbox GL JS. Previously open source code under a BSD license, the new version switched to proprietary licensing. This resulted in a fork of the open source code, MapLibre GL, and initiation of the MapLibre project.

裡面題到了四個工具。

第一個是 OpenMapTiles,下載部份的圖資使用,對於報導只需要某個區域很方便:

OpenMapTiles is an extensible and open tile schema based on the OpenStreetMap. This project is used to generate vector tiles for online zoomable maps. OpenMapTiles is about creating a beautiful basemaps with general layers containing topographic information.

第二個是 Maputnik,可以修改圖資呈現的方式:

A free and open visual editor for the Mapbox GL styles targeted at developers and map designers.

第三個是 PMTiles,可以將圖資檔案塞到一個大檔案裡面,然後透過 HTTP range requests 下載需要的部份就好,大幅下降 HTTP request 所需要的費用 (很多 CDN 會依照 HTTP request 數量收費):

Protomaps is a serverless system for planet-scale maps.

An alternative to map APIs at 1% the cost, via single static files on your own cloud storage. Deploy datasets like OpenStreetMap for your site in minutes.

最後就是 fork 出來開源版本的 maplibre-gl-js

MapLibre GL JS is an open-source library for publishing maps on your websites or webview based apps. Fast displaying of maps is possible thanks to GPU-accelerated vector tile rendering.

It originated as an open-source fork of mapbox-gl-js, before their switch to a non-OSS license in December 2020. The library's initial versions (1.x) were intended to be a drop-in replacement for the Mapbox’s OSS version (1.x) with additional functionality, but have evolved a lot since then.

這樣看起來好像可以用在像 KKTIX 這種下面顯示固定地圖的地方:

Pony ORM

Simon Willison 的 blog 上看到的東西:「Python’s “Disappointing” Superpowers」,裡面提到的原文是「Python’s “Disappointing” Superpowers」這篇,在講 Python 的工具。

雖然是說「disappointing」,但實際上是反義,在原文裡面提到了很多特別的工具,其中 Pony ORM 算是我覺得最有趣的了,他的寫法就非常的 Python:

select(c for c in Customer if sum(c.orders.price) > 1000)

也可以用 lambda 的形式來寫:

Customer.select(lambda c: sum(c.orders.total_price) > 1000)

這樣會產生出對應的 SQL:

SELECT "c"."id"
FROM "customer" "c"
  LEFT JOIN "order" "order-1"
    ON "c"."id" = "order-1"."customer"
GROUP BY "c"."id"
HAVING coalesce(SUM("order-1"."total_price"), 0) > 1000

不會產生 syntax error 的原因是因為他直接解讀 bytecode 分析,產生出對應的 SQL query:

A normal understanding of generator expressions suggests that the select function is consuming a generator. But that couldn’t explain the behaviour here. Instead, it actually introspects the frame object of the calling code, then decompiles the byte code of the generator expression object it finds, and builds a Query based on the AST objects.

用這樣的設計來達到語法的自由度。

看了一下也有一些 integration,像是 Flask 的「Integration with flask」與 FastAPI 的「Integration with FastAPI」。

不過應該是先看看,目前 Python 上用的主力還是 Django,有自己的 ORM 架構...

jQuery 3.6.2

看到 jQuery 出新版的消息:「jQuery 3.6.2 Released!」。

這個版本的說明裡面針對了 :has() 這個功能著墨,主要還是因為這功能 jQuery 支援很久了,而 ChromiumSafai 在最近都開始支援了:「Issue 669058: CSS selectors Level 4: support :has()」、「Using :has() as a CSS Parent Selector and much more」。

不過看起來 jQuery 要整合原生的 :has() 還是有技術困難,因為裡面可能會包括 jQuery 自己定義的 selector extension:

That was problematic in cases where :has() contained another jQuery selector extension (e.g. :has(:contains("Item"))) or contained itself (:has(div:has(a))).

總之,難得又看到 jQuery 的消息... 前一版的 3.6.1 是 2022/08/26,而 3.6.0 則是再一年多前的 2021/03/02,再往前面是 3.5.1 的 2020/05/04 與 3.5.0 的 2020/04/10。

是個歷史...