Python 3.12 deprecates datetime.datetime's utcnow() and utcfromtimestamp()

Saw "It's Time For A Change: datetime.utcnow() Is Now Deprecated" over on Simon Willison's site; his link post points to the article of the same title.

The article covers the historical reasons behind the design and the problems it causes.
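
For reference, the core issue with utcnow() and utcfromtimestamp() is that they return "naive" datetime objects: the value is in UTC, but no tzinfo is attached, so it mixes badly with timezone-aware values. A minimal sketch of the difference (plain Python 3, nothing here is specific to the article):

from datetime import datetime, timezone

naive = datetime.utcnow()           # UTC value, but tzinfo is None ("naive")
aware = datetime.now(timezone.utc)  # carries tzinfo=timezone.utc ("aware")

print(naive.tzinfo)  # None
print(aware.tzinfo)  # UTC
# Mixing the two fails:
# aware - naive  ->  TypeError: can't subtract offset-naive and offset-aware datetimes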

The replacements are covered later in the article; tweaking the usage a bit, they are equivalent to this:

from datetime import datetime, timezone
datetime.now(timezone.utc)                       # replaces datetime.utcnow()
datetime.fromtimestamp(timestamp, timezone.utc)  # replaces datetime.utcfromtimestamp(timestamp)

Or like this:

import datetime
datetime.datetime.now(datetime.timezone.utc)
datetime.datetime.fromtimestamp(timestamp, datetime.timezone.utc)

Worth keeping in mind that this piece of history is being torn down... (there must be plenty of StackOverflow answers out there using these two functions)

Machines in AWS's Tokyo region get a microsecond-level Amazon Time Sync Service

AWS announced a microsecond-level Amazon Time Sync Service, i.e. three more zeros beyond ms: "Amazon Time Sync Service now supports microsecond-accurate time".

Most time sync services are at the ms level, i.e. on the order of 10⁻³ seconds; μs takes that down to 10⁻⁶...

But this is a bit odd: for now only the AWS Tokyo region has the service, and it only works on r7g instances:

Amazon Time Sync with microsecond-accurate time is available starting today in the Asia Pacific (Tokyo) Region on all R7g instances, and we will be expanding support to additional AWS Regions and EC2 Instance types.

In other words, the front-line, popular regions like us-east-1 and us-west-2 don't have it, and it's limited to one particular ARM family, the memory-intensive r7g?

Filing this away for now, let's see where it goes XDDD

Cloudflare Workers introduces a new pricing model: billing by CPU time

Cloudflare's Workers product (their serverless line) rolled out a pricing model based on CPU time: "New Workers pricing — never pay to wait on I/O again".

Before this, everyone billed by wall time, which is a bad deal for applications that spend a long time blocked on I/O. Cloudflare's proposal to bill by CPU time instead definitely caught my eye...
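
To see why this matters for I/O-heavy code, here's a tiny back-of-the-envelope sketch; the numbers are made up and are not Cloudflare's actual rates:

# A request that burns 5ms of CPU but spends 500ms waiting on an upstream API.
cpu_ms = 5
io_wait_ms = 500

wall_time_ms = cpu_ms + io_wait_ms  # what wall-time billing charges for (~505ms)
billed_cpu_ms = cpu_ms              # what CPU-time billing charges for (~5ms)

print(wall_time_ms / billed_cpu_ms) # ~100x less billable time for this request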

This is the old pricing:

This is the new pricing:

Comparing the two charts there's one thing I'm not sure about: is memory usage no longer part of the bill? The Pricing page has already been updated with the "Standard" plan, but it still doesn't seem to mention memory?

It doesn't take effect right away though; starting at the end of this month, 2023/10/31, you can opt in to the new plan:

Starting October 31, 2023, you will have the option to opt in individual Workers and Pages Functions projects on your account to new pricing, and newly created projects will default to new pricing. You’ll be able to estimate how much new pricing will cost in the Cloudflare dashboard. For the majority of current applications, new pricing is the same or less expensive than the previous Bundled and Unbound pricing plans.

And next year, from 2024/03/01, everything gets force-switched to the new plan:

If you’re on our Workers Paid plan, you will have until March 1, 2024 to switch to the new pricing on your own, after which all of your projects will be automatically migrated to new pricing. If you’re an Enterprise customer, any contract renewals after March 1, 2024, will use the new pricing.

Billing by CPU time really is a lot better; I just don't know yet whether this billing model hides any other landmines...

Improving Wikipedia's JavaScript, cutting 300ms of blocking time

Saw "300ms Faster: Reducing Wikipedia's Total Blocking Time" on the Hacker News front page. The author, Nicholas Ray, is an engineer at the Wikimedia Foundation; it's posted on his personal blog, but it still counts as semi-official... The two improvements described in the article are both about front-end JavaScript.

The author used in-browser profiling to produce flame graphs, read off which chunks were the big problems, and then looked for chances to improve them.

Looking at the end result first: the first fix cut blocking time by roughly 200ms, and the second by roughly 80ms:

The first improvement came from spotting in the flame graph that l._enable was eating a lot of blocking time:

The author found the cost came from find() collecting every link (the a elements) and then binding an event handler on each one:

The .on("click") call attached a click event listener to nearly every link in the content so that the corresponding section would open if the clicked link contained a hash fragment. For short articles with few links, the performance impact was negligible. But long articles like ”United States” included over 4,000 links, leading to over 200ms of execution time on low-end devices.

But this was actually redundant code, already handled elsewhere, so the fix was simple: rip it out and 200ms disappears:

Worse yet, this behavior was unnecessary. The downstream code that listened to the hashchange event already called the same method that the click event listener called. Unless the window’s location already pointed at the link’s destination, clicking a link called the checkHash method twice — once for the link click event handler and once more for the hashchange handler.

The second improvement targets the blocking time eaten by initMediaViewer. The code shows a similar problem: a loop attaching an event handler to every matching element:

The fix here is event delegation: attach the handler to a parent element so only one handler is needed, plus a check on whether the element that triggered the event matches the condition. This dramatically lowers the cost of attaching handlers.

This is a pretty common trick; for example, when an event needs to go on lots of td elements inside a table, you attach a single handler to the table instead and add the matching logic there.

A fairly standard profiling exercise: pull up real-world data, then tune the parts with the biggest impact.

Speeding up llama.cpp's load time

Saw "Llama.cpp 30B runs with only 6GB of RAM now (github.com/ggerganov)" on Hacker News; the original pull request is "Make loading weights 10-100x faster #613".

The PR's author, Justine Tunney, mentions in the PR that she changed the model file format so it can be mmap()'d, which drastically cuts the up-front read time (loading becomes lazy), and also lets the system use the page cache, avoiding the double-buffering problem:

This was accomplished by changing the file format so we can mmap() weights directly into memory without having to read() or copy them thereby ensuring the kernel can make its file cache pages directly accessible to our inference processes; and secondly, that the file cache pages are much less likely to get evicted (which would force loads to hit disk) because they're no longer competing with memory pages that were needlessly created by gigabytes of standard i/o.
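
As a rough illustration of why mmap() helps here (a generic Python sketch of the mechanism, not llama.cpp's actual C++ code; weights.bin is a made-up file name): the mapping itself costs almost nothing, pages are only faulted in when touched, and they sit in the shared page cache rather than in a private read() buffer.

import mmap

with open("weights.bin", "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)  # near-instant; nothing read yet

    # Pages come in from the page cache (or disk) only when actually touched,
    # so a sparse access pattern only ever loads a fraction of the file.
    header = mm[:4096]
    one_block = mm[1_000_000_000:1_000_004_096]

    mm.close()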

This reminds me that over in the database world, PostgreSQL also works with mmap(), which is a somewhat similar idea.

Justine Tunney also mentions, in a comment there, an unexpected observation: surprisingly little of the model actually gets touched during computation. Testing with a simple prompt, she found that only about 1.6GB of the 20GB 30B model file was actually used on her Intel machine:

If I run 30B on my Intel machine:

[...]

As we can see, 400k page faults happen, which means only 1.6 gigabytes ((411522 * 4096) / (1024 * 1024)) of the 20 gigabyte weights file actually needed to be used.

She's still wondering whether this is just a bug in her change, but for now she doesn't think so and can't spot one:

Now, since my change is so new, it's possible my theory is wrong and this is just a bug. I don't actually understand the inner workings of LLaMA 30B well enough to know why it's sparse. Maybe we made some kind of rare mistake where llama.cpp is somehow evaluating 30B as though it were the 7B model. Anything's possible, however I don't think it's likely. I was pretty careful in writing this change, to compare the deterministic output of the LLaMA model, before and after the Git commit occurred. I haven't however actually found the time to reconcile the output of LLaMA C++ with something like PyTorch. It'd be great if someone could help with that, and possibly help us know why, from more a data science (rather than systems engineering perspective) why 30B is sparse.

If it isn't a bug, this is actually a very interesting signal: it suggests these models could be slimmed down further?

Heroku publishes the timeline for removing its free plans

The #1 story when I opened Hacker News: Heroku published the timeline for killing off its free plans: "Removal of Heroku free product plans (heroku.com)", with the announcement at "Removal of Heroku Free Product Plans FAQ".

Deletion of inactive accounts starts on 2022/10/26, and the free plans on existing accounts get removed starting 2022/11/28:

Focus on what's mission-critical: Removal of free dynos, hobby-dev Heroku Postgres and hobby-dev Heroku Data for Redis plans starting November 28, 2022 and inactive account deletion starting October 26, 2022.

In their place are conditional offerings for specific groups, in three categories: students, nonprofits, and open source projects. The plans for the first two aren't out yet and will be announced later; open source projects have to apply by email.

That said, it feels like hardly anyone uses Heroku anymore; most of the usage is people who were already on it and just kept going. If we're talking about the "sexy" option (that playing-with-something-new feeling), Fly.io is probably the more common pick these days?

So there's a proper term for it: TOCTOU (time-of-check to time-of-use)

A term I ran into while reading "The trouble with symbolic links": TOCTOU (time-of-check to time-of-use), roughly "check first, then use". It's a common security (hole) pattern: after the check, someone else may change the state, so the subsequent use can turn into a security hole.
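
A minimal Python sketch of the shape of the problem (the path and the flow are made up for illustration): a separate check followed by the use, with a window in between where the target can be swapped out, e.g. for a symbolic link.

import os

path = "/tmp/report.txt"  # hypothetical path in a directory others can write to

# Time of check: looks fine right now...
if not os.path.islink(path) and os.access(path, os.W_OK):
    # ...but in this window the file can be replaced with a symlink pointing
    # somewhere sensitive, and the open() below will happily follow it.
    with open(path, "w") as f:  # time of use
        f.write("data\n")

# One direction for mitigation: drop the separate check and act on the fd you
# actually opened; O_NOFOLLOW makes open() refuse to follow a symlink here.
fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_NOFOLLOW, 0o600)
print(os.fstat(fd).st_size)  # inspect the file that was really opened
os.close(fd)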

In database-like environments, isolation (the I in ACID) can guarantee this class of problem doesn't happen (you need REPEATABLE READ or a stricter isolation level).

But for filesystems things don't look so good: in 2004 it was shown that there is no portable way to guarantee TOCTOU problems are avoided:

In the context of file system TOCTOU race conditions, the fundamental challenge is ensuring that the file system cannot be changed between two system calls. In 2004, an impossibility result was published, showing that there was no portable, deterministic technique for avoiding TOCTOU race conditions.

One mitigation is to track things at the file-descriptor level:

Since this impossibility result, libraries for tracking file descriptors and ensuring correctness have been proposed by researchers.

Another approach (closer to fixing the root cause) is filesystem APIs that support transactions, but that doesn't seem to have been accepted by the mainstream?

An alternative solution proposed in the research community is for UNIX systems to adopt transactions in the file system or the OS kernel. Transactions provide a concurrency control abstraction for the OS, and can be used to prevent TOCTOU races. While no production UNIX kernel has yet adopted transactions, proof-of-concept research prototypes have been developed for Linux, including the Valor file system and the TxOS kernel. Microsoft Windows has added transactions to its NTFS file system, but Microsoft discourages their use, and has indicated that they may be removed in a future version of Windows.

Right now the issue seems to be that there's no API design the Linux community is willing to accept?

A plea to put units in parameter names

Saw the discussion of "Please put units in names (ruudvanasseldonk.com)" on Hacker News; it was already past 900 points when I looked (and still climbing fast), so basically viral, and you can feel the rage from an awful lot of developers XDDD

The original article is "Please put units in names"; it argues that parameter names should carry their units, especially time-related parameters.

There's simply no consensus for time-related parameters. Without checking the docs you can't tell whether it's ns, ms, or seconds, so you end up with things like these three languages sleeping for different amounts of time:

And then you flip the table XDDD
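
The fix the article asks for is pretty low-tech. A Python-flavoured sketch (the function names are made up):

import time

# Ambiguous: seconds? milliseconds? The caller has to go read the docs.
def wait_before_retry(delay):
    time.sleep(delay)  # time.sleep() happens to take seconds

# With the unit in the name, every call site documents itself.
def wait_before_retry_ms(delay_ms):
    time.sleep(delay_ms / 1000)

wait_before_retry_ms(250)  # obviously 250 milliseconds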

Amazon EFS performance improvements

AWS announced that they've drastically lowered Amazon EFS latency to improve performance: "Amazon Elastic File System Update – Sub-Millisecond Read Latency".

On Linux, EFS is normally mounted over NFS, and single-digit-ms latency really does hurt performance badly; with reads now claimed to be down to 0.6ms, the difference should be quite noticeable:

Up until today, EFS latency for read operations (both data and metadata) was typically in the low single-digit milliseconds. Effective today, new and existing EFS file systems now provide average latency as low as 600 microseconds for the majority of read operations on data and metadata.

And there's no extra charge:

This performance boost applies to One Zone and Standard General Purpose EFS file systems. New or old, you will still get the same availability, durability, scalability, and strong read-after-write consistency that you have come to expect from EFS, at no additional cost and with no configuration changes.

Also, over the past few weeks they already migrated all existing EFS file systems over:

We “flipped the switch” and enabled this performance boost for all existing EFS General Purpose mode file systems over the course of the last few weeks, so you may already have noticed the improvement. Of course, any new file systems that you create will also benefit.

The other problem with EFS, though, is that it's crazy expensive; you're trading money for convenience...

Looptap, a little QTE game

Saw this little game on the Hacker News front page: "Show HN: Looptap – A minimal game to waste your time (vasanthv.com)"; as the title says, a little game for wasting your time XDDD

The QTE in the title is a "quick time event", a pretty common mechanic in today's games: a design where you have to react within the event's valid window (reacting too early or too late both fail).