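AWS launches Graviton3-based instance types

Amazon EC2 has launched Graviton3-based instance types: "New Graviton3-Based General Purpose (m7g) and Memory-Optimized (r7g) Amazon EC2 Instances".

The first wave covers only the general-purpose m7g and the memory-optimized r7g. As for the compute-optimized c7g, people on Twitter guessed the news would come later, but it actually launched last May: "AWS launches the c7g instance type".

For now they are offered only in the US and European regions us-east-1, us-east-2, us-west-2, and eu-west-1; no Asian region has these instance types yet: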

M7g and R7g instances are available today in the US East (N. Virginia), US East (Ohio), US West (Oregon), and Europe (Ireland) AWS Regions in On-Demand, Spot, Reserved Instance, and Savings Plan form.

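AWS claims up to 25% more performance than the Graviton2-based m6g and r6g. I also checked the prices in us-east-1, and they are about 6% higher, so going by the official number that works out to roughly an 18% improvement in price-performance (1.25 / 1.06 ≈ 1.18), a nice gain for anyone who actually keeps their CPUs fully loaded: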

Today I am happy to tell you about the newest Amazon EC2 instance types, the M7g and the R7g. Both types are powered by the latest generation AWS Graviton3 processors, and are designed to deliver up to 25% better performance than the equivalent sixth-generation (M6g and R6g) instances, making them the best performers in EC2.
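The price-performance estimate can be checked with a few lines of Python. This is only a sketch of the arithmetic: the 25% figure is AWS's "up to" claim and the 6% price difference is the rough us-east-1 number mentioned above, so the result is an upper bound rather than a measurement.

```python
def price_performance_gain(perf_gain: float, price_increase: float) -> float:
    """Relative change in performance per dollar.

    perf_gain and price_increase are fractions, e.g. 0.25 for +25%.
    """
    return (1 + perf_gain) / (1 + price_increase) - 1


# Assumed inputs: +25% performance (official claim), +6% price (rough estimate).
gain = price_performance_gain(0.25, 0.06)
print(f"{gain:.1%}")  # 17.9%
```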

The post also mentions one big architectural change in Graviton3: memory moves from DDR4 to DDR5, which raises memory bandwidth by up to 50%:

Both types of instances are equipped with DDR5 memory, which provides up to 50% higher memory bandwidth than the DDR4 memory used in previous generations.

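Next is whether this trickles down to the t series, something like a t5g. If it does I'll give it a try, but the machine running this blog is already on a three-year RI, and by the time that expires we may well have Graviton4 or Graviton5...

Linode is renamed Akamai Connected Cloud

Linode has been renamed Akamai Connected Cloud: "A Bold New Approach to the Cloud" and "Akamai Unveils Akamai Connected Cloud and New Cloud Computing Services".

So Akamai is going the brand-consolidation route, which means there is a good chance Linode's original pace gets dragged down by the larger organization.

Well, I no longer use Linode anyway, so all I can do is echo the HN discussion at "Linode rebranded as Akamai’s cloud computing services (linode.com)": RIP Linode...

A design pitfall in Python's asyncio.create_task()

I saw "The Heisenbug lurking in your async code" in today's Hacker News Daily; the HN discussion is at "A Heisenbug lurking in async Python (textualize.io)".

By design, the event loop holds only a weak reference, not a regular (strong) reference, to the object returned by asyncio.create_task(), so when Python's GC runs the task can really be collected: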

Important: Save a reference to the result of this function, to avoid a task disappearing mid-execution. The event loop only keeps weak references to tasks. A task that isn’t referenced elsewhere may get garbage collected at any time, even before it’s done. For reliable “fire-and-forget” background tasks, gather them in a collection:
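A minimal sketch of the "gather them in a collection" pattern described in that note. The helper name fire_and_forget and the work coroutine are made up for illustration; only asyncio.create_task() and Task.add_done_callback() come from the standard library.

```python
import asyncio

# Strong references to in-flight tasks, so the GC cannot collect them early.
background_tasks = set()


def fire_and_forget(coro):
    """Hypothetical helper: schedule coro and keep a reference until it finishes."""
    task = asyncio.create_task(coro)
    background_tasks.add(task)
    # Drop the reference once the task is done, so the set does not grow forever.
    task.add_done_callback(background_tasks.discard)
    return task


async def work(n, results):
    await asyncio.sleep(0.01)
    results.append(n)


async def main():
    results = []
    for n in range(3):
        fire_and_forget(work(n, results))
    # Only for the demo: wait for everything so the results can be inspected.
    await asyncio.gather(*background_tasks)
    return results


results = asyncio.run(main())
print(sorted(results))
```

Without the set, nothing but the event loop's weak reference would point at each task, which is exactly the situation the documentation warns about.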

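The preceding paragraph of the documentation mentions that asyncio.TaskGroup.create_task() can be used instead, which is also the officially recommended approach, but it was only added in 3.11: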

Note: asyncio.TaskGroup.create_task() is a newer alternative that allows for convenient waiting for a group of related tasks.

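It is easy to forget this and then step on the mine. After all, threading is functionally similar and can be used with a fire-and-forget mindset, but this isn't threading XD

Universal Summarizer, built by Kagi

A new service I saw at "Universal Summarizer (kagi.com)": give it a URL and it returns a summary. The service itself is at "Kagi - Universal Summarizer", and judging from the URL it looks like a Kagi experimental project.

Take CNN's "Biden’s dramatic warning to China" for example; the summary it produced looks fine: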

President Joe Biden delivered a dramatic warning to China in his State of the Union address, vowing to protect America's sovereignty if China threatens it. He specifically named President Xi Jinping, saying "Name me a world leader who'd change places with Xi Jinping. Name me one!" This marked a stark escalation in the US-Chinese relationship, which has been strained by a balloon surveillance program and other issues. Biden also addressed Russia, calling their invasion of Ukraine a test for America and the world. His speech highlighted the unified opposition to China in US politics, with House Speaker Kevin McCarthy having convened a bipartisan House committee to examine the perceived threat from the Chinese Communist Party. Biden's comments also served as an important milestone in the increasingly tumultuous competition between the US and China, as the US shifts to talking about establishing guardrails for the relationship and protecting the Western-led rules-based international system.

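This could be built on top of GPT-3.5 or ChatGPT, but I'm not sure whether Kagi is calling one of those or running its own model. My guess is there's a fairly good chance it's the former...

Since its URL design passes the target page as a url parameter, it can be wrapped as a bookmarklet and put on the bookmarks bar: "Kagi Universal Summarizer".

I also saw an impressive use in the HN discussion: feed it https://bellard.org/quickjs/pi_bigdecimal.js, the URL of a piece of JavaScript code, and it actually comes back with an explanation of the code, and even correctly identifies it as the Chudnovsky algorithm: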

This code uses the QuickJS bigdecimal type to calculate the value of pi to a given precision. It does this by using the Chudnovsky algorithm, which is a series of calculations that can be used to approximate pi. The code is written in Javascript and uses BigInt and BigDecimal to perform the calculations. It is interesting to note that the code also takes into account the possibility of bad rounding for the last digits, and adds extra digits to reduce the probability of this happening.
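For reference, here is a short Python sketch of the Chudnovsky series that the summary identified. It is not a port of pi_bigdecimal.js; it uses the standard decimal module, and the 10 guard digits are an arbitrary choice to absorb rounding in the last places, similar in spirit to the extra digits the summary mentions.

```python
from decimal import Decimal, localcontext


def chudnovsky_pi(digits: int) -> Decimal:
    """Approximate pi with the Chudnovsky series (about 14 digits per term)."""
    with localcontext() as ctx:
        ctx.prec = digits + 10  # guard digits against rounding in the tail
        c = 426880 * Decimal(10005).sqrt()
        m, l, x, k = 1, 13591409, 1, 6
        s = Decimal(l)
        for i in range(1, digits // 14 + 2):
            # m is (6i)! / ((3i)! * (i!)^3), updated incrementally with exact ints
            m = m * (k ** 3 - 16 * k) // i ** 3
            l += 545140134
            x *= -262537412640768000  # (-640320)^3
            s += Decimal(m * l) / x
            k += 12
        return c / s


pi_30 = chudnovsky_pi(30)
print(str(pi_30)[:30])
```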

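Now that I've set up a bookmarklet, I'll probably be using it a lot...

Twitter pushes back the shutdown of its free API to 2/13

As mentioned earlier in "Twitter announces it will kill free API access", the API free tier was to be shut down on 2/9; that has now been postponed to 2/13.

Reportedly this has to do with the boss caring a lot about the Super Bowl and then sending out an internal email.

I thought things would break today and I could take the chance to fix them up; looks like I can procrastinate until the weekend...

Tiny11, a slimmed-down Windows 11

Tiny11 is a stripped-down Windows 11 build put together by NTDEV: "De-Bloated Windows 11 Build Runs on 2GB of RAM". The corresponding HN discussion is at "De-Bloated Windows 11 Build Runs on 2GB of RAM (tomshardware.com)".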

It just uses around 8GB of space compared to the 20+GB that a standard installation does.

There are limitations, though; for example, you have to take care of security updates yourself:

This OS install “is not serviceable,” notes NTDev. “.NET, drivers and security definition updates can still be installed from Windows Update,” so this isn’t an install which you can set and forget.

In addition, features that are installed through WinSxS (including languages) cannot be installed:

Moreover, removing the Windows Component Store (WinSxS), which is responsible for a fair degree of Tiny11’s compactness, means that installing new features or languages isn’t possible.

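But as I recall, ripping out WinSxS affects quite a lot of things? A system like this is probably fine for running CI or for fixed-purpose use, but I don't know how much would break in general-purpose use...

Besides the smaller disk footprint, memory usage also drops considerably, since a lot of bloated software has been removed: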

In testing, NTDev said that Tiny11 could “run great” on a system with just 2GB of RAM.

A few things about the twemoji.maxcdn.com URL

I saw this fix in "phpBB 3.3.10 Release", looked into it, and found there is a story behind it:

Update the emoji CDN: PHPBB3-17071

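Twitter Emoji (Twemoji) is an open source project from Twitter, mainly meant to let systems that don't support newer Unicode versions display emoji as images instead (Unicode keeps adding characters, after all).

The URL it provided was https://twemoji.maxcdn.com/v/latest/twemoji.min.js, which carries the MaxCDN name, a form of advertising. But StackPath acquired MaxCDN in 2016 and then last year moved to retire the MaxCDN product line: "MaxCDN and SecureCDN are Retiring; Here’s What It Means for You".

This was raised in "Clarify MaxCDN URLs now that MaxCDN is retiring #556", but not much seemed to happen. It came up again in "Maxcdn has shut down, cdn not working anymore. #580", where a former Twitter employee gave a rough account of the situation: back then they had negotiated with MaxCDN to have MaxCDN cover the bandwidth: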

@simplexx among all the things, twemoji has always been a Twitter service for the community.

At my times in there, we had a great agreement / deal with MaxCDN so that it's hard to blame the boss this time, as MaxCDN is a completely different company/story.

What I see is some poor attention to this project, as companies don't close from a day to another (usually?) but as we all know what's going on @ twitter, I can't really blame any of my former colleagues, or new arrivals there.

Please let's not make it a wall of shame for all the people that worked on this, thanks for your understanding (I've left 7 years ago or more, as example, I've got pinged by some follower and I'm just trying to help you out anyway).

Someone else described the current state of Twitter's communication channels:

For what it's worth, while I was still there we were in talks with MaxCDN to have the same deal when they migrated to be Stackpath. Everyone who had worked on the deal with MaxCDN had left and left no record of that deal, so it was taking them a bit longer to work out than expected – MaxCDN used to offer free hosting to OSS projects, but Stackpath wouldn't be doing that. They were working on an exception for us, but any emails they send to our Twitter emails now get bounced, so I guess they could've gotten this sorted out before shutting down MaxCDN... but it's not like we're there on the other end to make it happen, so it appears we'll never know.

Anyway, twemoji.maxcdn.com has at least been brought back to "working", but the situation is peculiar: the hostname now points to the competitor BunnyCDN:

;; ANSWER SECTION:
twemoji.maxcdn.com.     229     IN      CNAME   twemoji.b-cdn.net.
twemoji.b-cdn.net.      35      IN      A       23.248.177.58

But if you actually request a file, you get a 301 to jsDelivr:

$ http https://twemoji.maxcdn.com/2/svg/1f525.svg
[...]
Location: https://cdn.jsdelivr.net/npm/twemoji@11.3.0/2/svg/1f525.svg
[...]

And jsDelivr currently sits on Fastly:

;; ANSWER SECTION:
cdn.jsdelivr.net.       180     IN      CNAME   jsdelivr.map.fastly.net.
jsdelivr.map.fastly.net. 30     IN      A       199.232.45.229

The instructions on GitHub, however, recommend UNPKG:

<script src="https://unpkg.com/twemoji@latest/dist/twemoji.min.js" crossorigin="anonymous"></script>

And UNPKG's CDN is currently Cloudflare:

;; ANSWER SECTION:
unpkg.com.              86400   IN      NS      anirban.ns.cloudflare.com.
unpkg.com.              86400   IN      NS      aron.ns.cloudflare.com.
;; ANSWER SECTION:
unpkg.com.              300     IN      A       104.16.126.175
unpkg.com.              300     IN      A       104.16.123.175
unpkg.com.              300     IN      A       104.16.122.175
unpkg.com.              300     IN      A       104.16.124.175
unpkg.com.              300     IN      A       104.16.125.175

So the hostname points to BunnyCDN, but BunnyCDN only handles the redirect, and the redirect doesn't even go to UNPKG...
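To make the hostname chain easier to follow, here is a small sketch that parses the answer-section text quoted above and walks the CNAME records. The function names are made up for illustration and the parser only handles the simple five-column layout shown in this post; it does no DNS queries itself.

```python
DIG_OUTPUT = """\
;; ANSWER SECTION:
twemoji.maxcdn.com.     229     IN      CNAME   twemoji.b-cdn.net.
twemoji.b-cdn.net.      35      IN      A       23.248.177.58
"""


def parse_answer_section(text):
    """Return (name, ttl, type, value) tuples from dig-style answer lines."""
    records = []
    for line in text.splitlines():
        fields = line.split()
        # Skip comments such as ';; ANSWER SECTION:' and anything not 5 columns.
        if len(fields) != 5 or fields[0].startswith(";"):
            continue
        name, ttl, _rclass, rtype, value = fields
        records.append((name, int(ttl), rtype, value))
    return records


def follow_cname(records, name):
    """Follow CNAME records starting at name; stops on a loop or a dead end."""
    cnames = {n: v for n, _ttl, rtype, v in records if rtype == "CNAME"}
    chain = [name]
    while chain[-1] in cnames and cnames[chain[-1]] not in chain:
        chain.append(cnames[chain[-1]])
    return chain


records = parse_answer_section(DIG_OUTPUT)
chain = follow_cname(records, "twemoji.maxcdn.com.")
print(" -> ".join(chain))
```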

Pony ORM

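Something I saw on Simon Willison's blog: "Python’s “Disappointing” Superpowers". The original article it refers to is "Python’s “Disappointing” Superpowers", which is about Python tools.

Although it says "disappointing", it is meant ironically. The original article covers many unusual tools, and Pony ORM is the one I find most interesting; its query syntax is very Pythonic: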

select(c for c in Customer if sum(c.orders.price) > 1000)

It can also be written with a lambda:

Customer.select(lambda c: sum(c.orders.total_price) > 1000)

This generates the corresponding SQL:

SELECT "c"."id"
FROM "customer" "c"
  LEFT JOIN "order" "order-1"
    ON "c"."id" = "order-1"."customer"
GROUP BY "c"."id"
HAVING coalesce(SUM("order-1"."total_price"), 0) > 1000

The reason this works without raising an error is that Pony reads the bytecode directly, analyzes it, and generates the corresponding SQL query:

A normal understanding of generator expressions suggests that the select function is consuming a generator. But that couldn’t explain the behaviour here. Instead, it actually introspects the frame object of the calling code, then decompiles the byte code of the generator expression object it finds, and builds a Query based on the AST objects.
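The following is not Pony's implementation, only a small standard-library sketch of why the trick is possible at all: a generator object carries its own code object, so a function that receives it can look at the bytecode without ever iterating. The function name inspect_genexpr is made up for illustration.

```python
import dis
import inspect


def inspect_genexpr(gen):
    """Look inside a generator expression without consuming it."""
    code = gen.gi_code  # the compiled body of the generator expression
    return {
        "name": code.co_name,
        "constants": code.co_consts,
        "opnames": [ins.opname for ins in dis.get_instructions(code)],
        "state": inspect.getgeneratorstate(gen),
    }


orders = [500, 1500, 2500]
gen = (o for o in orders if o > 1000)
info = inspect_genexpr(gen)

print(info["name"])               # <genexpr>
print(1000 in info["constants"])  # the literal from the filter is visible
print(info["state"])              # GEN_CREATED: nothing has been iterated yet
```

A real ORM would go further and rebuild an AST from those instructions, as the quote describes; the sketch stops at showing that the filter condition is visible to the callee.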

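This design is what gives it that freedom of syntax.

I had a look and there are also some integrations, such as "Integration with flask" for Flask and "Integration with FastAPI" for FastAPI.

I'll probably just keep an eye on it for now, though; my main Python tool is still Django, which has its own ORM...

Long-range WiFi technology (background: Ukraine's network)

This starts from the Hacker News thread "Ask HN: What's with the DIY state of the art long-range Wi-Fi?", which tries to find a low-cost, scalable way to rebuild Ukraine's network in wartime.

The poster wants to use ordinary consumer WiFi equipment to cover long distances (here meaning far beyond what WiFi was designed for, 5~20km) without needing a fast link (roughly 0.1~1Mbps):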

The idea is to make a single device that can act as both a relay station, as well as an actual hotspot, it then can be placed in line-of-sight configuration to potentially cover huge areas (the accepted performance would be anywhere from 0.1-1 Mbit/s point-to-point over anywhere from 5-20km.

And the cost has to be low enough, ideally within US$100 per unit:

We are aiming to bring the cost of such configuration down to $100 per unit at least.

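The discussion mentions several communities that are building decentralized networks. One of the better-known is Germany's Freifunk; sorting by node count on "Find your nearest community" shows that the largest community has more than 4000 nodes. There are also two mainstream technologies for handling routing: batman-adv and 802.11s.

While looking things up I also found that Wikipedia has a page on the technology, "Long-range Wi-Fi". It is also a topic that overseas tech YouTubers like to cover; Linus Tech Tips, for example, once got 100+Mbps over a 12km line-of-sight link using Ubiquiti (UI) equipment.

So it seems reasonable to expect that ordinary hardware with reflashed firmware plus directional antennas has a chance of meeting the requirements above...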
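As a rough sanity check on those distances, the standard free-space path loss formula gives an idea of how much signal is lost over 5~20km. This is only a back-of-the-envelope sketch: the 2.4GHz band, the 20 dBm transmit power and the 24 dBi antennas are illustrative assumptions, not numbers from the thread, and real links also lose signal to obstructions, Fresnel zone clearance and cabling.

```python
import math


def fspl_db(distance_km: float, freq_mhz: float) -> float:
    """Free-space path loss in dB for a distance in km and a frequency in MHz."""
    return 20 * math.log10(distance_km) + 20 * math.log10(freq_mhz) + 32.44


def received_dbm(tx_dbm, tx_gain_dbi, rx_gain_dbi, distance_km, freq_mhz):
    """Idealized received power: transmit power plus antenna gains minus path loss."""
    return tx_dbm + tx_gain_dbi + rx_gain_dbi - fspl_db(distance_km, freq_mhz)


# Assumed setup: 2.4GHz, 20 dBm radio, 24 dBi directional antenna on each end.
for km in (5, 12, 20):
    loss = fspl_db(km, 2400)
    rx = received_dbm(20, 24, 24, km, 2400)
    print(f"{km:>2} km: path loss {loss:.1f} dB, received {rx:.1f} dBm")
```

Whether the received level is good enough depends on the receiver sensitivity of the specific radio at the chosen data rate, which has to come from its datasheet.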

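Using an AI model to tell whether text is AI-generated

OpenAI has released a new model for judging whether text was generated by AI: "New AI classifier for indicating AI-written text".

But its performance is still not great. Looking at English alone, the true positive rate is only 26%, while the false positive rate is 9%: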

In our evaluations on a “challenge set” of English texts, our classifier correctly identifies 26% of AI-written text (true positives) as “likely AI-written,” while incorrectly labeling human-written text as AI-written 9% of the time (false positives).
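To get a feel for what those two rates mean in practice, Bayes' rule turns them into "if the classifier flags a text, how likely is it really AI-written?". The share of AI-written text in the input (the prior) is not given by OpenAI; the 50% and 10% values below are assumptions chosen purely for illustration.

```python
def flagged_precision(tpr: float, fpr: float, prior: float) -> float:
    """P(text is AI-written | classifier flags it), by Bayes' rule.

    tpr:   true positive rate (0.26 in OpenAI's evaluation)
    fpr:   false positive rate (0.09 in OpenAI's evaluation)
    prior: assumed fraction of input texts that are AI-written
    """
    flagged_ai = tpr * prior
    flagged_human = fpr * (1 - prior)
    return flagged_ai / (flagged_ai + flagged_human)


for prior in (0.5, 0.1):
    p = flagged_precision(0.26, 0.09, prior)
    print(f"prior {prior:.0%}: {p:.1%} of flagged texts are AI-written")
```

With an even mix, roughly three in four flagged texts would be AI-written; if only one text in ten is AI-written, most flagged texts would actually be human-written.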

It also lists weaknesses, for example that shorter content is hard to classify:

The classifier is very unreliable on short texts (below 1,000 characters). Even longer texts are sometimes incorrectly labeled by the classifier.

Content that has a single correct answer is also hard to classify, because the correct answer is almost always the same:

Text that is very predictable cannot be reliably identified. For example, it is impossible to predict whether a list of the first 1,000 prime numbers was written by AI or humans, because the correct answer is always the same.

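It also mentions a technical limitation: the current method is closer to "recognizing text produced by models trained on certain corpora" than to general-purpose AI text detection: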

Classifiers based on neural networks are known to be poorly calibrated outside of their training data. For inputs that are very different from text in our training set, the classifier is sometimes extremely confident in a wrong prediction.

It doesn't look usable yet...