華盛頓郵報怎麼把 Mapbox 換成其他 open source 方案

Hacker News 上面看到「How The Post is replacing Mapbox with open source solutions」這篇,講華盛頓郵報怎麼把 Mapbox 換成 open source 方案,對應的討論在「Replacing Mapbox with open source solutions (kschaul.com)」這邊。

維基百科有提到大概兩年前,2020 年底的時候 Mapbox GL JS 從開源授權換成私有授權了 (也可以參考先前寫的「Mapbox GL JS 的授權改變,以及 MapLibre GL 的誕生」這篇):

In December 2020, Mapbox released the second version of their JavaScript library for online display of maps, Mapbox GL JS. Previously open source code under a BSD license, the new version switched to proprietary licensing. This resulted in a fork of the open source code, MapLibre GL, and initiation of the MapLibre project.

裡面題到了四個工具。

第一個是 OpenMapTiles,下載部份的圖資使用,對於報導只需要某個區域很方便:

OpenMapTiles is an extensible and open tile schema based on the OpenStreetMap. This project is used to generate vector tiles for online zoomable maps. OpenMapTiles is about creating a beautiful basemaps with general layers containing topographic information.

第二個是 Maputnik,可以修改圖資呈現的方式:

A free and open visual editor for the Mapbox GL styles targeted at developers and map designers.

第三個是 PMTiles,可以將圖資檔案塞到一個大檔案裡面,然後透過 HTTP range requests 下載需要的部份就好,大幅下降 HTTP request 所需要的費用 (很多 CDN 會依照 HTTP request 數量收費):

Protomaps is a serverless system for planet-scale maps.

An alternative to map APIs at 1% the cost, via single static files on your own cloud storage. Deploy datasets like OpenStreetMap for your site in minutes.

最後就是 fork 出來開源版本的 maplibre-gl-js

MapLibre GL JS is an open-source library for publishing maps on your websites or webview based apps. Fast displaying of maps is possible thanks to GPU-accelerated vector tile rendering.

It originated as an open-source fork of mapbox-gl-js, before their switch to a non-OSS license in December 2020. The library's initial versions (1.x) were intended to be a drop-in replacement for the Mapbox’s OSS version (1.x) with additional functionality, but have evolved a lot since then.

這樣看起來好像可以用在像 KKTIX 這種下面顯示固定地圖的地方:

關於 twemoji.maxcdn.com 這個網址的一些事情

在「phpBB 3.3.10 Release」這邊看到這個修正,研究了一下發現原來有些故事在後面跑:

Update the emoji CDN: PHPBB3-17071

Twitter Emoji (Twemoji) 這個計畫是 Twitter 弄出來的 open source project,最主要是讓不支援新版 Unicode 的系統上可以改用圖片顯示出來 (畢竟 Unicode 一直在加字)。

其中提供的 url 是 https://twemoji.maxcdn.com/v/latest/twemoji.min.js,可以看到裡面帶有 MaxCDN 的資訊,算是一種廣告,但後來 StackPath 在 2016 併購 MaxCDN,接下來是在去年打算要淘汰掉 MaxCDN 這個產品線:「MaxCDN and SecureCDN are Retiring; Here’s What It Means for You」。

這件事情被帶到「Clarify MaxCDN URLs now that MaxCDN is retiring #556」這邊討論,但看起來沒有太多動作,後來在「Maxcdn has shut down, cdn not working anymore. #580」這邊又被帶起來討論,其中 Twitter 前員工大概提了一下情況,主要是當年他們跟 MaxCDN 有談過讓 MaxCDN 負責頻寬的部份:

@simplexx among all the things, twemoji has always been a Twitter service for the community.

At my times in there, we had a great agreement / deal with MaxCDN so that it's hard to blame the boss this time, as MaxCDN is a completely different company/story.

What I see is some poor attention to this project, as companies don't close from a day to another (usually?) but as we all know what's going on @ twitter, I can't really blame any of my former colleagues, or new arrivals there.

Please let's not make it a wall of shame for all the people that worked on this, thanks for your understanding (I've left 7 years ago or more, as example, I've got pinged by some follower and I'm just trying to help you out anyway).

另外也有人提到目前 Twitter 的溝通管道的狀況:

For what it's worth, while I was still there we were in talks with MaxCDN to have the same deal when they migrated to be Stackpath. Everyone who had worked on the deal with MaxCDN had left and left no record of that deal, so it was taking them a bit longer to work out than expected – MaxCDN used to offer free hosting to OSS projects, but Stackpath wouldn't be doing that. They were working on an exception for us, but any emails they send to our Twitter emails now get bounced, so I guess they could've gotten this sorted out before shutting down MaxCDN... but it's not like we're there on the other end to make it happen, so it appears we'll never know.

Anyway,現在是至少讓 twemoji.maxcdn.com 恢復到「會動」了,但情況變得很特殊,twemoji.maxcdn.com 這個網址目前是指到競業的 BunnyCDN 上:

;; ANSWER SECTION:
twemoji.maxcdn.com.     229     IN      CNAME   twemoji.b-cdn.net.
twemoji.b-cdn.net.      35      IN      A       23.248.177.58

但如果你真的打過去要檔案,會是 301 到 jsDelivr 上:

$ http https://twemoji.maxcdn.com/2/svg/1f525.svg
[...]
Location: https://cdn.jsdelivr.net/npm/twemoji@11.3.0/2/svg/1f525.svg
[...]

而 jsDelivr 目前是放在 Fastly 上:

;; ANSWER SECTION:
cdn.jsdelivr.net.       180     IN      CNAME   jsdelivr.map.fastly.net.
jsdelivr.map.fastly.net. 30     IN      A       199.232.45.229

但在 GitHub 的說明上面則是建議用 UNPKG

<script src="https://unpkg.com/twemoji@latest/dist/twemoji.min.js" crossorigin="anonymous"></script>

而 UNPKG 目前的 CDN 的部份則是 Cloudflare

;; ANSWER SECTION:
unpkg.com.              86400   IN      NS      anirban.ns.cloudflare.com.
unpkg.com.              86400   IN      NS      aron.ns.cloudflare.com.
;; ANSWER SECTION:
unpkg.com.              300     IN      A       104.16.126.175
unpkg.com.              300     IN      A       104.16.123.175
unpkg.com.              300     IN      A       104.16.122.175
unpkg.com.              300     IN      A       104.16.124.175
unpkg.com.              300     IN      A       104.16.125.175

指到 BunnyCDN 但是 BunnyCDN 只負責 redirect,然後也不是導到 UNPKG 上...

Netflix 單機 800Gbps 伺服器所使用的最佳化技巧

Hacker News 上看到 Netflix 的人丟出來的投影片,試著了解 Netflix 的 Open Connect Appliances 裡與 FreeBSD 相關的最佳化技巧對於效能的影響:「The “other” FreeBSD optimizations used by Netflix to serve video at 800Gb/s from a single server」。

看起來這邊的分析是先基於 400Gbps 的版本,可以跑到 375Gbps (53% CPU),接著在上面拔掉各種最佳化的設定,看看會掉多少流量。這邊可以參考先前在「Netflix 在單機服務 400Gbps 的影音流量」提到的資料。

投影片上的第一章是 sendfile 與 kTLS 相關的最佳化,這邊可以看出來都是重要的項目,隨便關掉一個就會掉很多 capacity:

  • Disable kTLS (and async sendfile) + nginx aio:40Gbps (100% CPU)
  • Disable kTLS (and async sendfile) + nginx thread pools:90Gbps (90% CPU)
  • Disable sendfile (but use kTLS):75Gbps (80% CPU)
  • Disable sendfile (but use NIC kTLS):95Gbps (80% CPU)
  • Enable Sendfile & kTLS, but disable ISA-L crypto:180Gbps (80% CPU)
  • Enable Sendfile & kTLS:240Gbps (80% CPU)

第二章是 virtual memory,UMA VM Page Cache 這邊看起來最明顯,SF_NOCACHE 也是個重要的項目:

  • Disable UMA VM Page Cache:60Gbps (95% CPU)
  • Disable VM Batch Queues:280Gbps (95% CPU)
  • Disable SF_NOCACHE:120Gbps (55% CPU)

另外第二章特別提到了一個之前沒有用到的 optimization,是把 arm64 上面的 4KB Pages 變成 16KB Pages,這帶動了些許的效能提昇,並且降低了 CPU 使用率:

345Gb/s @ 80% CPU -> 368Gb/s @ 66% CPU

第三章是 network stack,看起來 TSO 帶來的效益也是很高:

  • Disable TCP Large Receive Offload:330Gbps (65% CPU)
  • Disable RSS accelerated LRO:365Gbps (70% CPU)
  • TSO Disabled:180Gbps (85% CPU)
  • Disable TSO and LRO:170Gbps (85% CPU)

最後面則是有提到從 400Gbps 到 800Gbps 還多做了那些事情,最後是達到 731Gbps。

用的機器是 Dell PowerEdge R7525,這是一台 2U 的機器啊...

Netflix 在 2013 年 Open Connect Appliances

Reddit 上的原文在「So I got a Netflix cache server...」這邊,但看起來作者自己刪掉內容了 (可能是被接觸要求刪掉?),可以看 Internet Archive 上的「20221026080226」,以及報導「How a Redditor Ended Up With an Industrial-Grade Netflix Server」。

所以是 Netflix 退役的機器,看起來適合法取得的?

I work for a large ISP, and we are retiring/replacing quite a few 2013 era Netflix OCA caches, and I was offered one. Of course, I couldn't say no 😅

資料當然是被清過的:

I knew that Netflix had wiped them all in the decommissioning process, that they ran FreeBSD, that they were crammed full of drives, and that's about it.

然後這台 2013 年的機器以現在的角度來看也算很大台,尤其是看到硬碟的部份是 36 顆 HGST 的 7.2TB 硬碟:

36x 7.2TB 7200RPM HGST's

再加上 4 個 10Gbps 的界面可以接:

One 4x 10G SFP+ NIC

作者後來裝了 TrueNAS 來用,就這些規格資料看起來的確是個很適合當 NAS:

I expected some resistance when trying to install an OS, but it was already set to boot from USB and took a TrueNAS install like a champ!

但不確定會吃多少電,放在家裡用還是得考慮這點... 不然就是當紀念品收起來。

這兩個禮拜爆紅的 Stable Diffusion

Stable DiffusionStability AI 訓練出來的 model,跟之前提到的 DALL-E 最大的差異就是產生出的圖的限制少很多:

Unlike competing models like DALL-E, Stable Diffusion is open source and does not artificially limit the images it produces, though the license prohibits certain harmful use cases.

這也造就了這兩個禮拜整個 Stable Diffusion 的各種應用急速成長。

Simon Willison 的「Stable Diffusion is a really big deal」這篇來當作總覽還不錯。

除了授權使用上的限制以外,在技術上的限制也比較少 (有很大一部分會歸功於社群的各種 porting),包括了:

除了先前大家已經熟悉的 txt2img 功能以外,Stable Diffusion 另外提供了 img2img 的能力,也就是先給一張圖,然後再給對應的句子要求 Stable Diffusion 去改這張圖,所以就會有像是把這張圖:

加上「A distant futuristic city full of tall buildings inside a huge transparent glass dome, In the middle of a barren desert full of large dunes, Sun rays, Artstation, Dark sky full of stars with a shiny sun, Massive scale, Fog, Highly detailed, Cinematic, Colorful」的句子後,提供了這張圖:

以及這張圖:

這樣可玩性又多了不少...

白宮宣佈由政府資助的研究,都必須馬上公開

一樣是 Hacker News 上看到的:「Guidance to make federally funded research freely available without delay (whitehouse.gov)」,白宮的公告在「OSTP Issues Guidance to Make Federally Funded Research Freely Available Without Delay」這邊。

開頭有重點,不得限制以及收費。所以 paywall 是一定不行,另外要註冊才能看也算是一種限制,應該也會被這次的政策要求改善:

In a memorandum to federal departments and agencies, Dr. Alondra Nelson, the head of OSTP, delivered guidance for agencies to update their public access policies as soon as possible to make publications and research funded by taxpayers publicly accessible, without an embargo or cost.

時間表的部份,短期是 2023 年中更新 policy,並且在 2025 年年底前全部施行:

In the short-term, agencies will work with OSTP to update their public access and data sharing plans by mid-2023. OSTP expects all agencies to have updated public access policies fully implemented by the end of 2025.

這次的算政府方面的政策,至少這些論文會有地方可以公開下載。

找了一下之前寫下來跟 open access 有關的消息,從學校方面給壓力的也不少,不過我記錄下來的主要都是跟 Elsevier 的中止合約:

看起來不同角度都有一些推進...

從 Android (AOSP) fork 出來的 /e/

上個禮拜在 Hacker News 看到的「Review of /e/ – An Android Alternative For Mobile Phones」,在講 /e/ 這個從 AOSP 改出來的作業系統,主力在於「unGoogled」這件事情,避免任何資料傳回給 Google。Hacker News 上對應的討論在「Review of /e/ – Android-based alternative for mobile phones (thenewleafjournal.com)」這邊。

先看了一下運作方式,/e/ 的後面是 e Foundation,以非營利組織的方式經營。

LineageOS 的經驗來看,看起來有蠻多東西預先包好了,像是預掛了 microG 來模擬 Google Play Services 的服務與 API,這樣可以讓一些需要 Google Play Services 的服務可以跑 (但可以預期不會是完全相容)。

另外也有一些商業合作,所以市場上可以買到出廠就已經安裝 /e/ 的手機,讓一般使用者更容易上手。另外一條可以預期的路是自己刷 /e/,從「Smartphone Selector」這邊可以看到 /e/ 支援很多型號。

文章裡另外題到了其他的 AOSP fork,走不同的路線:

In addition to LineageOS, there are two forks focused primarily on security – GrapheneOS and CalyxOS. There is also Replicant, which appears to mostly support older devices at this time.

看起來弄個 Pixel 5a 或是舊一點的 Pixel 4a 應該是個還可以的方向,Google 自家牌的手機通常都是這些 distribution 優先支援的機種...

Netflix 的 Open Connect 機器往 800Gbps 推進

2021 年的時候曾經提過 Netflix 試著用單機推出 400Gbps 的流量 (用在 Netflix 的 Open Connect):「Netflix 在單機服務 400Gbps 的影音流量」,快一年後的目前,Netflix 的人已經成功推到接近 800Gbps 了:「Serving Netflix Video Traffic at 800Gb/s and Beyond」。另外在 Hacker News 上的討論「Serving Netflix Video Traffic at 800Gb/s and Beyond [pdf] (nabstreamingsummit.com)」也可以看看。

翻了一下投影片,最後衝到 720Gbps,主要是因為 NIC output drop,而非其他部份。

裡面還是把之前的故事也都講了一遍 (不然簡報的時間會太短?),如果有看過前面的內容可以快速看一下就好,這次新的東西從 page 89 開始:

  • Asynchronous Sendfile (2014)
  • Kernel TLS (2016)
  • Network-centric NUMA (2019)
  • Inline Hardware (NIC) kTLS (2022)
  • 800G initial results

最後面幾張投影片裡面有提到往 800Gbps 衝的硬體平台:

  • AMD (EPYC 7713 CPUs)
  • Dell (PowerEdge R7525)
  • Mellanox/NVIDIA (ConnectX-6 Dx NICS)
  • Intel (P5316 NVME)

下個目標不知道是什麼,看起來目前已經壓榨到 memory bandwidth 也有點極限的感覺了...

跑在本機的 GitHub Copilot 替代品

Hacker News 上看到「FauxPilot – an attempt to build a locally hosted version of GitHub Copilot (github.com/moyix)」這個本機上跑 GitHub Copilot 協定的專案。專案的 GitHub 在「FauxPilot - an open-source GitHub Copilot server」這邊。

裡面用的是 Salesforce 放出來的 CodeGen,不過 Salesforce 提供了 350M、2B、6B 與 16B 的 model,但在 FauxPilot 這邊目前只看到 350M、6B 與 16B 的 model 可以用,少了 2B 這組,然後需要的 VRAM 就有點尷尬了:

[1] codegen-350M-mono (2GB total VRAM required; Python-only)
[2] codegen-350M-multi (2GB total VRAM required; multi-language)
[3] codegen-6B-mono (13GB total VRAM required; Python-only)
[4] codegen-6B-multi (13GB total VRAM required; multi-language)
[5] codegen-16B-mono (32GB total VRAM required; Python-only)
[6] codegen-16B-multi (32GB total VRAM required; multi-language)

13GB 剛好超過 3080 Ti 的 12GB,所以不是 3090 或 3090 Ti 的使用者就只能跑 350M 這個版本?看 Hacker News 上的討論似乎是有打算要弄 2B 的版本啦...

然後我自己雖然是 11GB 的 1080 Ti,想跑個 350M 的版本測試看看,但看起來相關的 Nvidia driver 沒裝好造成他識別不到,加上我是用 neovim,看了一下目前 ~/.config/github-copilot/hosts.json 的內容,程式碼應該是寫死到 GitHub API 上使用:

{"github.com":{"user":"gslin","oauth_token":"x"}}

先暫時放著好了,晚點等 2B 版本出現後再回來看看有沒有比較完整的指示...