觀察誰在存取剪貼簿的工具 (X11 下)

兩個月前在 Hacker News 上看到的討論,有人想要知道誰在 X11 下存取剪貼簿:「Who keeps an eye on clipboard access? (ovalerio.net)」,原文在「Who keeps an eye on clipboard access?」這邊,作者用 Python 寫的程式則是在「clipboard-watcher」這邊。

馬上有想到 iOS 在 2020 年推出的機制:「iOS 14 clipboard notifications are annoying, but developer adoption of a new API will improve the experience」。

不過在 X11 上跑起來會發現冒出來的資訊量有點大,像是在瀏覽器操作 WordPress 寫文章時剪剪貼貼的時候就會狂噴,如果可以提供程式的白名單的話就更好了,畢竟是我直接把 clipboard API 裡讀取的功能直接拔掉 (但網站還是可以寫進去就是了),對我來說不會在意 browser 寫進去的情況:

另外程式有時候會卡住 (尤其是遇到圖片的剪輯時),算是 bug 吧...

然後 Hacker News 的討論串裡面有人提到一個有趣的設計,他希望限制那些不在焦點上面的程式去碰 clipboard:

By far the worst offense I've seen in clipboard privacy on the Linux desktop is RedHat's virt-manager. It sends your clipboard AND selection content to all virtual machines, even when they are not focused, with no indication that it's happening, and with no GUI option to turn it off. This is at odds with the common practice of running untrusted code in virtual machines.

這個想法好像不賴,理論上 clipboard 應該是在有互動的時候才會碰到的東西...

Amazon EFS 效能提昇的一些討論

上一篇「Amazon EFS 的效能提昇」提到 Amazon EFS 的效能提昇,在 Hacker News 上看到 Amazon EFS 團隊的 PMT (Product-Manager-Technical) 出來回一些東西:「Amazon Elastic File System Update – Sub-Millisecond Read Latency (amazon.com)」,搜尋 geertj 應該就可以看到他回的東西了...

像是即使是 Jeff Barr 發表這篇文章,也還是經過 legal team 的同意才能發表:

(PMT on the EFS team).

Yes, the wordings are carefully formulated as they have to be signed off by the AWS legal team for obvious reasons. With that said, this update was driven by profiling real applications and addressing the most common operations, so the benefits are real. For example, a simple WordPress "hello world" is now about 2x as fast as before.

另外這次的效能提昇是透過 cache 層達成的:

I'm the PMT for this project in the EFS team. The "flip the switch" part was indeed one of the harder parts to get right. Happy to share some limited details. The performance improvement builds on a distributed consistent cache. You can enable such a cache in multiple steps. First you deploy the software across the entire stack that supports the caching protocol but it's disabled by configuration. Then you turn it for the multiple components that are involved in the right order. Another thing that was hard to get right was to ensure that there are no performance regressions due to the consistency protocol.

然後在每個 AZ 都有 cache:

The caches are local to each AZ so you get the low latency in each AZ, the other details are different. Unfortunately I can't share additional details at this moment, but we are looking to do a technical update on EFS at some point soon, maybe at a similar venue!

另外看起來主要就是 metadata cache 的幫助:

NFS workloads are typically metadata heavy and highly correlated in time, so you can achieve very high hit rates. I can't share any specific numbers unfortunately.

還是有很多細節數字不能透漏,但知道是透過 cache 達成的就已經可以大致上想像後面是怎麼弄出來的了...

Amazon EFS 的效能提昇

AWS 宣佈他們將 Amazon EFS 的 latency 大幅降低以提昇效能:「Amazon Elastic File System Update – Sub-Millisecond Read Latency」。

Linux 上一般是用 NFS 掛 EFS,個位數的 ms 的確對於效能影響超大,現在宣稱讀取的部份降到 0.6ms,應該會有蠻明顯的感覺:

Up until today, EFS latency for read operations (both data and metadata) was typically in the low single-digit milliseconds. Effective today, new and existing EFS file systems now provide average latency as low as 600 microseconds for the majority of read operations on data and metadata.

然後不另外收費:

This performance boost applies to One Zone and Standard General Purpose EFS file systems. New or old, you will still get the same availability, durability, scalability, and strong read-after-write consistency that you have come to expect from EFS, at no additional cost and with no configuration changes.

另外就是過去幾個禮拜他們把現有的 EFS 都轉移過去了:

We “flipped the switch” and enabled this performance boost for all existing EFS General Purpose mode file systems over the course of the last few weeks, so you may already have noticed the improvement. Of course, any new file systems that you create will also benefit.

不過 EFS 另外一個問題就是貴炸,用錢換方便...

其他非 Google 的 Email Hosting 服務

先前在「Cloudflare 的 Email Routing 功能可以讓一般人用 (包括免費帳號)」這邊有提到,因為 Google Workspace 要廢掉免費版本,所以 Cloudflare 推出了 E-mail forwarding 的方案讓大家用,所以一種簡單的解決方法就是註冊另外一個免費的 Gmail 帳號,然後用 Cloudflare 把信件 forward 到新的帳號裡。

但如果想要避開 Google 的服務的話,在 Hacker News 上的「Ask HN: Best hosted alternative to Google Workspace for email?」這邊就有討論不少可能的替代方案,主要是付費的為主。

討論裡常被提起的方案:

我自己是很愛 Fastmail,他的 webmail 界面速度很快 (即使 www.fastmail.com 需要連到美國的伺服氣上),而且鍵盤快速鍵與 Gmail 幾乎相同,所以學習成本很低,就把其中一個 domain 掛上去用,然後把一些服務的註冊 email 都往這邊掛。

而費用的部份,除了可以月繳外,年繳與三年繳都有另外再給 discount (30GB 版本是 US$5/mo、US$50/y 與 US$140/3y),第一次用一年後,後面就是刷三年了...

如果有在用 Office 的話,Microsoft 365 應該也是個選擇方案,不過我對 Outlook 的 folder-based 的設計邏輯用不慣...

然後是 iCloud+,看討論裡面提到穩定度似乎不太行,但也有人用的好好的,我自己是只有拿來同步 Apple 裝置用,沒有用到郵件服務的部份...

另外是自己有架設 Email service,話說這年頭自己搞 Email system 超麻煩的,對於沒搞過的人 SPFDKIM 加上 DMARC 絕對會搞死自己,乖乖丟雲端服務比較省事,我純粹是要保持多路暢通而已...

為資料庫提案新的 UUID 格式

前幾天在 Hacker News Daily 上看到的東西,今年四月的時候有人針對資料庫提案新的 UUID 格式:「New UUID Formats – IETF Draft (ietf.org)」。

在 draft 開頭有說明這個提案的目標:

This document presents new time-based UUID formats which are suited for use as a database key.

A common case for modern applications is to create a unique identifier for use as a primary key in a database table. This identifier usually implements an embedded timestamp that is sortable using the monotonic creation time in the most significant bits. In addition the identifier is highly collision resistant, difficult to guess, and provides minimal security attack surfaces. None of the existing UUID versions, including UUIDv1, fulfill each of these requirements in the most efficient possible way. This document is a proposal to update [RFC4122] with three new UUID versions that address these concerns, each with different trade-offs.

另外在 Hacker News 上有人整理出來,可以直接理解提案所提出的新格式是什麼:

A somewhat oversimplified summary of the new UUID formats:

UUID6: a timestamp with a weird epoch and 100 ns precision like in UUID1, but in a big-endian order that sorts naturally by time, plus some random bits instead of a predictable MAC address.

UUID7: like UUID6, but uses normal Unix timestamps and allows more timestamp precision.

UUID8: like UUID7, but relaxes requirements on where the timestamp is coming from. Want to use a custom epoch or NTP timestamps or something? UUID8 allows it for the sake of flexibility and future-proofing, but the downside is that there's no standard way to parse the time from one of these -- the time source could be anything monotonic.

這在不同的 storage engine 上面會有不同的討論,這邊先討論 MySQL 系列的 InnoDB,至於 PostgreSQL 的 engine 以及其他資料庫系統,就另外讓更熟悉的人討論了。

InnoDB 採用了 clustered index (可以參考「Database index」這邊的說明),也就是資料本體是以某種定義的大小順序存放。

在 InnoDB 裡面則是用 primary key 的順序來存放資料 (沒有指定 primary key 時會有 fallback 行為),其他的 unique key 與 index key 則是指到 primary key,所以你可以看到 primary key 的大小也會影響到其他的 index key。

所以 128 bits 的 UUID 在大型的 MySQL ecosystem 實在不怎麼受歡迎,在 2010 年的時候 FlickrTwitter 都有發表過 ticket system:「Ticket Servers: Distributed Unique Primary Keys on the Cheap」、「Announcing Snowflake」,兩個系統有不同的需求,但都是產生 64 bits 的 unique id。

其中 Flickr 的系統算是很簡單的,沒有要保證時間順序 (i.e. 先取的號碼一定比較小,以及後取的號碼一定比較大),就用兩台 MySQL 跑 active-active 架構,然後錯開產生的值:

TicketServer1:
auto-increment-increment = 2
auto-increment-offset = 1

TicketServer2:
auto-increment-increment = 2
auto-increment-offset = 2

到現在還是一個蠻簡單的解法...

EC2 總算支援透過 Serial Console 操作了...

以往 Amazon EC2 的機器爛到開不起來時只能「看」到 Console 的輸出,然後要把 root volume 掛到其他機器上修正,接著再掛回來 (然後沒修好就要再重複...),現在總算可以透過 EC2 Serial Console 來操作了:「Troubleshoot Boot and Networking Issues with New EC2 Serial Console」。

不過裡面有一些限制,首先機器必須是基於 AWS Nitro System,這個部份在「Amazon EC2 Instance Types」這邊可以翻到是不是 Nitro,比較新的 family type 應該都是 (像是 t3/t3a/t4g 都是 Nitro,但 t2 不是):

EC2 Serial Console access is available for EC2 instances based on the AWS Nitro System. It supports all major Linux distributions, FreeBSD, NetBSD, Microsoft Windows, and VMWare.

另外是支援的區域目前只有幾個主力區域,不過公司在用的主力區 (新加坡) 也進去了,之後遇到問題的時候可以測試看看:

  • US East (N. Virginia), US West (Oregon), US East (Ohio)
  • Europe (Ireland), Europe (Frankfurt)
  • Asia Pacific (Tokyo), Asia Pacific (Sydney), Asia Pacific (Singapore)

Twitter 上看到有些人有提到「總算啊...」之類的感想...

Mattermost 推出可以自己架設的專案管理工具 Focalboard

Mattermost 推出了可以自己架設專案管理的工具 Focalboard,在 Hacker News 上可以看到討論:「Focalboard – a self-hosted alternative to Trello, Notion, and Asana (focalboard.com)」。

從界面上可以看出來抄 Notion 的專案部份,目前的功能還非常早期,沒有什麼整合能力... 然後從這個產品可以知道,這家公司應該就打算一直抄下去了 XD

GitHub 上的資料看起來是 GolangVue.js 寫的,設定裡看起來支援 PostgreSQLSQLite

預設的 telemetry 是開的 (在 config.json 裡面可以看到),建議可以關掉...

Dolt,本機開發測試用的 MySQL server

看到「Dolt is Git for Data!」這個專案,是個在本機上跑的 MySQL server,另外可以在上面的資料進行版本控制,看起來很適合本機開發測試。

首先抓下來可以看到沒幾個檔案 (這是 linux-amd64 版),也可以看到跟 Git 的關係:

$ tree
.
├── bin
│   ├── dolt
│   ├── git-dolt
│   └── git-dolt-smudge
└── LICENSES

然後用 bin/dolt sql-server -P 3307 -u root -p passw0rd 跑就可以把一個相容於 MySQL 的伺服器跑在 port 3307,然後用 mysql -h 127.0.0.1 --port 3307 -u root -p 就可以輸入密碼 passw0rd 登入進去:

$ mysql -h 127.0.0.1 --port 3307 -u root -p
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.7.9-Vitess

可以從 Server version 看到專案是用了 Vitess 實做的 MySQL 界面。

另外測了一下,透過連線所做的變更 (像是 CREATE DATABASECREATE TABLE,以及 CRUD 中的 CUD) 是不會寫回磁碟裡的,嘗試了不同的設定,不管改什麼都是這樣,應該是故意設計成這樣。

在本機跑 test case 測試應該還不錯,會比 SQLite:memory: 更接近 MySQL 一些,不過在 CI 裡的話應該是可以直接把 MySQL 跑起來...

原來 Disqus 已經被網路廣告公司買走了...

在「Disqus, the dark commenting system」這邊才看到 Disqus 被網路廣告公司買的消息:

Disqus was acquired by an advertising company called Zeta Global in 2017. Obviously, advertising companies do everything to increase their revenue (Ex: the Big G).

引用的報導則是在「Zeta Global acquires commenting service Disqus」這邊可以看到,大約在三年前發生的事情...

基本上已經變成廣告追蹤平台了:

I analyzed the network requests log. Disqus makes HTTP requests to 11 different third-party domains through the browser. All of these websites are trackers/pixels (Even some were detected as malware by my security guard).

而且就算是付費會員也一樣,還是會追蹤:

When you provide a free product, money should come from somewhere. Disqus uses advertising for that. Now, I subscribed to a paid plan trial of Disqus to see if things change or not. No! Even in the paid plans, the same pixels are loaded on the client-side. Looks like there's no way to opt-out from tracking.

後面的推薦可以看看就好,自己架應該還是比較好的選擇... 用「open source comment system」搜有些東西可以參考。