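EC2 Instance Connect: SSH Access Controlled Through IAM

Besides the existing Systems Manager, which can open a shell on EC2 machines for management, AWS has now launched EC2 Instance Connect, which ties SSH access directly to IAM permissions: "Introducing Amazon EC2 Instance Connect".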

With EC2 Instance Connect, you can control SSH access to your instances using AWS Identity and Access Management (IAM) policies as well as audit connection requests with AWS CloudTrail events. In addition, you can leverage your existing SSH keys or further enhance your security posture by generating one-time use SSH keys each time an authorized user connects. Instance Connect works with any SSH client, or you can easily connect to your instances from a new browser-based SSH experience in the EC2 console.

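Besides logging, it also includes some security mechanisms, such as the option of one-time SSH keys... Compared with the earlier Systems Manager, the main benefit is probably that you get to keep using the terminal software you are used to?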
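To make "controlled through IAM" a bit more concrete, here is a minimal sketch that builds the kind of policy document involved. The action name `ec2-instance-connect:SendSSHPublicKey` and the `ec2:osuser` condition key are written from my recollection of the AWS documentation and should be checked against the current docs; the region, account ID, instance ID and user name are placeholders, and `instance_connect_policy` is a hypothetical helper name.

```python
import json


def instance_connect_policy(region, account_id, instance_id, os_user):
    """Build an IAM policy that lets a principal push an SSH key
    for one OS user on one instance (assumed action/condition names)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "ec2-instance-connect:SendSSHPublicKey",
                "Resource": f"arn:aws:ec2:{region}:{account_id}:instance/{instance_id}",
                # Restrict which OS account the pushed key may be used for.
                "Condition": {"StringEquals": {"ec2:osuser": os_user}},
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder values, not real resources.
    policy = instance_connect_policy(
        "us-east-1", "123456789012", "i-1234567890abcdef0", "ec2-user"
    )
    print(json.dumps(policy, indent=2))
```

The point is that who may SSH into which machine becomes an ordinary IAM statement, instead of a question of whose public key sits in `authorized_keys`.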

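Locks and Isolation Levels in RDBMS

Recommending some articles written by others (even though they are on Medium...): "複習資料庫的 Isolation Level 與圖解五個常見的 Race Conditions" and "對於 MySQL Repeatable Read Isolation 常見的三個誤解", plus the "Snapshot isolation" entry on English Wikipedia.

The two articles are in Chinese (the other one is the English Wikipedia entry), so I won't repeat what they say. This post is mainly to record excerpts from the entry and write down some thoughts...

When SQL-92 defined isolation levels, the technology was not yet mature, so the levels were designed around the implementations of the time. As the technology developed, cases turned up where the SQL-92 definitions do not fit well: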

Unfortunately, the ANSI SQL-92 standard was written with a lock-based database in mind, and hence is rather vague when applied to MVCC systems. Berenson et al. wrote a paper in 1995 critiquing the SQL standard, and cited snapshot isolation as an example of an isolation level that did not exhibit the standard anomalies described in the ANSI SQL-92 standard, yet still had anomalous behaviour when compared with serializable transactions.

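One of them is Snapshot Isolation. Many modern database systems are built around this concept, but the actual implementations still differ quite a bit...

The diagram that appears in "Jepsen: MariaDB Galera Cluster" is also useful; it describes the problems that can occur at each level:

This is the kind of database limitation developers need to understand once a system reaches some scale (i.e. when using SERIALIZABLE to avoid these problems is no longer realistic)...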
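To make the Snapshot Isolation anomaly from the Wikipedia excerpt concrete, here is a toy sketch of write skew. It is not how any real database is implemented: `SnapshotDB`, `Txn` and the two-doctors-on-call rule are all hypothetical names for illustration, and the only conflict check is first-committer-wins on keys that both transactions wrote.

```python
class SnapshotDB:
    """Toy store: snapshot reads, first-committer-wins on write-write conflicts."""

    def __init__(self, data):
        self.data = dict(data)
        self.version = {key: 0 for key in data}

    def begin(self):
        return Txn(self)


class Txn:
    def __init__(self, db):
        self.db = db
        self.snapshot = dict(db.data)           # reads see the state at begin time
        self.start_version = dict(db.version)
        self.writes = {}

    def read(self, key):
        return self.writes.get(key, self.snapshot[key])

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        # Only keys this transaction wrote are checked for conflicts.
        for key in self.writes:
            if self.db.version[key] != self.start_version[key]:
                raise RuntimeError(f"write-write conflict on {key}")
        for key, value in self.writes.items():
            self.db.data[key] = value
            self.db.version[key] += 1


def go_off_call(txn, me, other):
    # Application rule: at least one doctor must stay on call.
    if txn.read(other):
        txn.write(me, False)


db = SnapshotDB({"alice": True, "bob": True})
t1, t2 = db.begin(), db.begin()
go_off_call(t1, "alice", "bob")
go_off_call(t2, "bob", "alice")
t1.commit()
t2.commit()  # no conflict is detected: t2 wrote a different key
print(db.data)  # {'alice': False, 'bob': False}
```

Neither transaction sees a dirty read, a non-repeatable read or a phantom, yet the final state is one that no serial order of the two transactions could produce. That is the gap between Snapshot Isolation and SERIALIZABLE.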

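AWS Offers VPC Traffic Mirroring

In a datacenter you could use port mirroring on a switch to inspect traffic and track down problems. AWS now offers a similar feature, VPC Traffic Mirroring: "New – VPC Traffic Mirroring – Capture & Inspect Network Traffic".

This means the techniques that relied on switches in traditional datacenters can be rebuilt on AWS, so it is not too surprising that a bunch of partners offered services, or shared their experience, right from the first wave.

VPC Traffic Mirroring is also more flexible than port mirroring on a switch. A source is defined as a network resource inside a VPC, and for now that means picking specific ENIs: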

Mirror Source – An AWS network resource that exists within a particular VPC, and that can be used as the source of traffic. VPC Traffic Mirroring supports the use of Elastic Network Interfaces (ENIs) as mirror sources.

Besides an ENI, the mirrored traffic can also be sent to an NLB:

Mirror Target – An ENI or Network Load Balancer that serves as a destination for the mirrored traffic. The target can be in the same AWS account as the Mirror Source, or in a different account for implementation of the central-VPC model that I mentioned above.

As you would expect, packets can be filtered:

Mirror Filter – A specification of the inbound or outbound (with respect to the source) traffic that is to be captured (accepted) or skipped (rejected). The filter can specify a protocol, ranges for the source and destination ports, and CIDR blocks for the source and destination. Rules are numbered, and processed in order within the scope of a particular Mirror Session.

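Then there is the concept of a session. Going by the description, this looks less like a stateful connection and more like the binding of a source, a target and a filter: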

Traffic Mirror Session – A connection between a mirror source and target that makes use of a filter. Sessions are numbered, evaluated in order, and the first match (accept or reject) is used to determine the fate of the packet. A given packet is sent to at most one target.
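The "numbered, processed in order, first match wins" behaviour described for filter rules can be sketched as below. This is only an illustration of the matching logic, not the AWS API: the `Rule` class, the packet dictionary layout and the default of not mirroring unmatched traffic are my own assumptions, and source port ranges are left out to keep it short.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass
class Rule:
    number: int
    action: str        # "accept" (mirror it) or "reject" (skip it)
    protocol: str      # e.g. "tcp", "udp"
    dst_ports: range
    src_cidr: str
    dst_cidr: str

    def matches(self, pkt):
        return (
            pkt["protocol"] == self.protocol
            and pkt["dst_port"] in self.dst_ports
            and ip_address(pkt["src"]) in ip_network(self.src_cidr)
            and ip_address(pkt["dst"]) in ip_network(self.dst_cidr)
        )


def evaluate(rules, pkt):
    """Return the action of the lowest-numbered matching rule."""
    for rule in sorted(rules, key=lambda r: r.number):
        if rule.matches(pkt):
            return rule.action
    return "reject"  # assumption: traffic matching no rule is not mirrored


rules = [
    # Skip SSH, mirror every other TCP packet going into 10.0.0.0/16.
    Rule(100, "reject", "tcp", range(22, 23), "0.0.0.0/0", "10.0.0.0/16"),
    Rule(200, "accept", "tcp", range(0, 65536), "0.0.0.0/0", "10.0.0.0/16"),
]
ssh = {"protocol": "tcp", "src": "203.0.113.5", "dst": "10.0.1.20", "dst_port": 22}
https = {"protocol": "tcp", "src": "203.0.113.5", "dst": "10.0.1.20", "dst_port": 443}
print(evaluate(rules, ssh))    # reject
print(evaluate(rules, https))  # accept
```

Because the first match decides, the narrow "reject SSH" rule has to carry a lower number than the broad "accept all TCP" rule, just like ordering ACL entries on a switch.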

This launch also covers almost every region right away, and the price does not look too high:

VPC Traffic Mirroring is available now and you can start using it today in all commercial AWS Regions except Asia Pacific (Sydney), China (Beijing), and China (Ningxia). Support for those regions will be added soon. You pay an hourly fee (starting at $0.015 per hour) for each mirror source; see the VPC Pricing page for more info.

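A Group of People Leave the Apache Software Foundation...

News seen in "Changes at the Apache Software Foundation": three senior members, each with more than ten years at the ASF, have suddenly left, and one of them is an ASF co-founder. The ASF's official announcement can be read in "Statement by The Apache Software Foundation Board of Directors".

As LWN says, it is still unclear why the three left at the same time: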

There is no indication of why all these people decided to leave at the same time.

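Ubuntu Changes Its Plan to Drop i386

Earlier, "Ubuntu 19.10 要放掉 i386 架構" covered Ubuntu's plan to drop i386. Because of the strong reaction, the official decision is now to revise the original conclusion: "Statement on 32-bit i386 packages for Ubuntu 19.10 and 20.04 LTS".

Under the original plan, the i386 architecture would be dropped entirely (no support at all):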

While this means we will not provide 32-bit builds of new upstream versions of libraries, there are a number of ways that 32-bit applications can continue to be made available to users of later Ubuntu releases, as detailed in [4]. We will be working to polish the 32-bit support story over the course of the 19.10 development cycle. To follow the evolution of this support, you can participate in the discourse thread at [5].

Now the plan is to support 32-bit libraries & binaries through container technology, as a kind of buffer:

We will also work with the WINE, Ubuntu Studio and gaming communities to use container technology to address the ultimate end of life of 32-bit libraries; it should stay possible to run old applications on newer versions of Ubuntu. Snaps and LXD enable us both to have complete 32-bit environments, and bundled libraries, to solve these issues in the long term.

But there will probably still be programs that cannot run inside a container, and it looks like the official position is to let those go...

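Raspberry Pi 4 Goes on Sale at USD$35

The Raspberry Pi 4 is now on sale. What is available is the Model B: the lowest-spec 1GB RAM version costs USD$35, the same as the earlier RPi 3, and this time there are also 2GB and 4GB versions that did not exist before, at USD$45 and USD$55 respectively: "Raspberry Pi 4 on sale now from $35".

There are two larger changes to watch out for. The first is that the power connector is now USB Type-C: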

We’ve moved from USB micro-B to USB-C for our power connector. This supports an extra 500mA of current, ensuring we have a full 1.2A for downstream USB devices, even under heavy CPU load.

The other is that the HDMI connector has changed to Type D (micro HDMI):

To accommodate dual display output within the existing board footprint, we’ve replaced the type-A (full-size) HDMI connector with a pair of type-D (micro) HDMI connectors.

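Both changes mean preparing extra adapters or cables; the other changes should be fine...

The most noticeable part of this revision is the performance increase. The official claim is that CPU performance is about three times that of the previous generation: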

A 1.5GHz quad-core 64-bit ARM Cortex-A72 CPU (~3× performance)

On the GPU side, there is 4Kp60 hardware decoding for H.265 (HEVC):

4Kp60 hardware decode of HEVC video

The RPi 3 I put in the living room earlier never felt quite fast enough, so this one may be worth considering...

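Microsoft Releases a Malloc Library with Excellent Performance

Seen on Hacker News: "Mimalloc – A compact general purpose allocator with excellent performance (github.com)". The project site is on GitHub: "mimalloc is a compact general purpose allocator with excellent performance.".

The allocator most commonly used by modern server software is probably jemalloc, and a lot of software bundles jemalloc directly (Firefox, for example), so the benchmarks mostly come down to the difference from jemalloc.

The test machine is a slightly odd choice. Since the testing is on AWS anyway, they did not pick the currently mainstream Intel instances, and even within AMD they did not pick the largest size, r5a.24xlarge (the r5a family was announced in November 2018):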

Testing on a big Amazon EC2 instance (r5a.4xlarge) consisting of a 16-core AMD EPYC 7000 at 2.5GHz with 128GB ECC memory, running Ubuntu 18.04.1 with LibC 2.27 and GCC 7.3.0.

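Following the comments, daanx/mimalloc-bench has tests on an Intel platform with similar results, so it should be ok...

Anyway, based on the performance tests currently published by the project, it looks pretty good:

Actual memory usage is also lower. When reading the data here, note the official remark that some tests are not accurate because of how they run: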

(note: the xmalloc-testN memory usage should be disregarded as it allocates more the faster the program runs).

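As for whether it really improves performance across general-purpose workloads, we will have to wait for the larger communities to try it out... Percona in particular basically never passes up trying anything that can improve performance just by swapping a library...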

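Looking Up Pronunciation with YouTube Videos

"Improve your English pronunciation using Youtube" is a lookup service that uses videos and subtitles on YouTube to provide an interface where you can hear how people in the real world pronounce a word.

The system itself is not hard to build: it mainly means collecting a large amount of data and then putting a search engine on top. The idea and the execution are what make the service worth something.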
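As a rough illustration of the "collect subtitles, then put a search engine on top" part, here is a minimal inverted index from words to the caption cues where they are spoken. This is my own guess at the general shape, not how the service is actually built; the caption tuples, video IDs and `build_index` are made up for the example.

```python
import re
from collections import defaultdict


def build_index(captions):
    """Map each word to the (video_id, start_seconds) cues where it is spoken."""
    index = defaultdict(list)
    for video_id, start, text in captions:
        # Index each distinct word of a cue once.
        for word in set(re.findall(r"[a-z']+", text.lower())):
            index[word].append((video_id, start))
    return index


# Hypothetical caption cues: (video id, start time in seconds, caption text).
captions = [
    ("vid001", 12.5, "We bought this shelf at IKEA"),
    ("vid002", 40.0, "The IKEA catalogue is out"),
    ("vid002", 95.2, "Assembly took an hour"),
]
index = build_index(captions)
print(index["ikea"])  # [('vid001', 12.5), ('vid002', 40.0)]
```

A lookup returns video IDs and timestamps, which is all a player needs to jump straight to the moment the word is said. The hard part is the data volume, not the data structure.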

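The first thing that came to mind after finding it: I have to look up how "IKEA" is pronounced XDDD

Besides English, it can also look up other languages, including Chinese...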

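Cloudflare Launches time.cloudflare.com, but the Latency Is Not Great...

Cloudflare is offering an NTP service at time.cloudflare.com: "Introducing time.cloudflare.com".

The official claim is that it is served from all their datacenters (which should include the Taipei site):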

Now, anyone can get time securely from all our datacenters in 180 cities around the world.

But testing from HiNet shows it being served from Japan rather than Taipei:

  2.|-- snuh-3302.hinet.net        0.0%    10    9.5  10.5   9.5  12.7   0.8
  3.|-- tpdt-3022.hinet.net        0.0%    10   11.1  10.8  10.1  11.8   0.3
  4.|-- r4103-s2.tp.hinet.net      0.0%    10   27.2  11.9   9.3  27.2   5.4
  5.|-- r4003-s2.tp.hinet.net      0.0%    10   11.3  10.7   9.5  11.6   0.3
  6.|-- xe-0-0-0-3-6.r02.osakjp02  0.0%    10   47.9  48.7  47.6  49.9   0.6
  7.|-- ae-2.a00.osakjp02.jp.bb.g  0.0%    10   44.9  47.3  42.8  66.4   6.9
  8.|-- ae-20.r03.osakjp02.jp.bb.  0.0%    10   43.7  43.2  42.5  44.7   0.3
  9.|-- 61.120.144.46              0.0%    10   69.6  52.4  48.0  69.6   6.8
 10.|-- 162.159.200.123            0.0%    10   48.3  48.8  47.9  49.3   0.0

From APOL (a cable TV ISP), the connection goes through a datacenter in Taiwan:

  3.|-- 10.251.11.6                0.0%    10   19.8  29.6  11.1  80.2  21.7
  4.|-- 10.251.231.5               0.0%    10   25.3  24.9  16.4  43.6   8.0
  5.|-- 10.251.231.1               0.0%    10    8.7   6.3   3.4   9.1   1.9
  6.|-- 10.251.230.34              0.0%    10    4.6   7.4   2.6  14.5   3.2
  7.|-- 10.251.230.29              0.0%    10    3.0   6.6   3.0  11.9   2.8
  8.|-- 202-178-245-162.cm.static  0.0%    10    4.8   7.3   4.8   9.9   1.4
  9.|-- 192.168.100.14             0.0%    10    7.3   8.0   5.4  11.1   2.0
 10.|-- 192.168.100.9              0.0%    10    8.2   8.0   4.9  11.1   1.7
 11.|-- 203-79-250-65.static.apol  0.0%    10    9.1   9.0   6.4  11.3   1.5
 12.|-- 211.76.96.92               0.0%    10    7.8   8.1   4.3  15.8   3.3
 13.|-- 39-222-163-203-static.tpi  0.0%    10    9.9   9.1   6.7  12.4   1.8
 14.|-- 162.159.200.1              0.0%    10    5.0   7.2   5.0   9.7   1.4

Looks like something has not been sorted out yet again...
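For comparing the two paths without eyeballing the columns, a small sketch that pulls the average latency of the final hop out of report lines like the ones above. The header row is not included in the output I pasted, so the column order (Loss%, Snt, Last, Avg, Best, Wrst, StDev) is assumed to be the mtr report default; `last_hop_avg` is a hypothetical helper.

```python
def last_hop_avg(report):
    """Return (host, avg_ms) for the final hop of an mtr-style report."""
    hops = [line.split() for line in report.strip().splitlines()]
    # Assumed columns: hop, host, Loss%, Snt, Last, Avg, Best, Wrst, StDev
    last = hops[-1]
    return last[1], float(last[5])


hinet = """
  9.|-- 61.120.144.46              0.0%    10   69.6  52.4  48.0  69.6   6.8
 10.|-- 162.159.200.123            0.0%    10   48.3  48.8  47.9  49.3   0.0
"""
apol = """
 13.|-- 39-222-163-203-static.tpi  0.0%    10    9.9   9.1   6.7  12.4   1.8
 14.|-- 162.159.200.1              0.0%    10    5.0   7.2   5.0   9.7   1.4
"""
print(last_hop_avg(hinet))  # ('162.159.200.123', 48.8)
print(last_hop_avg(apol))   # ('162.159.200.1', 7.2)
```

Roughly 49 ms versus 7 ms on average: for NTP that extra round trip is not fatal, but it is an odd result when there is supposed to be a local site.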

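How Instagram Improved Video Publishing Speed

No magic here; it is really a change to the product-level spec (though it was published on Instagram Engineering): "Video Upload Latency Improvements at Instagram".

In the original version, a video could only be published after every format had finished transcoding:

Then the spec was changed so that a video can be published as soon as the highest-quality version has finished transcoding: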

The idea is, instead of blocking until all video versions are available, we can publish the video once the highest-quality video version is available.

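Next, the video is uploaded in segments, so the part that has already arrived can be processed while the rest is still uploading, turning it into a pipeline. This adds complexity to the code, and whole-video quality metrics now have to be derived from per-segment values: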

Segmented uploads reduce upload latency in many cases but come with a few tradeoffs. For instance, segmented uploads increase the complexity of the pipeline. There are some quality metrics that are only available per segment at transcode time, such as SSIM. These metrics are not helpful to us on a per segment basis. Therefore, we need to do a duration weighted average of the SSIM of all segments to come up with the SSIM of the whole video. Similarly, handling exceptions is more complex since there are more cases to handle.
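The duration-weighted average mentioned in the excerpt is simple arithmetic. Here is a minimal sketch, where the segment durations and SSIM values are made-up numbers and `weighted_ssim` is a hypothetical helper, not Instagram's code.

```python
def weighted_ssim(segments):
    """Duration-weighted average of per-segment SSIM scores.

    segments: iterable of (duration_seconds, ssim) pairs.
    """
    segments = list(segments)
    total = sum(duration for duration, _ in segments)
    if total <= 0:
        raise ValueError("segments must have a positive total duration")
    return sum(duration * ssim for duration, ssim in segments) / total


# Hypothetical segments: two full 10-second chunks and a short 2-second tail.
segments = [(10.0, 0.98), (10.0, 0.96), (2.0, 0.90)]
print(round(weighted_ssim(segments), 4))
```

With these numbers the weighted result is about 0.964, while a plain mean of the three scores would be about 0.947. The short tail segment would otherwise drag the whole-video score down far more than its two seconds deserve.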

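Another special case is when the uploaded video already meets the server-side spec, in which case it can be passed straight through (but doesn't that raise a security concern...):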

Another performance optimization we use to improve the upload latency and save CPU utilization is something we call a “passthrough” upload. In some cases, the media that is uploaded is already ready for playback on most devices.

These are all methods one could come up with, and each comes with a tradeoff rather than being a purely positive improvement :o