Setting Up a Multi-home Architecture on Ubuntu

My home internet setup looks roughly like this (other devices omitted):

On one side is the HiNet line plugged straight into Chunghwa Telecom's modem, with that segment dialing out over PPPoE; on the other side is the cable TV ISP's network (北都). Also, the box labeled "Switch" in the diagram should really be a home router (an ASUS unit flashed with DD-WRT); I mislabeled it when drawing the diagram...

Finally, the computer side: my desktop runs Ubuntu, with two separate physical links (the interfaces being enp4s0 for the Chunghwa line and enp6s0f0 for the cable line) connected to these two different network segments.
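
To confirm that both links are up and managed by NetworkManager, something like this is enough (just a sanity check, using the interface names from my setup):

nmcli device status
ip -br addr show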

The plan is to run a source routing setup to make use of the bandwidth on both sides. The idea breaks down roughly like this:

  1. The machine itself has a static IP of 192.168.3.x.
  2. Packets whose source IP is 192.168.3.x are by default sent to the router at 192.168.3.254, which NATs them out.
  3. Squid runs a proxy server locally, configured to use 192.168.3.x as its source IP.

With this in place, I can use a browser extension like SwitchyOmega to route certain destination networks out through the cable ISP's bandwidth.

Any service that lets you specify an HTTP proxy can also use this method to go out over the cable line.
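
For example, once Squid (set up below) is listening on port 3128, a quick check with curl should show the public IP changing depending on whether the proxy is used (ifconfig.me is just an example IP-echo service):

# without the proxy: goes out via the default (HiNet) route
curl https://ifconfig.me

# through the local Squid proxy: should show the cable ISP's public IP
curl -x http://127.0.0.1:3128 https://ifconfig.me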

Point 2, sending packets with source IP 192.168.3.x to 192.168.3.254, takes some setup. First we need a separate routing table, so I put this in /etc/iproute2/rt_tables:

2       second

Then, since my network interfaces are managed by NetworkManager and I want this source routing logic applied automatically whenever enp6s0f0 comes up, I wrote this in /etc/NetworkManager/dispatcher.d/99-enp6s0f0:

#!/bin/bash

# NetworkManager passes the interface name and the event ("up", "down", ...)
# as the first two arguments to dispatcher scripts.
interface=$1
event=$2

if [[ "$interface" == "enp6s0f0" && "$event" == "up" ]]; then
    # Default route for the "second" table goes to the cable ISP's router,
    # and anything sourced from 192.168.3.0/24 is looked up in that table.
    ip route add default via 192.168.3.254 table second
    ip rule add from 192.168.3.0/24 table second
fi

Remember to chmod 755 the file so it's executable.
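
Once the interface has come up, the resulting state can be double-checked with iproute2:

# the rule that sends traffic sourced from 192.168.3.0/24 to the "second" table
ip rule show

# the default route inside that table
ip route show table second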

Next is the Squid configuration; /etc/squid/squid.conf looks like this:

#
http_access allow all
#
access_log /var/log/squid/access.log squid
cache deny all
cache_dir null /tmp
cache_log /dev/null
cache_mem 8 MB
dns_v4_first on
forwarded_for off
http_port 3128
tcp_outgoing_address 192.168.3.x

Replace the 192.168.3.x at the end with your own static IP address. Since traffic is basically all HTTPS now, there's no point enabling the cache either, so that's about it for the config...
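
After editing, the config can be sanity-checked and Squid restarted; on Ubuntu the service should just be called squid (adjust to your own setup):

sudo squid -k parse          # syntax-check /etc/squid/squid.conf
sudo systemctl restart squid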

The one notable directive here is dns_v4_first, which makes Squid prefer IPv4 addresses when connecting. This is because 北都's network doesn't provide IPv6 addresses, so if a site's DNS also has an IPv6 address, the connection would otherwise go out over HiNet's IPv6 instead...

Also, it's expected that tools like ping and MTR don't work under this setup, because they assemble and send raw packets themselves instead of going through the usual path in the Linux network stack, so they aren't matched by the ip rule we set up. There seem to be ways to mitigate this floating around online, but I'm leaving it alone for now...
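
For what it's worth, the mitigations I've seen basically come down to binding the source address explicitly so the ip rule can match; untested on my end, but roughly like this (1.1.1.1 is just an example target, and 192.168.3.x is the machine's static IP as above):

ping -I 192.168.3.x 1.1.1.1
mtr -a 192.168.3.x 1.1.1.1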

This seems usable so far, so I'll run it like this for a while... Previously I ran a proxy server on a Raspberry Pi to steer traffic, but the Pi's hardware limits made performance mediocre; now I'm trying it directly on the desktop...

Starlink Wants to Provide 5G Signal Directly from Space to Phones on the Ground

Saw on Hacker News that Starlink plans to work with T-Mobile to provide 5G signal directly from satellites to phones on the ground: 「SpaceX, T-Mobile to connect satellites to cellphones in remote areas (wsj.com)」. The original report is WSJ's 「SpaceX, T-Mobile to Connect Satellites to Cellphones in Remote Areas」; because of the paywall, it can also be read here.

It will use the second-generation satellites:

Mr. Musk said the service would use second-generation Starlink satellites that would be outfitted with large antennas that cover swaths of land that have no service. SpaceX has a pending application before the FCC to launch around 30,000 of the second-generation satellites over time.

Another report, 「SpaceX and T-Mobile team up to use Starlink satellites to ‘end mobile dead zones’」, goes into a bit more detail: this isn't about satellite phones, it works with today's ordinary handsets:

The service won’t require mobile users to get a new phone. Musk said in or after a natural disaster, even if all the cell towers are taken out, the planned service should work.

That said, expect only very basic service (probably just enough to keep calls and SMS working), which would be especially helpful in emergency situations:

Mr. Musk said that the bandwidth would be limited and that the new satellite service wouldn’t supplant existing ground-based cellular services. “This is meant to provide basic coverage to areas that are completely dead,” he said.

Skimming the wiki, the current first-generation Starlink satellites orbit at around 340km; it seems the altitude for the second-generation satellites isn't confirmed yet...

Quite a few people on Hacker News are working out the technical feasibility. Besides signal strength, the relative speed between a satellite and the ground is higher than any ground transportation today, so the Doppler effect also looks like a topic with a big impact...
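
As a rough back-of-envelope of my own (the numbers are assumptions, not from the discussion): a LEO satellite moves at roughly 7.6 km/s relative to the ground, so on a 2 GHz carrier the worst-case Doppler shift is about f × v/c ≈ 2×10⁹ × 7.6×10³ / 3×10⁸ ≈ 50 kHz, versus well under 1 kHz for even a high-speed train at the same frequency, so there's a lot more for the network side to compensate for.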

The discussion does mention that another commercial company was already doing something similar back in 2021, so it doesn't look like just talk? There's probably something to it: 「Lynk demos global satellite connection for ordinary phones and prepares for commercial launch」.

The Speed Difference Between a 14KB and a 15KB Web Page

Saw 「Why your website should be under 14kB in size」 on Hacker News, with the discussion at 「A 14kb page can load much faster than a 15kb page (endtimes.dev)」. It's about why the speed difference between a 14KB and a 15KB page is much bigger than between 15KB and 16KB:

What is surprising is that a 14kB page can load much faster than a 15kB page — maybe 612ms faster — while the difference between a 15kB and a 16kB page is trivial.

The cause is the TCP slow start algorithm:

This is because of the TCP slow start algorithm.

And on the web side, most current implementations start TCP slow start by sending 10 packets:

Most web servers TCP slow start algorithm starts by sending 10 TCP packets.

Then factoring in 1500 bytes per packet and the overhead, that works out to roughly 14KB:

The maximum size of a TCP packet is 1500 bytes.

This maximum is not set by the TCP specification, it comes from the ethernet standard

Each TCP packet uses 40 bytes in its header — 16 bytes for IP and an additional 24 bytes for TCP

That leaves 1460 bytes per TCP packet. 10 x 1460 = 14600 bytes or roughly 14kB!
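
As a side note of my own (not from the article): on Linux the initial congestion window is a per-route attribute, so it can be inspected and overridden with ip route; the gateway and interface below are placeholders:

# initcwnd only shows up in the output once it has been set explicitly
ip route show default

# example: raise the initial congestion window to 20 packets
sudo ip route change default via 192.168.1.1 dev eth0 initcwnd 20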

A similar design shows up in HTTP/3 as well (from 「QUIC Loss Detection and Congestion Control」):

Sending multiple packets into the network without any delay between them creates a packet burst that might cause short-term congestion and losses. Implementations MUST either use pacing or limit such bursts to the initial congestion window, which is recommended to be the minimum of 10 * max_datagram_size and max(2* max_datagram_size, 14720)), where max_datagram_size is the current maximum size of a datagram for the connection, not including UDP or IP overhead.

It's a nice bit of trivia... but it doesn't help with today's bloated pages, and considering that bigger sites may push a lot of requests through a single TCP connection, they're long past the TCP slow start threshold anyway.

South Korea's Supreme Court Issues a Web Scraping Ruling Similar to the US Precedent

Another piece of news I saw on Hacker News last week: the Korean Supreme Court has handed down a ruling on web scraping similar to the US HiQ Labs v. LinkedIn case: 「Korean Supreme Court Provides Clarity on Web Scraping and Violation of the Relevant Korean Laws, including the Copyright Act and Information Protection Act (Supreme Court, 2021Do1533, May 12, 2022)」. The original article seems to be paywalled now, but the full text can still be read via the Internet Archive 「here」 and archive.today 「here」. The Hacker News discussion is at 「The Supreme Korean court says that scraping publicly available data is legal (lexology.com)」.

I've written about the hiQ case before; see 「hiQ 爬 LinkedIn 資料的無罪判決」.

The Korean Supreme Court held that scraping publicly available data in this case did not violate Korean law:

On May 12, 2022, the Korean Supreme Court held in Case No. 2021Do1533 that scraping publicly available data from a competitor’s website does not violate the asserted laws, including the Copyright Act and the Act on Promotion of Information and Communications Network Utilization and Information Protection (“Information Protection Act”).

What stands out is that point (iii) of the criminal liability findings touches on the scraping being read-only; I'm not sure whether that was meant to rebut the plaintiff's argument:

The Supreme Court also found Defendants not guilty of violating the Criminal Code in light of the following findings: (i) the Defendants’ act of scraping did not interfere with information processing on Yanolja’s API server; (ii) the Defendants’ act of scraping did not interfere with Yanolja’s booking business; (iii) the Defendants did not interfere with Yanolja’s business, because its scraping did not modify data within Yanolja’s API server; and (iv) the Defendants lacked mens rea for criminal interference with business, as the Defendants merely intended to collect accommodation information from Yanolja’s API server.

Overall it looks like a complete win for the defendants?

Netflix's Open Connect Machines Push Toward 800Gbps

Back in 2021 I mentioned Netflix trying to push 400Gbps of traffic from a single machine (for Netflix's Open Connect): 「Netflix 在單機服務 400Gbps 的影音流量」. Nearly a year later, the Netflix folks have managed to push close to 800Gbps: 「Serving Netflix Video Traffic at 800Gb/s and Beyond」. The Hacker News discussion at 「Serving Netflix Video Traffic at 800Gb/s and Beyond [pdf] (nabstreamingsummit.com)」 is also worth a look.

Flipping through the slides, they topped out at 720Gbps, limited mainly by NIC output drops rather than anything else.

The deck still walks through the earlier story again (otherwise the talk would be too short?); if you've already seen the previous material you can skim that part, and the new content this time starts at page 89:

  • Asynchronous Sendfile (2014)
  • Kernel TLS (2016)
  • Network-centric NUMA (2019)
  • Inline Hardware (NIC) kTLS (2022)
  • 800G initial results

The last few slides mention the hardware platform used for the 800Gbps push:

  • AMD (EPYC 7713 CPUs)
  • Dell (PowerEdge R7525)
  • Mellanox/NVIDIA (ConnectX-6 Dx NICS)
  • Intel (P5316 NVME)

Not sure what the next target is; it feels like they're already squeezing up against the memory bandwidth limits...

The TCP Standard Consolidated into RFC 9293

Saw 「RFC 9293: Transmission Control Protocol (TCP)」, which mainly consolidates what used to be spread across multiple RFCs (starting from RFC 793) into a single document, and folds in the known errata:

This document specifies the Transmission Control Protocol (TCP). TCP is an important transport-layer protocol in the Internet protocol stack, and it has continuously evolved over decades of use and growth of the Internet. Over this time, a number of changes have been made to TCP as it was specified in RFC 793, though these have only been documented in a piecemeal fashion. This document collects and brings those changes together with the protocol specification from RFC 793. This document obsoletes RFC 793, as well as RFCs 879, 2873, 6093, 6429, 6528, and 6691 that updated parts of RFC 793. It updates RFCs 1011 and 1122, and it should be considered as a replacement for the portions of those documents dealing with TCP requirements. It also updates RFC 5961 by adding a small clarification in reset handling while in the SYN-RECEIVED state. The TCP header control bits from RFC 793 have also been updated based on RFC 3168.

And it obsoletes a whole truckload of RFC documents XD

Digging around, I found that HTTP/1.1 went through something similar in 2014, except in the opposite direction, being split apart: 「HTTP/1.1 的更新」. That round killed off RFC 2616 and split it into RFC 7230 through RFC 7235.

Then this year it got reworked again as part of the HTTP overhaul; this time HTTP/1.1 was merged back into a single document, with some of its concepts split out: 「HTTP 標準的翻新」.

RFC 9110 defines HTTP Semantics, RFC 9111 defines HTTP Caching, RFC 9112 and RFC 9113 define HTTP/1.1 and HTTP/2 respectively, and RFC 9114 was reserved for HTTP/3.

Documents that just keep getting refactored and gaining new features...

CloudFront Supports HTTP/3

Although HTTP/3 isn't a full Internet Standard yet, CloudFront has announced HTTP/3 support: 「New – HTTP/3 Support for Amazon CloudFront」.

All it takes is ticking the option in the CloudFront console.
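
The same setting shows up in the API as the distribution's HttpVersion field, so it can also be checked from the CLI; a rough sketch (the distribution ID is a placeholder, and jq is only there for readability):

aws cloudfront get-distribution-config --id EXXXXXXXXXXXXX \
    | jq -r '.DistributionConfig.HttpVersion'
# expect "http2and3" when both HTTP/2 and HTTP/3 are enabled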

Looking at the description in 「RFC 9114: HTTP/3」, a client can try to establish a QUIC connection over UDP, but it needs a mechanism to fall back to TCP-based HTTP/2 or HTTP/1.1 when that fails:

A client MAY attempt access to a resource with an "https" URI by resolving the host identifier to an IP address, establishing a QUIC connection to that address on the indicated port (including validation of the server certificate as described above), and sending an HTTP/3 request message targeting the URI to the server over that secured connection. Unless some other mechanism is used to select HTTP/3, the token "h3" is used in the Application-Layer Protocol Negotiation (ALPN; see [RFC7301]) extension during the TLS handshake.

Connectivity problems (e.g., blocking UDP) can result in a failure to establish a QUIC connection; clients SHOULD attempt to use TCP-based versions of HTTP in this case.

The other path is to tell the browser to upgrade via an HTTP header on the TCP connection:

An HTTP origin can advertise the availability of an equivalent HTTP/3 endpoint via the Alt-Svc HTTP response header field or the HTTP/2 ALTSVC frame ([ALTSVC]) using the "h3" ALPN token.

Like this:

Alt-Svc: h3=":50781"

And then the client can switch over to HTTP/3:

On receipt of an Alt-Svc record indicating HTTP/3 support, a client MAY attempt to establish a QUIC connection to the indicated host and port; if this connection is successful, the client can send HTTP requests using the mapping described in this document.
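
A quick way to poke at it from the command line (a sketch; the hostname is a placeholder for your own distribution, and the second command needs a curl built with HTTP/3 support):

# look for the Alt-Svc advertisement on a normal response
curl -sI https://dxxxxxxxxxxxx.cloudfront.net/ | grep -i alt-svc

# with an HTTP/3-capable curl, try going over QUIC directly
curl -sI --http3 https://dxxxxxxxxxxxx.cloudfront.net/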

The FAQ also mentions that there's no extra charge for enabling HTTP/3; requests are simply billed at the normal request pricing:

Q. Is there a separate charge for enabling HTTP/3?

No, there is no separate charge for enabling HTTP/3 on Amazon CloudFront distributions. HTTP/3 requests will be charged at the request pricing rates as per your pricing plan.

I'll switch it on and play around with it for now...

Linode's VPC and VLAN

Saw Linode's 「Go Private with VLANs and VPCs」, which specifically brings up VLANs:

Can a VLAN be used as a VPC on Linode?

The short answer is again, yes, we can use a VLAN as a VPC on Linode.

The opening of 「Common Use Cases for Linode's VLAN Service」 (which turns out to date back to 2021) mentions:

This means that two or more Linodes connected via the VLAN can see each other as if they were directly connected to the same physical Ethernet network. This network supports all the logical Ethernet features like L2 broadcast and L2 multicast.

So it does look like VLANs are supported on top of Linode's SDN. I remember they used a Cisco solution in the early days (there was 「Linode NextGen: The Network」 back in 2013); no idea what the architecture is now...

The Status of EC2-Classic

Saw this on Twitter:

2022/08/15 is the last day EC2-Classic will be alive; I mentioned this before in 「AWS 宣佈 EC2-Classic 退役的計畫」.

The official announcement, 「EC2-Classic Networking is Retiring – Here's How to Prepare」, gives the final date:

On August 15, 2022 we expect all migrations to be complete, with no remaining EC2-Classic resources present in any AWS account.

But since the wording is only "expect", there are probably still customers who haven't migrated, and AWS doesn't seem to have said anything official about this milestone...