Tag Archives: routing

AWS Transit Gateway:不用再自己處理多個 AWS 帳號之間的 routing 了

AWS 推出 Transit Gateway,讓多個 AWS 帳號之間的 routing 問題總算不用自己處理了:「New – Use an AWS Transit Gateway to Simplify Your Network Architecture」。

這個問題常出現在組織架構大一點的情境下:為了讓各單位自己付自己單位的帳,所以大家都建立了自己的 AWS 帳號。但內部還是有互相連線的需求,雖然把 IP range 都有切開,但還是得搞定彼此之間的 routing...

之前 AWS 還做了「AWS Global Transit Network」這份說明,解釋要怎麼處理 routing,在說明裡官方當時還提供了不少在雲上面建立 router 的方案。像是其中一個方式是用 Cisco 的方案做:「AWS Solution – Transit VPC」、「Overview - Transit Network VPC (Cisco CSR)」。

而現在則是可以直接用 AWS 的服務解決了:

You can attach up to 5000 VPCs to each gateway and each attachment can handle up to 50 Gbits/second of bursty traffic. You can attach your AWS VPN connections to a Transit Gateway today, with Direct Connect planned for early 2019.

每個接點都要收租用費 (美國是 $0.05/hr,一個月大約 $36;日本是 $0.07/hr,一個月大約 $50.4),另外流量要收 USD$0.02/GB,其實就是本來的 in + out 的費用 (在 EC2 同區的流量是要收 in + out 的費用的)。

AWS 的 BYOIP 服務開放一般使用了...

先前提到的「AWS 提供自帶 IP 到 AWS 上的服務了...」只能在 us-west-2 上使用 (需要申請),現在則是開放一般使用了:「Announcing the general availability of Bring Your Own IP for Amazon Virtual Private Cloud」。

而且範圍也增加了,除了本來測試的區域 us-west-2,現在 us-east-1us-east-2 都可以用:

This feature is now publicly available in US East (N. Virginia), US East (Ohio) and US West (Oregon) AWS Regions.


There is no additional charge to use the BYOIP feature. Also, you don’t have to pay for Elastic IP addresses that you create from BYOIP address prefixes.

從文件上看起來目前只支援 IPv4,每段最少需要 /24,而且每個 region 最多五個 range,另外保留使用權 (如果 IP 網段之前有很多不良記錄時 AWS 可以拒絕)。

AWS 提供自帶 IP 到 AWS 上的服務了...

AWS 宣佈提供自帶 IP 到 AWS 上的服務了:「Announcing Bring Your Own IP for Amazon Virtual Private Cloud (Preview)」。

目前只在 us-west-2 有,另外需要申請:

Bring Your Own IP is available for preview in the US West (Oregon) region. You can request access to this feature by completing this request form.

不知道是不是直接放 routing 出來?如果是的話,照慣例 IPv4 應該是至少要 /24?從申請表格上看起來像是這樣沒錯:

IPv4 Prefix you want to onboard. You need a minimum of /24 ARIN registered prefix. The Net Type should either be Allocated or Assigned:

俄羅斯的 BGP traffic reroute...

前幾天 (12 號) BGPmon 發現有很多知名的網段被導去俄羅斯:「Popular Destinations rerouted to Russia」。

Early this morning (UTC) our systems detected a suspicious event where many prefixes for high profile destinations were being announced by an unused Russian Autonomous System.


Starting at 04:43 (UTC) 80 prefixes normally announced by organizations such Google, Apple, Facebook, Microsoft, Twitch, NTT Communications and Riot Games were now detected in the global BGP routing tables with an Origin AS of 39523 (DV-LINK-AS), out of Russia.

從圖中也可以看出來 AS39523 透過 AS31133 發出這些 routing,然後主要是透過 AS6939 (Hurricane Electric) 擴散:


Amazon Route 53 對地區的微調功能

Amazon Route 53 推出新功能,針對地區微調資源的比重:「Amazon Route 53 Traffic Flow Announces Support For Geoproximity Routing With Traffic Biasing」。


For example, suppose you have EC2 instances in the AWS US East (Ohio) region and in the US West (Oregon) region. When a user in Los Angeles browses to your website, geoproximity routing will route the DNS query to the EC2 instances in the US West (Oregon) region because it's closer geographically. If you want a larger portion of users in the middle of the United States to be routed to one region, you can specify a positive bias for that region, a negative bias for the other region, or both.

有點 CDN 的想法在裡面...

AWS 在 2018 年要開大阪區

看起來 AWS 的官方稿只有日文版的消息 (雖然老大 Werner VogelsTwitter 上有提到):「【新リージョン】2018年に大阪ローカルリージョンを開設予定」。


理論上是能夠離台灣近一點,不過 routing 應該都還是從東京進去?先看看消息就好...

AWS 的 ALB (Application Load Balancer)

前幾天跟 AWS 的人開會的時候得知 ALB 的 beta program,今天就看到正式公開的消息了:「New – AWS Application Load Balancer」。

最主要的是對 WebSockets 與 HTTP/2 的支援,這個需求都喊很久了:

WebSocket allows you to set up long-standing TCP connections between your client and your server. This is a more efficient alternative to the old-school method which involved HTTP connections that were held open with a “heartbeat” for very long periods of time. WebSocket is great for mobile devices and can be used to deliver stock quotes, sports scores, and other dynamic data while minimizing power consumption. ALB provides native support for WebSocket via the ws:// and wss:// protocols.

HTTP/2 is a significant enhancement of the original HTTP 1.1 protocol. The newer protocol feature supports multiplexed requests across a single connection. This reduces network traffic, as does the binary nature of the protocol.

另外是 url routing,不過目前看起來只能設 10 條,我猜可以問問能不能加吧:

An Application Load Balancer has access to HTTP headers and allows you to route requests to different backend services accordingly. For example, you might want to send requests that include /api in the URL path to one group of servers (we call these target groups) and requests that include /mobile to another. Routing requests in this fashion allows you to build applications that are composed of multiple microservices that can run and be scaled independently.

As you will see in a moment, each Application Load Balancer allows you to define up to 10 URL-based rules to route requests to target groups. Over time, we plan to give you access to other routing methods.

再來是改善了之前抱怨很多的 health check:

Application Load Balancers can perform and report on health checks on a per-port basis. The health checks can specify a range of acceptable HTTP responses, and are accompanied by detailed error codes.


前幾天大規模的 Routing-Hijacking 事件

前幾天 BGPmon 偵測到大規模的 routing-hijacking 事件:「Large hijack affects reachability of high traffic destinations」,範圍應該是最近最大的:

Our initial investigation shows that the scope of this incident is widespread and affected 576 Autonomous systems and 3431 prefixes. Amongst the networks affected are high traffic prefixes including those of Google, Amazon, Twitter, Apple, Akamai, Time Warner Cable Internet and more.

在「[outages] HTTP access to www.amazon.com is down for us」這邊也看得到當時有人發現問題。

LinkedIn 的工程師分析 TCP Anycast 技術的穩定性與效能

LinkedIn 的工程師測試了 TCP Anycast 技術的穩定性以及效能:「TCP over IP Anycast - Pipe dream or Reality?」。

由於 stateless 再加上一個封包就傳的完的情況下,Anycast 技術被用在 DNS 上已經很長一段時間了,目前大多數 CDN 業者也都有用 Anycast 技術加快 CDN 的回應速度。

但 TCP 因為 stateful,如果 router 上採用的方式有問題,那麼就會導致封包可能會送到不同節點,這會是個嚴重的問題。不過很早之前,幾乎所有的骨幹 router 都已經支援 flow-based load balancing policy:

Most routers now do a per-flow load balancing, meaning packets on a TCP connection are always sent over the same path, but even a small percentage of routers with per-packet load balancing can cause the website to be unreachable for users behind that router.

所以 LinkedIn 的人試著測試 TCP Anycast 技術的穩定性:

So, to validate the assumption that TCP over anycast in the modern internet is no longer a problem, we ran a few synthetic tests.

測試的方式是設定 web server,讓下載速度不快,然後設了好幾個點並且放出對應的 routing,用 Catchpoint 服務監控,如果不穩定的話,應該就會收到 RST 中斷連線:

We configured our U.S. PoPs to announce an anycast IP address and then configured multiple agents in Catchpoint, a synthetic monitoring service, to download an object from that IP address. Our web servers were configured to deliberately send the response back slowly, taking over a minute for the complete data transfer. If the internet was unstable for TCP over anycast, we would observe continuous or intermittent failures when downloading the object. We would also observe TCP RSTs at the PoPs.


But even after running these tests for a week, we did not notice any substantial instability problems! This gave us confidence to proceed further.

所以也因此可以看到 CacheFlyCloudFlare 兩家採用 TCP Anycast 技術:

[S]ome popular CDNs have also started using anycast for HTTP traffic.


Anycast 是基於 routing 而決定要怎麼走,目標是希望可以透過 routing 取得 latency 最低的點。但實務上會把成本考慮進去,有可能會走到比較遠的點。在測試中可以發現北美的部份 Anycast 表現的比 GeoIP 好,但離開北美就掉很多:

所以 LinkedIn 決定用「Regional Anycast」,先用 GeoIP 決定要丟到哪個洲,而每個洲共用一個 Anycast 位置,這個方法讓效能提昇不少,全球在分配時 sub-optimal 的比率從 31% 降到 10% (i.e. 沒有分配到最好的點的比率):

上面主要是讀 LinkedIn 文章的心得,後面就是感想了。

TCP Anycast 用 CDN 上其實是相當吃虧的技術,由於 routing 的掌控權不再自己手上,有很多重要的手段是沒辦法做到的。

首先是當對外流量已經滿載時,不能切換到其他機房的機器,這邊講的「對外流量」不是 CDN 本身而已,而是中途任何的線路滿載都算,像是 HiNet 對 CloudFlare 香港機房的情況就很明顯。

另外在被 DDoS 時,由於沒辦法導流,在被攻擊時幾乎只剩下 clean pipe 類的解法,而同時間其他用戶會因為流量大量流入機房而一起被波及到。GeoIP 的方式彈性就大很多。

當然,還是有可以列出來的好處。主要是對於需要有固定 IP 應用來說 (像是 firewall 設定需求),TCP Anycast 滿足了這點。


Hacking Team 的 BGP Routing Hijack

Hacking Team 的事情告訴我們,只能是能做的,都有人會包成 Total Solution 賣。

洩漏出來的資料說明了 Hacking Team 在 2013 年幹的 BGP Routing Hijack:「How Hacking Team Helped Italian Special Operations Group with BGP Routing Hijack」。

The Wikileaks document described how the Italian ROS reached out to Hacking Team to work together on recovering the VPS server that ran on In ROS terminology, the server was called “Anonymizer”. The emails also revealed that this server relays updates to another back end server called “Collector” from which ROS presumably recovers the targets’ data.


When we look at historical BGP data we can confirm that AS31034 (Aruba S.p.A) indeed started to announce the prefix starting on Friday, 16 Aug at 2013 07:32 UTC. The Wikileaks emails outline how ROS complained to Hacking Team that the IP was reachable only via Fastweb but not yet through Telecom Italia, concluding not all RCS clients were able to connect back to the server immediately, since the prefix was not seen globally. BGP data further confirms this per the visualization below.

這些主要的 ISP 分別是:

AS12874 Fastweb
AS6939 Hurricane Electric, Inc.
AS49605 Reteivo.IT
AS4589 Easynet
AS5396 MC-link Spa


這也證明了「鎖 IP」的方法其實還是很危險的。