
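Amazon Route 53's Auto Naming API Can Now Point to CNAME Targets

The Amazon Route 53 Auto Naming API can be used for service discovery (see the earlier post "Service Discovery with Amazon Route 53"). Back then it supported A, AAAA, and SRV records; now it can register CNAME records as well: "Amazon Route 53 Auto Naming Announces Support for CNAME Record Type and Alias to ELB".

The most direct impact is on ELB. With an ELB handling the front end, the load-balancing and record-count problems become much less of an issue (previously traffic was spread out with round-robin DNS, and a single response was limited to at most eight records):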

Beginning today, you can use the Amazon Route 53 Auto Naming APIs to create CNAME records when you register instances of your microservices, and your microservices can discover the CNAMEs by querying DNS for the service name. Additionally, you can use the Amazon Route 53 Auto Naming APIs to create Route 53 alias records that route traffic to Amazon Elastic Load Balancers (ELBs).
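As a rough sketch of what registering such an instance might look like, the helper below only builds the request parameters and does not call AWS. The attribute keys `AWS_INSTANCE_CNAME` and `AWS_ALIAS_DNS_NAME` are taken from the RegisterInstance API as I understand it, not from the announcement quoted above, and the function name and all sample values are hypothetical.

```python
def build_register_instance_request(service_id, instance_id,
                                    cname=None, elb_dns_name=None):
    """Build keyword arguments for a RegisterInstance-style call.

    Exactly one of `cname` (register a CNAME record) or `elb_dns_name`
    (register an alias record that points to an ELB) must be given.
    """
    if (cname is None) == (elb_dns_name is None):
        raise ValueError("specify exactly one of cname or elb_dns_name")
    if cname is not None:
        # Assumed attribute key for a CNAME target.
        attributes = {"AWS_INSTANCE_CNAME": cname}
    else:
        # Assumed attribute key for an alias to a load balancer.
        attributes = {"AWS_ALIAS_DNS_NAME": elb_dns_name}
    return {
        "ServiceId": service_id,
        "InstanceId": instance_id,
        "Attributes": attributes,
    }
```

The resulting dictionary could then be passed to an SDK client, for example `register_instance(**request)` on a boto3 `servicediscovery` client, assuming credentials and an existing service.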

Service Discovery with Amazon Route 53

A new Amazon Route 53 feature removes the need to build and run your own service discovery system: "Amazon Route 53 Releases Auto Naming API for Service Name Management and Discovery". The official documentation is at "Using Autonaming for Service Discovery".

There are some limits for now, though. A namespace (domain name) can currently hold only five services:

DNS settings for up to five records.

And a DNS response returns at most eight records:

When Amazon Route 53 receives a DNS query for the name of an instance, such as backend.example.com, it responds with up to eight IP addresses (for A or AAAA records) or up to eight SRV record values.

Eight records per response, but presumably you can register more than eight... (i.e. each response returns a different set)
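A small simulation of that guess: if more than eight instances are registered, each response would carry a different subset of at most eight. Whether Route 53 actually selects records at random is an assumption here, and `pick_records` is a hypothetical name.

```python
import random

MAX_RECORDS_PER_RESPONSE = 8  # limit quoted from the documentation above


def pick_records(registered, limit=MAX_RECORDS_PER_RESPONSE, rng=random):
    """Return at most `limit` records chosen at random from the registered set."""
    registered = list(registered)
    # Never ask for more records than are actually registered.
    return rng.sample(registered, min(limit, len(registered)))
```

Calling `pick_records` repeatedly on a pool of, say, 20 addresses gives a different eight each time, which is what would spread clients across all registered instances.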

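Self-hosted services (such as Cassandra or ScyllaDB) can hook straight into this, so there is no need to run your own Consul.

It is currently available in these four regions; Asia is not part of this rollout: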

Amazon Route 53 Auto Naming is available in US East (N. Virginia), US East (Ohio), US West (Oregon), and EU (Ireland) regions.

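LinkedIn Forgot to Renew, and Its SSL Certificate Expired

I saw the news of LinkedIn's slip-up on Netcraft. This time it was the country-mixed variant (the per-country hostnames such as de.linkedin.com) that broke: "LinkedIn certificate blunder leaves users LockedOut!".

DNS also shows that the two names CNAME to different load balancers: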

;; ANSWER SECTION:
www.linkedin.com.       260     IN      CNAME   2-01-2c3e-003c.cdx.cedexis.net.
2-01-2c3e-003c.cdx.cedexis.net. 93 IN   CNAME   pop-ehk1.www.linkedin.com.
pop-ehk1.www.linkedin.com. 3560 IN      A       144.2.3.1
;; ANSWER SECTION:
de.linkedin.com.        86400   IN      CNAME   cctld.linkedin.com.
cctld.linkedin.com.     86400   IN      CNAME   mix.linkedin.com.
mix.linkedin.com.       213     IN      CNAME   pop-ehk1.mix.linkedin.com.
pop-ehk1.mix.linkedin.com. 3546 IN      A       144.2.3.5
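To make the two chains easier to compare, here is a minimal sketch that parses answer lines like the ones above and follows each CNAME chain to its final A record. The function names are hypothetical, and the parser only handles the five-column `name ttl IN type value` layout shown here.

```python
SAMPLE = """
www.linkedin.com.       260     IN      CNAME   2-01-2c3e-003c.cdx.cedexis.net.
2-01-2c3e-003c.cdx.cedexis.net. 93 IN   CNAME   pop-ehk1.www.linkedin.com.
pop-ehk1.www.linkedin.com. 3560 IN      A       144.2.3.1
de.linkedin.com.        86400   IN      CNAME   cctld.linkedin.com.
cctld.linkedin.com.     86400   IN      CNAME   mix.linkedin.com.
mix.linkedin.com.       213     IN      CNAME   pop-ehk1.mix.linkedin.com.
pop-ehk1.mix.linkedin.com. 3546 IN      A       144.2.3.5
"""


def parse_answer_lines(text):
    """Parse dig-style answer lines into (name, ttl, type, value) tuples."""
    records = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, ";;" comments, and anything not in the IN class.
        if len(fields) != 5 or fields[0].startswith(";") or fields[2] != "IN":
            continue
        name, ttl, _cls, rtype, value = fields
        records.append((name, int(ttl), rtype, value))
    return records


def follow_chain(records, start):
    """Follow CNAME records from `start`; return (chain, final A address or None)."""
    cnames = {n: v for n, _ttl, t, v in records if t == "CNAME"}
    addresses = {n: v for n, _ttl, t, v in records if t == "A"}
    chain = [start]
    while chain[-1] in cnames:
        target = cnames[chain[-1]]
        if target in chain:
            raise ValueError("CNAME loop detected")
        chain.append(target)
    return chain, addresses.get(chain[-1])
```

With the sample data, `www.linkedin.com.` and `de.linkedin.com.` end at different `pop-ehk1` hosts with different addresses, which matches the observation that the two names are served by separate clusters.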

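SSL Labs also shows that the Alternative names differ: "SSL Server Test: www.linkedin.com (Powered by Qualys SSL Labs)" and "SSL Server Test: de.linkedin.com (Powered by Qualys SSL Labs)".

And because LinkedIn has HSTS enabled, users cannot get past the error in the browser UI and so cannot log in at all.

In Google Chrome you can bypass it by typing badidea (see "Connecting in Google Chrome to a Site Blocked by HSTS"), but in Mozilla Firefox I have not found a way to bypass it from the UI; instead you have to edit the SiteSecurityServiceState.txt file: "HTTP Strict Transport Security prevents me from accessing a server that I'm doing development on".

That said, because the two clusters run independently, changing the hostname in the URL should get things working...

It has become rare in recent years to see a big company slip up like this, so it is rather amusing XD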

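AWS Launches Amazon GuardDuty for Internal Network Monitoring

AWS has launched Amazon GuardDuty to monitor internal networks: "Amazon GuardDuty – Continuous Security Monitoring & Threat Detection".

The diagram shows that it combines data from many logs and then makes an overall judgment: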

In combination with information gleaned from your VPC Flow Logs, AWS CloudTrail Event Logs, and DNS logs, this allows GuardDuty to detect many different types of dangerous and mischievous behavior including probes for known vulnerabilities, port scans and probes, and access from unusual locations.

So even Bitcoin-related sites count as one of the signals XD

It launched in quite a lot of regions (compared with the earlier AWS Elemental MediaOOXX series...):

Amazon GuardDuty is available in production form in the US East (Northern Virginia), US East (Ohio), US West (Oregon), US West (Northern California), EU (Ireland), EU (Frankfurt), EU (London), South America (São Paulo), Canada (Central), Asia Pacific (Tokyo), Asia Pacific (Seoul), Asia Pacific (Singapore), Asia Pacific (Sydney), and Asia Pacific (Mumbai) Regions and you can start using it today!

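PChome Fixed the Problem, and Notes on RFC 4074

Testing earlier today showed that PChome has fixed the problem I wrote about in "Why Connections to PChome 24h Are Slow..." and "Why Connections to PChome 24h Are Slow... (Continued)". Besides summarizing things here, I also need to correct a mistake in those earlier posts.

RFC 4074 (Common Misbehavior Against DNS Queries for IPv6 Addresses) covers how a DNS server should respond when a name has only an IPv4 address.

Section "3. Expected Behavior" describes the correct approach: when there is an A RR but no AAAA RR, the server should return NOERROR with an empty answer section: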

Suppose that an authoritative server has an A RR but has no AAAA RR for a host name. Then, the server should return a response to a query for an AAAA RR of the name with the response code (RCODE) being 0 (indicating no error) and with an empty answer section (see Sections 4.3.2 and 6.2.4 of [1]). Such a response indicates that there is at least one RR of a different type than AAAA for the queried name, and the stub resolver can then look for A RRs.

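Section "4.2. Return "Name Error"" explains that returning NXDOMAIN (3) means the queried name has no RRs of any type at all, not merely no AAAA record. This is the mistake I made (the earlier post suggested returning NXDOMAIN):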

This type of server returns a response with RCODE 3 ("Name Error") to a query for an AAAA RR, indicating that it does not have any RRs of any type for the queried name.

With this response, the stub resolver may immediately give up and never fall back. Even if the resolver retries with a query for an A RR, the negative response for the name has been cached in the caching server, and the caching server will simply return the negative response. As a result, the stub resolver considers this to be a fatal error in name resolution.

Several examples of this behavior are known to the authors. As of this writing, all have been fixed.
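The difference between the two responses can be summed up as a small decision function. This is only a sketch of how a stub resolver might interpret an AAAA response based on the RFC passages quoted above; the function name and the returned labels are hypothetical.

```python
NOERROR = 0   # RCODE 0
NXDOMAIN = 3  # RCODE 3, "Name Error"


def interpret_aaaa_response(rcode, answer_count):
    """Decide what to do next after an AAAA query."""
    if rcode == NOERROR and answer_count > 0:
        return "use-aaaa"
    if rcode == NOERROR:
        # Empty answer section: the name exists but has no AAAA RR,
        # so it is still worth asking for A RRs.
        return "try-a"
    if rcode == NXDOMAIN:
        # The name has no RRs of any type; falling back to A is pointless.
        return "name-does-not-exist"
    return "error"
```

Returning NXDOMAIN for a name that does have an A RR therefore sends the resolver down the `name-does-not-exist` branch, which is why the RFC treats it as misbehavior.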

With this fix PChome now returns the correct response (rather than the NXDOMAIN I had suggested):

$ dig shopping.gs1.pchome.com.tw aaaa @ns1.gs1.pchome.com.tw

; <<>> DiG 9.9.5-3ubuntu0.16-Ubuntu <<>> shopping.gs1.pchome.com.tw aaaa @ns1.gs1.pchome.com.tw
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 40767
;; flags: qr aa rd ad; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1280
;; QUESTION SECTION:
;shopping.gs1.pchome.com.tw.    IN      AAAA

;; AUTHORITY SECTION:
gs1.pchome.com.tw.      5       IN      SOA     ns1.gs1.pchome.com.tw. root.dns.pchome.com.tw. 20171123 3600 3 3600 5

;; Query time: 16 msec
;; SERVER: 210.242.216.91#53(210.242.216.91)
;; WHEN: Fri Nov 24 01:44:52 CST 2017
;; MSG SIZE  rcvd: 134

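There are other RFCs worth reading too, such as RFC 2308 (Negative Caching of DNS Queries (DNS NCACHE)), RFC 4697 (Observed DNS Resolution Misbehavior), and RFC 8020 (NXDOMAIN: There Really Is Nothing Underneath). They describe quite a few common problems and the correct way to handle them, and reading them helps a lot with today's increasingly complex DNS setups.

Why Connections to PChome 24h Are Slow... (Continued)

The previous post, "Why Connections to PChome 24h Are Slow...", said that DNS resolvers end up collapsed by the roadside, but not how they collapse... Because the specs do not say what to do when a resolver cannot get the answer it is asking for, every implementation handles it a bit differently.

I put the results of querying each DNS resolver 100 times in a GitHub Gist: "Query 24h.pchome.com.tw". They all return SERVFAIL; only the timing differs (the final x.xxxx total field is the actual elapsed time in seconds, wall clock).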
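A measurement along these lines can be sketched as follows. This is not the script behind the Gist; `time_lookups` is a hypothetical helper that times any resolver callable, so the actual lookup method is left to the caller.

```python
import time


def time_lookups(resolve, name, attempts):
    """Call resolve(name) `attempts` times; return wall-clock seconds per call."""
    durations = []
    for _ in range(attempts):
        start = time.perf_counter()
        try:
            resolve(name)
        except OSError:
            # A failed lookup still counts: only the elapsed time matters here.
            pass
        durations.append(time.perf_counter() - start)
    return durations
```

For example, `time_lookups(lambda n: socket.getaddrinfo(n, 443), "24h.pchome.com.tw", 100)` would time the system's configured resolver. Testing one specific resolver, as done above, would need a lookup function that targets that server directly.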

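Start with the main character this time: HiNet's 168.95.1.1 and 168.95.192.1, which are probably also the DNS resolvers used by the largest share of PChome 24h users.

These two DNS resolvers do not return SERVFAIL right away when they hit the problem. Industry rumor also has it that Chunghwa Telecom has modified a lot of the code themselves, so they behave differently from typical open source software. Since I cannot see the DNS packets on PChome's side, I can only guess from the behavior... Probably, after the first round of queries all fail, the resolver sleeps for a random period, asks again, and only returns SERVFAIL if the second attempt fails as well.

This random sleep looks like it may be up to 10 seconds, since that is the longest time in the data.

SEEDNet's 139.175.1.1 and Google's 8.8.8.8 do not have this problem; both return SERVFAIL immediately.

The recently launched 9.9.9.9 (see "A New DNS Resolver: 9.9.9.9") behaves a little unusually: three queries near the start are slow (lines 2, 3, and 5), but after that the speed is normal. Perhaps there are three servers in Singapore handling the service (from the machine I tested on, 9.9.9.9 routes to Singapore), and the first time each one gets no answer it runs some special algorithm to confirm, then caches the result?

So every DNS resolver reacts a bit differently, and the biggest one is the one with the problem XD

24h.pchome.com.tw is slow once, ecvip.pchome.com.tw is slow again, and a.ecimg.tw for images is slow yet again; a few more domains on one page and it becomes unbearable XD

Really, I could just switch to 8.8.8.8 or go through proxy.hinet.net to solve it, but I am writing it down anyway (scratches head).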

Happy Eyeballs (RFC 6555)

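A reader commented on "Why Connections to PChome 24h Are Slow..." that Happy Eyeballs should be able to solve this problem.

Besides the Wikipedia article, the more formal description is RFC 6555, "Happy Eyeballs: Success with Dual-Stack Hosts". Its section "6. Example Algorithm" describes how Google Chrome and Mozilla Firefox implement Happy Eyeballs: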

What follows is the algorithm implemented in Google Chrome and Mozilla Firefox.

  1. Call getaddinfo(), which returns a list of IP addresses sorted by the host's address preference policy.
  2. Initiate a connection attempt with the first address in that list (e.g., IPv6).
  3. If that connection does not complete within a short period of time (Firefox and Chrome use 300 ms), initiate a connection attempt with the first address belonging to the other address family (e.g., IPv4).
  4. The first connection that is established is used. The other connection is discarded.

If an algorithm were to cache connection success/failure, the caching would occur after step 4 determined which connection was successful.

Other example algorithms include [Perreault] and [Andrews].

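As you can see, the Happy Eyeballs algorithm is meant to keep things from stalling when the IPv6 network is unreachable (if the connection is not established within 300 ms, it quickly tries the other address family); it does not avoid problems at the DNS layer (that is, the DNS queries triggered by getaddrinfo(), written as getaddinfo() in the quoted text).

In this case the DNS query itself is slow, so things are slow from the very start, and Happy Eyeballs cannot fix that.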
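Steps 2 to 4 of the quoted algorithm can be sketched with asyncio as below. This is an illustration under simplifying assumptions, not how Chrome or Firefox implement it: `connect` is any coroutine function supplied by the caller, the two addresses stand for the result of step 1, and `make_fake_connect` is a hypothetical stand-in used only to exercise the logic.

```python
import asyncio


async def happy_eyeballs(connect, preferred, fallback, delay=0.3):
    """Return the first connection established (RFC 6555 example, steps 2-4)."""
    # Step 2: start with the preferred address (e.g. IPv6).
    first = asyncio.create_task(connect(preferred))
    done, pending = await asyncio.wait({first}, timeout=delay)
    if done and first.exception() is None:
        return first.result()

    # Step 3: the first attempt is slow or has failed; try the other family.
    pending = set(pending) | {asyncio.create_task(connect(fallback))}
    try:
        # Step 4: use whichever attempt is established first.
        while pending:
            done, pending = await asyncio.wait(
                pending, return_when=asyncio.FIRST_COMPLETED
            )
            for task in done:
                if task.exception() is None:
                    return task.result()
        raise OSError("all connection attempts failed")
    finally:
        # Discard the attempt that lost the race.
        for task in pending:
            task.cancel()


def make_fake_connect(latencies):
    """Hypothetical connect(): sleeps for the given latency, returns the address."""
    async def connect(address):
        await asyncio.sleep(latencies[address])
        return address
    return connect
```

Note that the race only begins once the addresses are known. Nothing in this function can shorten the time spent waiting for name resolution, which is exactly the limitation described above.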

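That said, I remembered that this algorithm existed, but I did not know it had such an amusing name as "Happy Eyeballs"... (covers face)

AWS Certificate Manager Now Supports DNS Validation for Issuing Certificates

AWS Certificate Manager (ACM) can now validate domains via DNS: "Easier Certificate Validation Using DNS with AWS Certificate Manager".

This is not the dns-01 method that Let's Encrypt promotes, though, but a method AWS defined itself.

The end of the article then reminds readers once more that a certificate used with Amazon CloudFront must be in ACM in us-east-1: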

Note that your certificate must be in the US East (N. Virginia) Region to use the certificate with CloudFront.

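This makes automation much easier... Previously you had to confirm via email. You could brute-force that with code too, but then you ran the risk of an AWS change breaking it.

Why Connections to PChome 24h Are Slow...

Update: for the follow-up, see "Why Connections to PChome 24h Are Slow... (Continued)".

tl;dr: their DNS servers do not correctly return NXDOMAIN for IPv6 AAAA records, which causes DNS resolvers to keep retrying.

That seems to sum up the cause in one line, but let me add some detail.

It started because my computer often stalled when connecting to PChome 24h, with Google Chrome showing "Resolving host...", so I spent some time tracking the problem down.

I first tested with a few tools and found that host stalls, without knowing on what: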

$ host 24h.pchome.com.tw

Listening with tcpdump showed that host queries three record types, A, AAAA, and MX, and that the latter two stall:

24h.pchome.com.tw is an alias for shopping.gs1.pchome.com.tw.
shopping.gs1.pchome.com.tw has address 210.242.43.53
;; connection timed out; no servers could be reached
;; connection timed out; no servers could be reached

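That gave me a direction... My computer is on a dual-stack network (it has both an IPv4 address and an IPv6 address), so Google Chrome can be expected to look up IPv6 addresses. Many domestic sites still do not treat IPv6 as a standard test scenario, so it is easy to hit this...

With that direction, I tested IPv6 AAAA queries with dig and found that they all get SERVFAIL, and that after a few runs the queries start to stall: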

$ dig 24h.pchome.com.tw aaaa @168.95.192.1

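Then I tested against {cheetah,dns,dns2,dns3,wolf}.pchome.com.tw (the ones registered at the parent zone) and dns4.pchome.com.tw (an extra one that exists in practice), which do return the CNAME record, like this: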

$ dig 24h.pchome.com.tw aaaa @dns.pchome.com.tw

; <<>> DiG 9.9.5-3ubuntu0.16-Ubuntu <<>> 24h.pchome.com.tw aaaa @dns.pchome.com.tw
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 26037
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 5, ADDITIONAL: 6
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;24h.pchome.com.tw.             IN      AAAA

;; ANSWER SECTION:
24h.pchome.com.tw.      300     IN      CNAME   shopping.gs1.pchome.com.tw.

;; AUTHORITY SECTION:
gs1.pchome.com.tw.      300     IN      NS      ns3.gs1.pchome.com.tw.
gs1.pchome.com.tw.      300     IN      NS      ns1.gs1.pchome.com.tw.
gs1.pchome.com.tw.      300     IN      NS      ns4.gs1.pchome.com.tw.
gs1.pchome.com.tw.      300     IN      NS      ns5.gs1.pchome.com.tw.
gs1.pchome.com.tw.      300     IN      NS      ns2.gs1.pchome.com.tw.

;; ADDITIONAL SECTION:
ns1.gs1.pchome.com.tw.  300     IN      A       210.242.216.91
ns2.gs1.pchome.com.tw.  300     IN      A       210.242.216.92
ns3.gs1.pchome.com.tw.  300     IN      A       210.242.43.93
ns4.gs1.pchome.com.tw.  300     IN      A       203.69.38.91
ns5.gs1.pchome.com.tw.  300     IN      A       210.71.147.91

;; Query time: 12 msec
;; SERVER: 210.59.230.85#53(210.59.230.85)
;; WHEN: Wed Nov 22 11:05:24 CST 2017
;; MSG SIZE  rcvd: 243

But when asking ns{1,2,3,4,5}.gs1.pchome.com.tw, they give no answer and no NXDOMAIN either, like this:

$ dig shopping.gs1.pchome.com.tw aaaa @ns1.gs1.pchome.com.tw

; <<>> DiG 9.9.5-3ubuntu0.16-Ubuntu <<>> shopping.gs1.pchome.com.tw aaaa @ns1.gs1.pchome.com.tw
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 36249
;; flags: qr rd ad; QUERY: 1, ANSWER: 0, AUTHORITY: 5, ADDITIONAL: 6
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1280
;; QUESTION SECTION:
;shopping.gs1.pchome.com.tw.    IN      AAAA

;; AUTHORITY SECTION:
gs1.pchome.com.tw.      3600    IN      NS      ns3.gs1.pchome.com.tw.
gs1.pchome.com.tw.      3600    IN      NS      ns4.gs1.pchome.com.tw.
gs1.pchome.com.tw.      3600    IN      NS      ns5.gs1.pchome.com.tw.
gs1.pchome.com.tw.      3600    IN      NS      ns1.gs1.pchome.com.tw.
gs1.pchome.com.tw.      3600    IN      NS      ns2.gs1.pchome.com.tw.

;; ADDITIONAL SECTION:
ns3.gs1.pchome.com.tw.  3600    IN      A       210.242.43.93
ns4.gs1.pchome.com.tw.  3600    IN      A       203.69.38.91
ns5.gs1.pchome.com.tw.  3600    IN      A       210.71.147.91
ns1.gs1.pchome.com.tw.  3600    IN      A       210.242.216.91
ns2.gs1.pchome.com.tw.  3600    IN      A       210.242.216.92

;; Query time: 11 msec
;; SERVER: 210.242.216.91#53(210.242.216.91)
;; WHEN: Wed Nov 22 11:07:17 CST 2017
;; MSG SIZE  rcvd: 310

And so the DNS resolver ends up collapsed by the roadside...
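One way to see what is odd about that last response is to compare its header with the one PChome returns after the fix (shown in the newer post on this page). The sketch below extracts the status, flags, and answer count from dig output and checks for an authoritative empty answer. Treating the missing `aa` flag as the telling difference is my reading of the two headers, not something the posts state, and the function names are hypothetical.

```python
import re

# Header lines copied from the two dig outputs on this page.
BEFORE_FIX = """;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 36249
;; flags: qr rd ad; QUERY: 1, ANSWER: 0, AUTHORITY: 5, ADDITIONAL: 6"""
AFTER_FIX = """;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 40767
;; flags: qr aa rd ad; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1"""


def parse_dig_header(text):
    """Return (status, flags, answer_count) from dig's header lines."""
    status = re.search(r"status: (\w+)", text).group(1)
    flags = re.search(r"flags: ([a-z ]*);", text).group(1).split()
    answer_count = int(re.search(r"ANSWER: (\d+)", text).group(1))
    return status, flags, answer_count


def is_authoritative_empty_answer(text):
    """True for NOERROR with no answers, sent with the authoritative (aa) flag."""
    status, flags, answer_count = parse_dig_header(text)
    return status == "NOERROR" and answer_count == 0 and "aa" in flags
```

Both responses are NOERROR with an empty answer section, but only the later one carries the `aa` flag, so only that one passes the check.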
