AWS 環境裡面提供 NTP Service 了 (Amazon Time Sync Service)

以前在 AWS 環境裡都要自己架設兩台可以連外的 NTP server,然後將內部機器指到這兩台上,現在可以用現成的了:「Keeping Time With Amazon Time Sync Service」。

服務放在 169.254.169.123

You can access the service via the link local 169.254.169.123 IP address. This means you don’t need to configure external internet access and the service can be securely accessed from within your private subnets.

然後也有提到 leap second 的解法,用的解法是 leap smearing:

Leap seconds are known to cause application errors and this can be a concern for many savvy developers and systems administrators. The 169.254.169.123 clock smooths out leap seconds some period of time (commonly called leap smearing) which makes it easy for your applications to deal with leap seconds.

先前 AWS 也有 leap time,但不包括 Amazon EC2 這些系統 (EC2 裡的時間是獨立的),不過還是可以看一下 AWS 處理 leap time 的方式,因為這次 NTP Service 就會拿去用了。

最近一次 leap time 是 2016 年底的「Look Before You Leap – December 31, 2016 Leap Second on AWS」,處理的方式跟 2015 年時的方法還是一樣:「Look Before You Leap – The Coming Leap Second and AWS (Updated)」。

類似於下圖左上角這張的變化:

然後全區開放,都可以用了:

This service is provided at no additional charge and is immediately available in all public AWS regions to all instances running in a VPC.

Cloudflare 因為閏秒炸掉...

Cloudflare 這次閏秒炸掉:「How and why the leap second affected Cloudflare DNS」,影響範圍包括了 DNS query 與 HTTP request:

At peak approximately 0.2% of DNS queries to Cloudflare were affected and less than 1% of all HTTP requests to Cloudflare encountered an error.

主要的原因在於 Gotime.Now() 不保證遞增:

RRDNS is written in Go and uses Go’s time.Now() function to get the time. Unfortunately, this function does not guarantee monotonicity. Go currently doesn’t offer a monotonic time source (see issue 12914 for discussion).

修正的方式是針對時間回朔時處理:

In this patch we allowed RRDNS to forget about current upstream performance, and let it normalize again if time skipped backwards.

應該是因為 Cloudflare 這段程式還沒遇過 leap second 造成的...

Google 的 time.google.com

看到這張圖在講不同 server (service) 如何處理今年的 leap second (UTC 的跨年,台灣時間早上八點),出自 leap smear 這邊:

在很早前就有 time.google.com 這個 domain,但是當時 Google 的人有跳出來說明這個服務不是公開服務 (當時),不保證這個服務的正確性與穩定性:「timeX.google.com provide non standard time」。

不過一個月前公佈出來的 Google Public NTP 服務算是把整個系統搞定了。

其中在 Configuring Clients 這邊直接推薦用 iburst 參數,不愧是家大業大的 Google XDDD:

When the server is unreachable and at each poll interval, send a burst of eight packets instead of the usual one. As long as the server is unreachable, the spacing between packets is about 16s to allow a modem call to complete. Once the server is reachable, the spacing between packets is about 2s. This is designed to speed the initial synchronization acquisition with the server command and s addresses and when ntpd is started with the -q option.

回到原來的 leap smear 的比較圖,可以看出 Google 對 leap second 的解法是往前十二小時與往後十二小時各拉緩衝時間來避開,有些是沒在管,另外有些有種來亂的感覺 XDDD

AWS 處理這次的 Leap Second...

2015/06/30 到 2015/07/01 會有 Leap Second (閏秒),而這次 AWS 處理的方法也公告出來了:「Look Before You Leap – The Coming Leap Second and AWS」。

EC2 由於每個 instance 有自己的 clock,所以需要自己實作:如果對閏秒不會有問題的系統就放著,而對閏秒敏感的系統可以自己跑 ntpd 之類的程式校正。

而其他 AWS 的系統會用 24 個小時攤平多出來的閏秒,從 2015/06/30 的中午 12:00:00 開始,到 2015/07/01 的中午 12:00:00 結束 (都是 UTC):

也就是說不會出現 23:59:60 這種時間,而在這 24 小時調整區間內的誤差最多 0.5 秒。