Amazon S3 提供更高的存取量...

AWS 宣佈提高了 Amazon S3 的效能:「Amazon S3 Announces Increased Request Rate Performance」。

每個 S3 prefix 都可以到 5500 RPS read 與 3500 RPS write:

Amazon S3 now provides increased performance to support up to 3,500 requests per second to add data and 5,500 requests per second to retrieve data, which can save significant processing time for no additional charge. Each S3 prefix can support these request rates, making it simple to increase performance exponentially.

舊的資料可以看「Request Rate and Performance Considerations」這邊,裡面沒有明講速度,但有提到如果超過 800 RPS read 與 300 RPS write 的門檻,建議開 case:

However, if you expect a rapid increase in the request rate for a bucket to more than 300 PUT/LIST/DELETE requests per second or more than 800 GET requests per second, we recommend that you open a support case to prepare for the workload and avoid any temporary limits on your request rate.

不過如果有量的話,還是建議照著原來的 prefix 建議,打散處理會比較好,通常在前面的 CDN 通常可以跑簡單的 url rewrite 處理掉 (像是 CloudFront 自家或是 Cloudflare),像是把使用 unix timestamp (ms) 的 https://www.example.com/1531843366123.jpg 變成 https://www.example.com/6123/1531843366123.jpg,這樣可以讓 Amazon S3 的後端依照 prefix 打散 loading,避免當站愈來愈大的時候很難處理。

在 nginx 環境中把 Trac 裝到子目錄下的設法

以前的「Nginx + FastCGI + Trac」提到給的範例是把 Trac 裝在 / 下的方法。如果是裝在 /trac 或是其他路徑時就需要修改了。

一開始試著改會出現這樣的錯誤:

No handler matched request to /trac/report/7

然後研究調整後,發現 Trac 純粹是吃 FastCGI 給的參數去判斷要怎麼處理 url routing,在 trac.ini 內 url 相關參數主要還是用在其他地方... (像是信件通知時用的 url)

首先是把檔案簡化,這是 2016 年寫的:

    location / {
        auth_basic "trac realm";
        auth_basic_user_file /srv/domain.example.com/.htpasswd;

        include fastcgi.conf;
        fastcgi_param AUTH_USER $remote_user;
        fastcgi_param HTTPS on;
        fastcgi_param PATH_INFO $fastcgi_script_name;
        fastcgi_param REMOTE_USER $remote_user;
        fastcgi_param SCRIPT_NAME "";
        fastcgi_pass unix:/var/run/trac/trac.sock;
    }

發現現在 fastcgi.conf 內都會處理 HTTPS 了,所以拿掉 HTTPS 的處理,然後把 location 的判斷改用 regex 去抓 /trac 後的東西,所以先變成這樣:

    location ~ /trac(/.*) {
        auth_basic "trac realm";
        auth_basic_user_file /srv/domain.example.com/.htpasswd;

        include fastcgi.conf;
        fastcgi_param AUTH_USER $remote_user;
        fastcgi_param PATH_INFO $fastcgi_script_name;
        fastcgi_param REMOTE_USER $remote_user;
        fastcgi_param SCRIPT_NAME "";
        fastcgi_pass unix:/var/run/trac/trac.sock;
    }

最後是把 PATH_INFO 改傳 $1 (在 / 的情境下 $fastcgi_script_name 剛好就會是 routing 用的 PATH_INFO 資訊,所以當時直接拿來用),把 SCRIPT_NAME 改成 /trac

也就是跟 Trac 說基底在 /trac,後面的路徑才是你的 routing engine 要處理的東西,所以變成:

    location ~ /trac(/.*) {
        auth_basic "trac realm";
        auth_basic_user_file /srv/domain.example.com/.htpasswd;

        include fastcgi.conf;
        fastcgi_param AUTH_USER $remote_user;
        fastcgi_param PATH_INFO $1;
        fastcgi_param REMOTE_USER $remote_user;
        fastcgi_param SCRIPT_NAME "/trac";
        fastcgi_pass unix:/var/run/trac/trac.sock;
    }

沒寫下來就會花不少時間重新摸...

Facebook 與 Google Chrome 以及 Firefox 的人合作降低 Reload 使用的資源

Facebook 花了不少時間對付 reload 這件事情:「This browser tweak saved 60% of requests to Facebook」。

Facebook 的人發現有大量對靜態資源的 request 都是 304 (not modified) 回應:

In 2014 we found that 60% of requests for static resources resulted in a 304. Since content addressed URLs never change, this means there was an opportunity to optimize away 60% of static resource requests.

Google Chrome 很明顯偏高:

於是他們找出原因後,發現 Google Chrome 只要 POST 後的頁面都會 revalidate:

A piece of code in Chrome hinted at the answer to our question. This line of code listed a few reasons, including reload, for why Chrome might ask to revalidate resources on a page. For example, we found that Chrome would revalidate all resources on pages that were loaded from making a POST request.

然後在討論後認為這個行為不必要,就修掉了,可以看到降了非常多:

We worked with Chrome product managers and engineers and determined that this behavior was unique to Chrome and unnecessary. After fixing this, Chrome went from having 63% of its requests being conditional to 24% of them being conditional.

但還是很明顯比起其他瀏覽器偏高不少,在追問題後發現當輸入同樣的 url 時 (像是 Ctrl-L 或是 Cmd-L 然後直接按 enter),Google Chrome 會當作 reload:

The fact that the percentage of conditional requests from Chrome was still higher than other browsers seemed to indicate that we still had some opportunity here. We started looking into reloads and discovered that Chrome was treating same URL navigations as reloads while other browsers weren't.

不過這次推出修正後發現沒有大改變:(拿 production 測試 XDDD)

Chrome fixed the same URL behavior, but we didn't see a huge metric change. We began to discuss changing the behavior of the reload button with the Chrome team.

後來是針對 reload button 的行為修改,max-age 很長的就不 reload,比較短的就 reload。算是一種 workaround:

There was some debate about what to do, and we proposed a compromise where resources with a long max-age would never get revalidated, but that for resources with a shorter max-age the old behavior would apply. The Chrome team thought about this and decided to apply the change for all cached resources, not just the long-lived ones.

Google 也發了一篇說明這個新功能:「Reload, reloaded: faster and leaner page reloads」。

當 Facebook 的人找 Firefox 的人時,Firefox 決定另外定義哪些東西在 reload 時不需要 revalidate,而不像 Google Chrome 的 workaround:

Firefox chose to implement this directive in the form of a cache-control: immutable header.

Firefox 的人也寫了一篇「Using Immutable Caching To Speed Up The Web」解釋這個新功能。

所以之後規劃前後端的架構時又有東西要考慮進去...

CloudFront 支援將 Query String 內的特定 Key/Value 當作 Cache Key 的一部分

Amazon CloudFront 可以指定 query string 中的某個特定的 key/value 當做 cache key 的一部分了:「Announcing Query String Whitelisting for Amazon CloudFront」,對應的文件在「Configuring CloudFront to Cache Based on Query String Parameters」這邊可以查到。

先前只能針對選擇忽略掉整個 query string,或是把整個 query string 當作 cache key 的一部分,現在可以細部調整了。

最簡單的應用可以用在 css/js 的 asset 上,針對 v=\d+ 當作 cache key 的一部分,而其他的參數可以忽略,不過這好像沒什麼特別的意義。

目前想到比較有意義的應用是針對 dynamic content 多了一些籌碼可以用,像是 Slack 把整個網站放上 CloudFront 後,應該會有很多 API 是透過 query string 傳遞參數,而這次的改變讓 CloudFront 可以細部調整。

AWS 的 ALB (Application Load Balancer)

前幾天跟 AWS 的人開會的時候得知 ALB 的 beta program,今天就看到正式公開的消息了:「New – AWS Application Load Balancer」。

最主要的是對 WebSockets 與 HTTP/2 的支援,這個需求都喊很久了:

WebSocket allows you to set up long-standing TCP connections between your client and your server. This is a more efficient alternative to the old-school method which involved HTTP connections that were held open with a “heartbeat” for very long periods of time. WebSocket is great for mobile devices and can be used to deliver stock quotes, sports scores, and other dynamic data while minimizing power consumption. ALB provides native support for WebSocket via the ws:// and wss:// protocols.

HTTP/2 is a significant enhancement of the original HTTP 1.1 protocol. The newer protocol feature supports multiplexed requests across a single connection. This reduces network traffic, as does the binary nature of the protocol.

另外是 url routing,不過目前看起來只能設 10 條,我猜可以問問能不能加吧:

An Application Load Balancer has access to HTTP headers and allows you to route requests to different backend services accordingly. For example, you might want to send requests that include /api in the URL path to one group of servers (we call these target groups) and requests that include /mobile to another. Routing requests in this fashion allows you to build applications that are composed of multiple microservices that can run and be scaled independently.

As you will see in a moment, each Application Load Balancer allows you to define up to 10 URL-based rules to route requests to target groups. Over time, we plan to give you access to other routing methods.

再來是改善了之前抱怨很多的 health check:

Application Load Balancers can perform and report on health checks on a per-port basis. The health checks can specify a range of acceptable HTTP responses, and are accompanied by detailed error codes.

改進了不少東西...

Facebook 上貼的所有的連結都是公開的

tl;dr:Facebook 認為這個功能是 feature,不是 bug。

在「Why you shouldn’t share links on Facebook」這邊作者發現在 Facebook 上貼的「任何一個連結」都會產生 object id,而任何一個 object id 都可以直接取得 url,無論權限設定,像是這樣:

而 Facebook 認定這是 feature 而非 bug:

可以想像 NSA 之類的單位與地下組織開始狂掃...

Twitter 宣佈要放寬 140 字限制...

好像跟當初外面傳言的不太一樣... Anyway,Twitter 宣佈放寬 140 字限制:「Coming soon: express even more in 140 characters」。

這個限制的解除一直都有傳言,不過最後出來的結果跟預期的好像不太一樣,主要是三種用法將不計算在 140 字內。分別是 reply 時的 @username、貼圖貼影片時的 url、引用 tweet 時被引用的文字。

所以並不是完全放寬 140 字限制,只是把某些計算方式放寬...

HTTPS 因為安全性而不能使用 Referrer 的問題

Nuzzel 上看到老文章在討論 HTTPS 環境下因為安全性考量,而不能帶出 Referrer 的問題:「Where did all the HTTP referrers go?」。

原文中「Fixing Referrers in HTTPS: The Meta Referrer」這邊就有提到 HTML5 meta referrer,也就是 W3C 的「Referrer Policy」,問題是到現在還是 Draft 啊...

也因為過了三年,其實 draft 裡面多了不少參數可以用:

  • neverno-referrer 表示不傳。
  • origin 表示只傳 protocol + host 的部份,後面 path 的部份不要傳。
  • defaultno-referrer-when-downgrade 表示當 downgrade 時 (HTTP request 的部份) 不要傳。
  • origin-when-cross-origin 表示當跨站時用 origin 的邏輯,但本站還是用完整的路徑。
  • always 或是 unsafe-url 則是永遠都傳。

其中刪節號表示 W3C 不建議再使用,應該用後者比較新的。

不過因為在 Can I use 上面可以看到 Microsoft Edge 只支援舊的關鍵字 (也就是刪節號的那些),所以還是可以考慮先用舊的關鍵字,讓 Microsoft Edge 也可以被保護到:「Referrer Policy」。

把 HTTP 站台逐步換向 HTTPS 站台的步驟

Jerry Qu 寫的「关于启用 HTTPS 的一些经验分享」這篇文章講了要怎麼將 HTTP 站台逐步換成 HTTPS 站台的方式 (以及工具)。

一開始會遇到 Mixed Content,瀏覽器預設值不會直接全部擋掉,而是會放行圖片類資源 (但是出現對應的警告)。然後可以用 upgrade-insecure-requests 來幫助邊換,讓 url 裡指定 http 的自動連到 https。

當全站把 url 都修完後,接著就可以考慮用 HSTS 強制全上 HTTPS。

做到這邊的安全性已經到一定程度了,接下來要不要進 HSTS Preload List 就看大家自己的想法了。