Home » Computer » Network » Archive by category "Service" (Page 21)

nginx 記錄 TLS 連線資訊

想要在 nginx 的 access log 裡面記錄使用者在 HTTPS 連線使用的 TLS protocol 與 cipher。

在「How can I let nginx log the used SSL/TLS protocol and ciphersuite?」這邊有提到方向是 $ssl_protocol$ssl_cipher (出自「Module ngx_http_ssl_module」內的 Embedded Variables 章節)。

他的方式是在前面就插入 protocol,但我希望前面的欄位保持不變,把 protocol & cipher 放到後面,所以我就加了一個 /etc/nginx/conf.d/combined_ssl.conf (這邊我用 ondrej 的 PPA,在設定檔裡會撈 /etc/nginx/conf.d/ 下的設定,不確定其他的情況如何):

#
log_format combined_ssl '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" $ssl_protocol/$ssl_cipher';

然後本來用 combined 的 HTTPS 設定就改成 combined_ssl

來放一陣子再來分析,然後想看看要怎麼調整 cipher...

CityBikes 的網站 (以及 API)

看到 yllan 在 Twitter 上提到 CityBikes 這個站台:

把 YouBike 打進去搜尋後選台北市,可以看到是有紅色與綠色的泡泡會一直冒出來... 我猜是借車與還車?所以有提供這樣的 API 可以用?這看起來不錯耶...

用 Machine Learning 調校資料庫

AWS AI Blog 在月初上放出來的消息:「Tuning Your DBMS Automatically with Machine Learning」。

Carnegie Mellon Database Group 做的研究,除了預設值以外,另外跟四種不同的參數做比較,分別是 OtterTune (也就是這次的研究)、Tuning script (對於不熟資料庫的人,常用的 open source 工具)、DBA 手動調整,以及 RDS

MySQL

PostgreSQL

比較明顯的結論是:

  • Default 值在所有的 case 下都是最差的 (無論是 MySQL 與 PostgreSQL 平台,以及包括 99% 的 Latency 與 QPS,這樣二乘二的四個結果)。而且 Default 跑出來的數字與其他的差距都很明顯。
  • OtterTune 在所有 case 下跑出來都比 Tuning script 的好。這也是合理的結果,本來就是想要取代其他機器跑出來的結果。

至於有些討論 DBA 會失業的事情,我是樂見其成啦... 這些繁瑣的事情可以自動化就想交給自動化吧 XD

Pinboard 收購 Delicious

Pinboard 收購了 Delicious:「Pinboard Acquires Delicious」。

Maciej Cegłowski (Pinboard 的創辦人) 本來就是一個嘴巴很「(逼~)」的人,Pinboard 的官方 Twitter 帳號一直都是很有趣的內容 XDDD

這次公告的稿子裡面提到他為什麼買 Delicious 也是酸到爆炸 XDDD

Delicious has over a billion bookmarks and is a fascinating piece of web history. Even Yahoo, for whom mismanagement is usually effortless, had to work hard to keep Delicious down. I bought it in part so it wouldn’t disappear from the web.

然後最後面也是很傲氣的結尾 XDDD

In December of 2010, Yahoo announced it was ‘sunsetting’ Delicious, an adventure I wrote about at length. The site was sold to the YouTube founders in 2011. They subsequently sold it to Science, Inc. in 2014. Science sold it to Delicious Media in 2016, and last month Delicious Media sold it to me.

Do not attempt to compete with Pinboard.

Web Cache Deception Attack

在「How (Not) to Control Your CDN」這邊看到了「Web Cache Deception Attack」這個攻擊方式。

攻擊的手法是利用網站會把 /user/personal-info/foo.css/user/personal-info 視為一樣的內容時,配合 CDN 或是 reverse proxy server 會把 .css 設定無差異 cache 時,就可以在 cache server (cache edge) 取得使用者的敏感資料。

這主要是 url routing 的條件放太寬造成的。

另外 Mark Nottingham 還建議 cache 應該在 origin server 上控制,而非在 CDN 上設定。也就是說,在 origin server 上送出 Cache-Control,讓 CDN 或是 reverse proxy server 使用這個值來判斷 cache。

Reddit 的 Deploy 機制 (的歷史)

Reddit 主要是用 Python 寫的,這邊介紹了他們歷年來的 Code Deploy 系統:「The Evolution of Code Deploys at Reddit」。

最早期的時候 (2007 到 2010) 是用 rsync 更新程式碼,然後跑個迴圈用 ssh 連進去重跑:

# build the static files and put them on the static server
`make -C /home/reddit/reddit static`
`rsync /home/reddit/reddit/static public:/var/www/`

# iterate through the app servers and update their copy
# of the code, restarting once done.
foreach $h (@hostlist) {
    `git push $h:/home/reddit/reddit master`
    `ssh $h make -C /home/reddit/reddit`
    `ssh $h /bin/restart-reddit.sh`
}

2011 時因為人變多了,用 IRC 把過程丟出來 (okay,我知道你想問的問題... Slack 是 2013 年推出的):

The process for actually doing the deploy looked the same, but now the system did the work for you and told everyone what you were doing.

另外值得一提的是,因為他們不是自己架 IRC server 而是用外面第三方的伺服器,所以他們決定 IRC 只有單向告知的功能:

There was a lot of talk of systems that managed deploys from chat around this time, but since we used third party IRC servers we weren’t able to fully trust the chat room with production control and so it remained a one-way flow of information.

2012 時則是把機器列表放到 DNS 上,某種 service discovery 系統:

First, it fetched its list of hosts from DNS rather than keeping it hard-coded. This allowed us to update the list of hosts without having to remember to update the deploy tool as well — a rudimentary service discovery system.

另外是固定的版本,而非拉 master 下來,這樣可以避免 race condition 的不一致性 (推到一半有人把 code 塞進 master):

Another small but important change was to always deploy a fixed version of the code. The previous version of the tool would update master on a given host, but what if master changed mid-deploy because someone accidentally pushed up code? By deploying a specific git revision instead of branch name, we ensured that the deploy got the same version everywhere in production.

2013 往雲上搬,於是遇到像是「開新機器時剛好在 deploy 會拉到舊的 code」這種 edge case。

What happens if a server is launched while a deploy is ongoing? We had to make sure each newly launched server checked in to get new code if present. What about servers going away mid-deploy? The tool had to be made smarter to detect when the server was gone legitimately rather than there being an issue with the deploy process itself that should be noisily alerted on.

2014 遇到機器數量太多,推一輪要一個小時而被迫要平行化處理:

Over time, the number of servers needed to serve peak traffic grew. This meant that deploys took longer and longer. At its worst, a normal deploy took close to an hour. This was not good.

2015 則是加上 deploy lock,避免同時間有兩個人在 deploy:

Engineers would ask for the deploy lock and either get it or get put in the queue. This helped keep order in deploys and let people relax a bit while waiting for the lock.

2017 的部份則是提到了伺服器的數量:

This new mechanism allows us to deploy to a lot more machines concurrently, and deploy timings are down to 7 minutes for around 800 servers despite the extra waiting for safety.

看起來到現在還是維持手動 deploy,而不是自動化... 這塊還蠻有趣的 :o

Reddit 在處理 Page View 的方式

Reddit 說明了他們如何處理 pageview:「View Counting at Reddit」。

以 Reddit 的規模有提到兩個重點,第一個在善用 RedisHyperLogLog 這個資料結構,當量大的時候其實可以允許有微小的誤差:

The amount of memory varies per implementation, but in the case of this implementation, we could count over 1 million IDs using just 12 kilobytes of space, which would be 0.15% of the original space usage!

維基百科上有說明當資料量在 109 這個等級時,用 1.5KB 的記憶體只有 2% 的誤差值:

The HyperLogLog algorithm is able to estimate cardinalities of > 109 with a typical error rate of 2%, using 1.5 kB of memory.

第二個則是寫入允許短時間的誤差 (pageview 不會即時反應),透過批次處理降低對 Cassandra cluster 的負荷:

Writes to Cassandra are batched in 10-second groups per post in order to avoid overloading the cluster.

可以注意到把 Redis 當作 cache 層而非 storage 層。

主要原因應該跟 Redis 定位是 data structure server 而非 data structure storage 有關 (可以從對 Durability 的作法看出來),而使用 Cassandra 存 key-value 非常容易 scale,但讀取很慢。剛好兩個相輔相成。

GitHub Apps (前身 GitHub Integrations) 的 Rate Limiting 變得更彈性

GitHub 宣佈了把 GitHub Integrations 改名為 GitHub Apps,另外 Rate Limiting 變得更彈性:「GitHub Apps (formerly Integrations) General Release」。

All GitHub Apps start with a rate limit of 5000 requests per hour. To facilitate growth we have added a dynamic rate limit for installations: organization installations with more than 20 users receive another 50 requests per hour for each user. Installations that have more than 20 repositories receive another 50 requests per hour for each repository.

所以對於超過 20 個使用者以及超過 20 個 repository 的 organization 都會增加 Rate Limiting。每個單位增加 50 requests/hour 不算太多,不過想一下好像也不少... (尤其對大一點的團體來說)

GitHub 的整合性平台

GitHub 宣佈了 Marketplace,一個將各種外部 plugin 資訊整合起來的平台,讓開發者更容易尋找有哪些工具可以幫助開發:「Introducing GitHub Marketplace and more tools to customize your workflow」。

GitHub Marketplace is a new way to discover and purchase tools that extend your workflow. Find apps to use across your development process, from continuous integration to project management and code review.

可以看到目前分成五大類:

  • Code quality
  • Code review
  • Continuous integration
  • Monitoring
  • Project management

不過看了一下,目前上 Marketplace 的還有點少,不知道是不是對於要上架的公司有另外收費...

Archives