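Installing Tor Browser on Ubuntu

Tor Browser updates itself, so I did not want to install the packaged version. Most write-ups online recommend torbrowser-launcher, but it seems to have broken in the past few days and keeps complaining about a certificate problem (there are several reports on its Issues page), so I went looking for an officially recommended method instead...

The official download is a portable bundle that needs no installation. After downloading and extracting it, I noticed the file start-tor-browser.desktop, and opening it shows comments that describe the --register-app option directly: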

# After first invocation, it will update itself with the absolute path to the
# current TBB location, to support relocation of this .desktop file for GUI
# invocation. You can also add Tor Browser to your desktop's application menu
# by running './start-tor-browser.desktop --register-app'

It registers the application under ~/.local/share/applications/ automatically, so searching in the launcher finds it as well. Much simpler than I expected...

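cdnjs maintenance moves to Cloudflare

I am not sure whether it should be written cdnjs or CDNJS: the official website uses lowercase, while GitHub uses uppercase...

Anyway, cdnjs used to be maintained and updated by the community (in practice the updates were made by a bot, but the bot itself also needs maintenance). Because of limited volunteer time, it has now been handed over to Cloudflare: "An Update on CDNJS".

The post also updates the daily request count for cdnjs, which is now about six billion requests per day:

Cloudflare originally provided the service in the role of a bandwidth sponsor: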

Within Cloudflare’s infrastructure there is a set of machines which are responsible for pulling the latest version of the repo periodically. Those machines then become the origin for cdnjs.cloudflare.com, with Cloudflare’s Global Load Balancer automatically handling failures. Cloudflare’s cache automatically stores copies of many of the projects making it possible for us to deliver them quickly from all 195 of our data centers.

But the update bot itself broke, and its maintainer did not have time to fix it:

Unfortunately approximately thirty days ago one of those bots stopped working, preventing updated projects from appearing in CDNJS. The bot's open-source maintainer was not able to invest the time necessary to keep the bot running. After several weeks we were asked by the community and the CDNJS founders to take over maintenance of the CDNJS repo itself.

So Cloudflare has now taken over maintenance:

This means the Cloudflare engineering team is taking responsibility for keeping the contents of github.com/cdnjs/cdnjs up to date, in addition to ensuring it is correctly served on cdnjs.cloudflare.com.

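The post also raises a problem, though: for security reasons, browsers now keep a separate cache for each site, which greatly weakens one of cdnjs's original design goals (a cached copy shared across sites). What remains is mostly the bandwidth saving: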

The future value of CDNJS is now in doubt, as web browsers are beginning to use a separate cache for every website you visit. It is currently used on such a wide swath of the web, however, it is unlikely it will be disappearing any time soon.
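To make the effect of a per-site cache concrete, here is a minimal sketch rather than how any real browser implements it. The class names, the `a.example` and `b.example` sites, and the specific jQuery URL are all assumptions for illustration. It compares a cache keyed by resource URL alone with one keyed by (top-level site, resource URL).

```python
class SharedCache:
    """Cache keyed by resource URL only, shared by every site (older model)."""

    def __init__(self):
        self._store = {}
        self.network_fetches = 0

    def _key(self, top_level_site, resource_url):
        return resource_url

    def fetch(self, top_level_site, resource_url):
        key = self._key(top_level_site, resource_url)
        if key not in self._store:
            # Cache miss: pretend to download the file from the CDN.
            self.network_fetches += 1
            self._store[key] = f"contents of {resource_url}"
        return self._store[key]


class PartitionedCache(SharedCache):
    """Same cache, but every top-level site gets its own partition."""

    def _key(self, top_level_site, resource_url):
        return (top_level_site, resource_url)


def count_fetches(cache, visits):
    """Replay (site, url) visits and return how many hit the network."""
    for site, url in visits:
        cache.fetch(site, url)
    return cache.network_fetches


# Hypothetical browsing session: two unrelated sites use the same library.
JQUERY = "https://cdnjs.cloudflare.com/ajax/libs/jquery/3.4.1/jquery.min.js"
VISITS = [("a.example", JQUERY), ("b.example", JQUERY), ("a.example", JQUERY)]

print("shared cache fetches:     ", count_fetches(SharedCache(), VISITS))
print("partitioned cache fetches:", count_fetches(PartitionedCache(), VISITS))
```

With the shared cache the library is downloaded once for both sites; with the partitioned cache each site downloads its own copy, so the cross-site benefit disappears and only repeat visits to the same site still hit the cache.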

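The influence of microbrowsers

The article "Microbrowsers are Everywhere" discusses the influence of microbrowsers. A microbrowser here means whatever generates the preview when a link is pasted on various platforms, such as the link preview in an IM app (the image on the left):

These previews usually do not execute JavaScript, so they do not trigger the tracking services embedded in the page, and site owners tend to underestimate this channel. Yet the author thinks the number of users reached this way, and the resulting influence, may be larger than Googlebot's: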

3. Microbrowser are probably more important than google bot

It is yet another area of optimization, since every platform renders previews differently...
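As a rough illustration of what such a preview generator might do, the sketch below reads only the static `<meta>` tags of a page and never runs its scripts. It assumes the preview is built from Open Graph (`og:`) and `twitter:` metadata, which the text above does not state. The parser class, the `extract_preview` helper, and the sample HTML are all hypothetical.

```python
from html.parser import HTMLParser


class PreviewMetaParser(HTMLParser):
    """Collect og:* and twitter:* <meta> tags; script bodies are ignored."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        key = attr.get("property") or attr.get("name")
        content = attr.get("content")
        if key and content and key.startswith(("og:", "twitter:")):
            # Keep the first value seen for each key.
            self.meta.setdefault(key, content)


def extract_preview(html):
    """Return the preview metadata found in the static HTML."""
    parser = PreviewMetaParser()
    parser.feed(html)
    return parser.meta


# Hypothetical page: the script would only run in a full browser.
SAMPLE_HTML = """
<html><head>
<meta property="og:title" content="Example article">
<meta property="og:image" content="https://example.com/cover.png">
<meta name="description" content="Not used by this sketch">
<script>document.title = "Set by JavaScript"; track();</script>
</head><body></body></html>
"""

print(extract_preview(SAMPLE_HTML))
```

Anything that the page sets only through JavaScript, including analytics calls, never shows up here, which is why this traffic is easy to miss in the usual tracking reports.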

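Improving the load time of embedded YouTube videos

The YouTube embed loads a large number of components, so there is a project that strips out the ones that mean nothing to the user: "Lite YouTube Embed".

The comparison shows that Lite YouTube Embed downloads far fewer components:

There are functional differences, of course, but the basic features should work fine...

It is still implemented in JavaScript, but the actual code looks to be only about 40 lines (with roughly twice as many lines of comments): "lite-youtube-embed/src/lite-yt-embed.js".

Note that the code uses ES6 class syntax, so if IE11 has to be supported it will need to be transpiled and bundled...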

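Running the Raspberry Pi OS on x86-64

I came across the "dockerpi" project, which lets you emulate a Raspberry Pi environment on x86-64 and run Raspbian.

The whole thing first uses Docker to create an isolated environment, runs QEMU inside it to emulate ARM, and then boots Raspbian on top: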

A full ARM environment is created by using Docker to bootstrap a QEMU virtual machine. The Docker QEMU process virtualises a machine with a single core ARM11 CPU and 256MB RAM, just like the Raspberry Pi. The official Raspbian image is mounted and booted along with a modified QEMU compatible kernel.

This immediately brings Inception to mind XDDD

Anyway, this saves a lot of effort for anyone who just wants to play around. An interesting project...

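Amazon EC2 can now attach multiple Elastic Inference accelerators

I saw Jeff Barr's tweet:

So a single Amazon EC2 instance can now attach multiple Elastic Inference (GPU) accelerators, which should mainly help existing users. Those who are not using it yet will probably test the new AWS Inferentia instead? (See the earlier post "AWS 開始推自己的 Machine Learning Chip", on AWS starting to push its own machine learning chip.)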

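WebKit's "preventing tracking-prevention tracking" feature...

My first reaction to the title really was WTF, and it feels more and more like the start of a war: "Preventing Tracking Prevention Tracking".

Apple's platforms have the Intelligent Tracking Prevention (ITP) feature, but it used to be fairly simple, so many of its behaviors could still be analyzed as part of a browser fingerprint. Apple decided to improve this and shipped the changes in new software releases: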

This blog post covers enhancements to Intelligent Tracking Prevention (ITP) included in Safari on iOS and iPadOS 13.3, Safari 13.0.4 on macOS Catalina, Mojave, and High Sierra.

This includes trimming the Referer on cross-site requests:

ITP now downgrades all cross-site request referrer headers to just the page’s origin. Previously, this was only done for cross-site requests to classified domains.
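A small sketch of what "downgrading to the page's origin" means in practice. This is not WebKit's code. The function names are made up, and `naive_site` is a deliberate simplification: real browsers decide "same site" with the Public Suffix List, whereas this just compares the last two labels of the hostname. The same-site branch simply returns the full URL as a placeholder.

```python
from urllib.parse import urlsplit


def origin_of(url):
    """Return scheme://host[:port]/ with path, query, and fragment dropped."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}/"


def naive_site(url):
    """ASSUMPTION: approximate the 'site' as the last two hostname labels."""
    host = urlsplit(url).hostname or ""
    return ".".join(host.split(".")[-2:])


def referrer_for(page_url, request_url):
    """Referrer a request would carry under the downgrade described above."""
    if naive_site(page_url) != naive_site(request_url):
        # Cross-site request: only the page's origin is revealed.
        return origin_of(page_url)
    # Same-site request: left untouched in this sketch.
    return page_url


page = "https://shop.example.com/cart?item=42"
print(referrer_for(page, "https://tracker.example.net/pixel.gif"))
print(referrer_for(page, "https://static.example.com/app.js"))
```

The cross-site request only learns `https://shop.example.com/`, so the path and query string (here, the cart contents) no longer leak to the third party.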

The other three improvements all concern third-party cookies. Among them, blocking cookies on third-party requests by default will probably break some websites:

ITP will now block all third-party requests from seeing their cookies, regardless of the classification status of the third-party domain, unless the first-party website has already received user interaction.

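Some of the hacks used in early home-grown SSO implementations logged users in automatically by firing ajax requests at several different domains; those setups will likely need to be checked...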

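New fields in CloudFront access logs

CloudFront access logs now have a few more fields available: "Amazon CloudFront now provides seven new data fields in access logs".

Seven fields were added this time, and the last six all look quite useful... (The first one, c-port, is the client's port; the only use I can think of for it right now is debugging...)

The second, time-to-first-byte, can be used for performance analysis, measured from CloudFront's point of view.

The third, x-edge-detailed-result-type, describes how an error was handled, so administrators can do a rough analysis straight from the access log.

The remaining four all relate to content type and length/range. Were these really not there before?...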
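As a sketch of how the new fields could be pulled out of a log, the snippet below relies on the `#Fields:` header line that CloudFront standard logs carry, followed by tab-separated rows. The sample log is invented and uses a shortened field list for readability, so real logs have many more columns. The helper names are hypothetical.

```python
def parse_cloudfront_log(lines):
    """Yield one dict per request, using the '#Fields:' header for names."""
    fields = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
        elif line and not line.startswith("#"):
            yield dict(zip(fields, line.split("\t")))


def slowest(records):
    """Return the record with the largest time-to-first-byte."""
    return max(records, key=lambda r: float(r["time-to-first-byte"]))


# ASSUMPTION: invented sample with only a subset of the real columns.
SAMPLE_LOG = [
    "#Version: 1.0",
    "#Fields: date time cs-uri-stem sc-status c-port time-to-first-byte "
    "x-edge-detailed-result-type sc-content-type sc-content-len "
    "sc-range-start sc-range-end",
    "\t".join(["2019-12-12", "01:02:03", "/index.html", "200", "46532",
               "0.002", "Hit", "text/html", "78", "-", "-"]),
    "\t".join(["2019-12-12", "01:02:04", "/video.mp4", "206", "46533",
               "0.350", "Miss", "video/mp4", "1024", "0", "1023"]),
]

records = list(parse_cloudfront_log(SAMPLE_LOG))
worst = slowest(records)
print(worst["cs-uri-stem"], worst["time-to-first-byte"],
      worst["x-edge-detailed-result-type"])
```

Reading the column names from the header instead of hard-coding positions means the same parser keeps working when fields are appended, as happened with this change.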

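Backblaze's hard drive purchasing strategy

"How Backblaze Buys Hard Drives" describes Backblaze's strategy for purchasing hard drives. It is driven entirely by cost, so most of it does not apply to individuals and ordinary companies should not copy it wholesale, but it makes for interesting reading...

For example, because they run so many drives, power consumption is an important part of their cost evaluation, which is rarely considered in ordinary settings: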

Power draw is a very important metric for us and the high speed enterprise drives are expensive in terms of power cost. We now total around 1.5 megawatts in power consumption in our centers, and I can tell you that every watt matters for reducing costs.

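It also covers SMR drives: they offer more capacity per dollar, but the architecture has to accommodate them (a cache layer) and engineering costs go up, so Backblaze is not keen on them: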

SMR would give us a 10-15% capacity-to-dollar boost, but it also requires host-level management of sequential data writing. Additionally, the new archive type of drives require a flash-based caching layer. Both of these requirements would mean significant increases in engineering resources to support and thereby even more investment. So all-in-all, SMR isn’t cost-effective in our system.

On cost, the trend they have seen is a drop of 5% to 10% per terabyte per quarter:

Ideally, I can achieve a 5-10% cost reduction per terabyte per quarter, which is a number based on historical price trends and our performance for the past 10 years.
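To get a feel for what 5-10% per quarter adds up to, here is a quick calculation. It assumes the reduction compounds at a constant rate every quarter, which the article does not claim; the function name is just for illustration.

```python
def annual_reduction(quarterly_reduction, quarters=4):
    """Total cost reduction after compounding a per-quarter reduction."""
    return 1 - (1 - quarterly_reduction) ** quarters


for q in (0.05, 0.10):
    print(f"{q:.0%} per quarter -> {annual_reduction(q):.1%} per year")
```

Under that assumption, 5% per quarter works out to roughly 18.5% per year and 10% per quarter to roughly 34.4% per year.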

It also mentions using SAS controllers to connect multiple SATA drives (still a cost consideration), which is an interesting part too:

Longer term, one thing we’re looking toward is phasing out SATA controller/port multiplier combo. This might be more technical than some of our readers want to go, but: SAS controllers are a more commonly used method in dense storage servers. Using SATA drives with SAS controllers can provide as much as a 2x improvement in system throughput vs SATA, which is important to me, even though serial ATA (SATA) port multipliers are slightly less expensive. When we started our Storage Pod construction, using SATA controller/port multiplier combo was a great way to keep costs down. But since then, the cost for using SAS controllers and backplanes has come down significantly.

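Amazon Kendra, a document search system for enterprises

AWS has launched Amazon Kendra, a search service with semantic analysis that accepts natural-language queries directly: "Announcing Amazon Kendra: Reinventing Enterprise Search with Machine Learning".

Google previously offered the Google Search Appliance, which also integrated data inside an enterprise (it was shut down in 2016), but it probably did not go as far as natural-language search?

Amazon Kendra is not cheap. The Enterprise Edition provides 150 GB of storage and 500,000 documents, plus roughly 40k queries per day, for USD 7/hr, which comes to about USD 5,040 a month (7 × 24 × 30). For an enterprise, though, it should be very useful...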
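The monthly figure is simple arithmetic, spelled out below together with a rough per-query cost. The 30-day month and the assumption that the full 40k queries per day are actually used are mine, so the per-query number is a best case rather than a quoted price.

```python
HOURS_PER_DAY = 24


def monthly_cost(hourly_rate_usd, days=30):
    """Cost of running the service around the clock for `days` days."""
    return hourly_rate_usd * HOURS_PER_DAY * days


def cost_per_query(hourly_rate_usd, queries_per_day):
    """Cost per query if the whole daily query estimate is consumed."""
    return hourly_rate_usd * HOURS_PER_DAY / queries_per_day


print(f"monthly:   USD {monthly_cost(7):,}")
print(f"per query: USD {cost_per_query(7, 40_000):.4f}")
```

That gives USD 5,040 per month and about USD 0.0042 per query at full utilization; lighter usage makes each query proportionally more expensive.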

It also notes that the query allowance in the pricing is an estimate and will vary with how hard the queries are:

Actual queries per day will vary based on query complexity, which greatly varies from customer to customer. Less complex queries (e.g. “leave policy”) consume less resources to run, and more complex queries (e.g. “What’s the daily parking allowance in Seattle?”) consume more resources to run. The total number of queries you can run with your allocated resources will depend on your mix of queries. The max queries per day provided above is an estimate, assuming 80% less complex queries and 20% more complex queries.

That is rather interesting. It feels like it can handle simple analysis?