Technical limits on downloading YouTube videos, and how to get around them

Saw the post "How They Bypass YouTube Video Download Throttling" on Hacker News, which covers the various ways YouTube prevents downloads.

Fetching the URL obtained through the API directly is very slow, around 40-70KB/sec:

However, attempting to download from this URL leads to really slow download:

The speed is always limited to around 40-70kB/s.

To get past this you need a JavaScript environment to compute n and put the result into subsequent requests, "proving" that you are the official web client:

Since mid-2021, YouTube has included the query parameter n in the majority of file URLs. This parameter needs to be transformed using a JavaScript algorithm located in the file base.js, which is distributed with the web page. YouTube utilizes this parameter as a challenge to verify that the download originates from an “official” client. If the challenge is not resolved and n is not transformed correctly, YouTube will silently apply throttling to the video download.

The JavaScript algorithm is obfuscated and changes frequently, so it’s not practical to attempt reverse engineering to understand it. The solution is simply to download the JavaScript file, extract the algorithm code, and execute it by passing the n parameter to it. The following code accomplishes this.
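The quoted code isn't reproduced here. As a rough sketch of only the URL-rewriting step, assuming you already have some way to run the function extracted from base.js (represented here by a caller-supplied `transform`, a hypothetical stand-in), swapping in the transformed `n` could look like this:

```python
from typing import Callable
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def rewrite_n_param(url: str, transform: Callable[[str], str]) -> str:
    """Return url with its `n` query parameter replaced by transform(n).

    `transform` is a hypothetical stand-in for the function extracted from
    base.js; in practice that code would run in a JavaScript engine.
    URLs without an `n` parameter are returned unchanged.
    """
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    if not any(key == "n" for key, _ in query):
        return url
    # Only `n` is transformed; every other parameter keeps its value and order.
    new_query = [(k, transform(v) if k == "n" else v) for k, v in query]
    return urlunsplit(parts._replace(query=urlencode(new_query)))


# Toy transform (string reversal) just to show the plumbing.
print(rewrite_n_param("https://example.com/videoplayback?id=1&n=abc", lambda s: s[::-1]))
# https://example.com/videoplayback?id=1&n=cba
```

Note that `urlencode` re-quotes the other values, so their percent-encoding may differ from the original URL while decoding to the same values.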

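But even with n computed correctly, the download is still throttled: the author measured about 4MB/sec, much faster than before but still clearly capped. This is mainly to keep clients from over-buffering and wasting bandwidth: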

With this new URL containing the correctly transformed n parameter, the next step is to download the video. However, YouTube still enforces a throttling rule. This rule imposes a variable download speed limit based on the size and length of the video, aiming to provide a download time that’s approximately half the duration of the video. This aligns with the streaming nature of videos. It would be a massive waste of bandwidth for YouTube to always provide the media file as quickly as possible.
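The "about half the video's duration" rule can be turned into a back-of-the-envelope rate. A sketch; the function name and the 300 MB / 10-minute example are made up for illustration, and YouTube's actual formula isn't public:

```python
def estimated_throttle_rate(size_bytes: int, duration_s: float) -> float:
    """Bytes per second that would finish the download in about half the video's duration."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    target_seconds = duration_s / 2
    return size_bytes / target_seconds


# Hypothetical 300 MB, 10-minute video: 300e6 bytes / 300 s = 1 MB/s.
print(estimated_throttle_rate(300_000_000, 600))  # 1000000.0
```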

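The next trick is to split the download into many HTTP requests using Range. Since the buffering algorithm sends data at full speed at the start of each request before the throttling kicks in, this sidesteps the speed limit.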
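A minimal sketch of computing the Range values, assuming the total file size is already known (e.g. from a Content-Length header); the chunk size is an arbitrary parameter, since the article's actual value isn't given here:

```python
from typing import List


def byte_ranges(total_size: int, chunk_size: int) -> List[str]:
    """Split a file of total_size bytes into HTTP Range header values."""
    if total_size < 0 or chunk_size <= 0:
        raise ValueError("total_size must be >= 0 and chunk_size > 0")
    ranges = []
    for start in range(0, total_size, chunk_size):
        # The end offset in a Range header is inclusive, hence the -1.
        end = min(start + chunk_size, total_size) - 1
        ranges.append(f"bytes={start}-{end}")
    return ranges


print(byte_ranges(10, 4))  # ['bytes=0-3', 'bytes=4-7', 'bytes=8-9']
```

Each value would go into a `Range` request header, e.g. `urllib.request.Request(url, headers={"Range": r})`, and the responses are concatenated in order.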

Counting the extra requests and processing time, overall throughput reaches around 50-70MB/sec, which is a perfectly acceptable download speed:

However, the average speeds typically ranged between 50-70 MB/s or 400-560 Mb/s, which is still pretty fast.

After that come some commands for merging the files (YouTube splits video and audio into two separate files), but that's not the main point...
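For completeness, one common way to mux separate video and audio files without re-encoding is ffmpeg's stream copy. This is not necessarily the command the article uses; the file names are placeholders, and it assumes both streams fit the output container (e.g. H.264 video with AAC audio into MP4):

```python
import subprocess
from typing import List


def merge_command(video_path: str, audio_path: str, output_path: str) -> List[str]:
    """Build an ffmpeg command that muxes one video and one audio file without re-encoding."""
    return [
        "ffmpeg",
        "-i", video_path,   # input 0: video-only file
        "-i", audio_path,   # input 1: audio-only file
        "-map", "0:v:0",    # take the first video stream from input 0
        "-map", "1:a:0",    # take the first audio stream from input 1
        "-c", "copy",       # stream copy: no re-encoding
        output_path,
    ]


def merge(video_path: str, audio_path: str, output_path: str) -> None:
    # Raises CalledProcessError if ffmpeg exits with a non-zero status.
    subprocess.run(merge_command(video_path, audio_path, output_path), check=True)
```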

Firefox downloaded from Mozilla's official site carries a tracking tag

Two days ago I saw the report "Each Firefox download has a unique identifier" and posted it to Hacker News: "Each Firefox download has a unique identifier (ghacks.net)".

In short, Mozilla embeds a download token in the Firefox binary, which can later be used to track users: "[meta] Support download token".

According to the report, every download of the binary gets a different token:

More details are answered in "Attached file dltoken_data_review.md — Details", for example that the token is tied to Google Analytics:

5) List all proposed measurements and indicate the category of data collection for each measurement, using the [Firefox data collection categories](https://wiki.mozilla.org/Firefox/Data_Collection) found on the Mozilla wiki.   

<table>
  <tr>
    <td>Measurement Description</td>
    <td>Data Collection Category</td>
    <td>Tracking Bug #</td>
  </tr>
  <tr>
    <td>A download token that uniquely corresponds to a Google Analytics ID</td>
    <td>Category 4 "Highly sensitive or clearly identifiable personal data"</td>
    <td>Bug 1677497</td>
  </tr>
</table>

I couldn't reproduce it myself (I kept getting redirected to CloudFront), but Yuliya in the comments managed to reproduce it via Tor:

I have tried some TOR exit nodes:

Name: Firefox Setup 98.0.1_germany.exe
Size: 55528896 bytes (52 MiB)
SHA256: 2d8164d547d8a0b02f2677c05e21a027dc625c0c1375fd34667b7d039746d400
SHA1: 71302acbee6895b84cf0dfae99050926f2db59ef

Name: Firefox Setup 98.0.1_austria.exe
Size: 55528896 bytes (52 MiB)
SHA256: a139a45dd5737ab981068ca2596b7fdfde15e5d4bc8541e0a2f07a65defd3e4e
SHA1: 28630a0aababa162ca9e7cbca51e50b76b9c3cff

I have labeled the file for the corresponding country of the exit node.
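Since the two files have the same size but different hashes, a natural next step would be to locate the bytes that differ. A hypothetical sketch (not something Yuliya or Mozilla describe) that reports the offset ranges where two equal-length files differ:

```python
from typing import List, Optional, Tuple


def differing_regions(a: bytes, b: bytes) -> List[Tuple[int, int]]:
    """Half-open (start, end) byte ranges where two equal-length blobs differ."""
    if len(a) != len(b):
        raise ValueError("inputs must be the same length")
    regions: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y and start is None:
            start = i                    # a differing run begins
        elif x == y and start is not None:
            regions.append((start, i))   # the run ended just before i
            start = None
    if start is not None:
        regions.append((start, len(a)))  # run continues to the end
    return regions


# Hypothetical usage with the two installers from the comment:
# with open("Firefox Setup 98.0.1_germany.exe", "rb") as f1, \
#      open("Firefox Setup 98.0.1_austria.exe", "rb") as f2:
#     print(differing_regions(f1.read(), f2.read()))
```

A pure-Python loop over ~55 MB is slow, but fine for a one-off comparison.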

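If you don't want to switch to a Chromium-based browser, the alternative I've seen in the discussion is LibreWolf. I installed it yesterday and it seems fine, so it might be worth testing...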

Google Chrome will start restricting non-HTTPS downloads

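Google Chrome recently announced its plan to phase out file downloads over insecure HTTP, specifically non-HTTPS downloads started from HTTPS pages (so-called mixed content downloads): "Protecting users from insecure downloads in Google Chrome".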
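At its core the rule compares the scheme of the page with the scheme of the download. A simplified sketch that only considers `http`/`https` (Chrome's real logic also depends on file type, version, and other cases not modeled here):

```python
from urllib.parse import urlsplit


def is_mixed_content_download(page_url: str, download_url: str) -> bool:
    """True when a plain-HTTP download is started from an HTTPS page.

    Simplified: only looks at the http/https schemes and ignores file type,
    Chrome version, and any other exceptions the real policy may have.
    """
    page_scheme = urlsplit(page_url).scheme  # urlsplit lower-cases the scheme
    download_scheme = urlsplit(download_url).scheme
    return page_scheme == "https" and download_scheme == "http"
```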

The rollout is split by file type, with different types of files being blocked at different points in time:

So by then IE will have one more feature?

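The unavoidable costs of JavaScript frameworks

Saw "The Baseline Costs of JavaScript Frameworks", an article that measures just how high the unavoidable baseline costs of today's mainstream JavaScript frameworks are.

The article concludes that today's common JavaScript frameworks are all quite heavy: where networks are slow they take a long time to download, and on non-flagship phones they take a long time to process (parse & compile).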

This is the size after gzip:

This is the parse & compile time:

This is the download time (excluding latency and TLS connection setup time):
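The download-time chart is essentially size divided by bandwidth. A sketch with made-up numbers (a 100 KB gzipped bundle on a 1.6 Mbit/s link), ignoring latency and TLS setup just like the chart:

```python
def download_seconds(size_bytes: int, bandwidth_bps: float) -> float:
    """Pure transfer time in seconds, ignoring latency and TLS handshakes."""
    if bandwidth_bps <= 0:
        raise ValueError("bandwidth_bps must be positive")
    return size_bytes * 8 / bandwidth_bps


# Hypothetical 100 KB gzipped bundle over a 1.6 Mbit/s connection.
print(download_seconds(100_000, 1_600_000))  # 0.5
```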

That doesn't mean you can't use them, but what matters is your audience:

But it’s important to consider your audience. If you’re building for resource constrained devices — which you certainly are if your product targets a country like India — you could consider using a lighter framework such as Riot or Preact. Your users will thank you.

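Finally, it suggests not using a full JavaScript framework if you're mainly presenting information, and writing separate code only for the parts that need interactivity: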

For websites that primarily display content, it’s more efficient and cost-effective to just send some server-rendered HTML down the wire. If there are areas of your website that require interactivity, you can always use JavaScript to build those specific parts.

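Sandvine's analysis of global internet traffic: how did those two make the list...

Saw "Netflix Dominates Internet Traffic Worldwide, BitTorrent Ranks Fifth", which reports on Sandvine's analysis of global internet traffic, mainly this chart:

Most of the entries aren't surprising (they differ only in regional usage habits), but what's going on with #10 in AMERICAS (XBOX LIVE UPDATE) and #7 in EMEA (PLAYSTATION DOWNLOAD) XDDD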

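Another set of TCP BBR test results

TCP BBR is a TCP congestion control algorithm published by Google that improves congestion handling purely on the server side. It was merged into the Linux kernel in 4.9.

Spotify has a lot of data to deliver to users (audio files, for example), which is exactly the kind of traffic TCP BBR helps with, and real-world testing produced quite good improvements: "Smoother Streaming with BBR".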

Spotify's published data doesn't say which platforms were tested, so first a quick look at its audio quality settings, from the "Audio settings" page.

On Desktop it's 160kbps/320kbps Ogg (Standard/HQ). In the Web Player it's 128kbps/256kbps AAC (Standard/HQ).

Mobile is more complicated. On iOS it's 96kbps/160kbps/256kbps Ogg (Normal/High/Extreme), plus an Automatic setting that adjusts by itself. On Android it's 24kbps HE-AACv2 (Low) and 96kbps/160kbps/320kbps Ogg (Normal/High/Very high), plus Automatic.

Finally, Chromecast uses 128kbps/256kbps (Standard/Premium).
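To get a feel for how much data these bitrates mean, here is a rough conversion to MB per hour. It assumes a constant bitrate (Ogg Vorbis is actually variable-bitrate, so these are nominal figures) and decimal units:

```python
def megabytes_per_hour(kbps: float) -> float:
    """MB (10**6 bytes) used by one hour of audio at a constant bitrate in kbit/s."""
    bits = kbps * 1000 * 3600
    return bits / 8 / 1_000_000


for rate in (24, 96, 160, 320):
    print(f"{rate} kbps = {megabytes_per_hour(rate):.1f} MB/hour")
```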

In the tests, stutter (playback failing to keep up) dropped by 6-10%, and bandwidth went up 5-7% at the median, with bigger gains of 10-15% for the slower download cohorts:

Taking daily averages, stutter decreased 6-10% for the BBR group. Bandwidth increased by 10-15% for the slower download cohorts, and by 5-7% for the median. There was no difference in latency between groups.

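The per-region breakdown also shows substantial improvements:

Also, during the test their data center in Peru happened to have problems with its upstream connectivity, and they found by accident that BBR still held up well under those network conditions: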

In Peru, the non-BBR group saw a 400-500% increase in stutter. In the BBR group, stutter only increased 30-50%.

In this scenario, the BBR group had 4x bandwidth for slower downloads (the 10th percentile), 2x higher median bandwidth, and 5x less stutter!

On Ubuntu 18.04 you can enable BBR directly; on Ubuntu 16.04 you can follow "Testing TCP BBR performance with speedtest-cli on Ubuntu 16.04" to upgrade the kernel first and then enable BBR.
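To check whether BBR is available and active on a Linux box, you can read the two congestion-control sysctls under /proc. A sketch; the settings commonly used to enable BBR are noted in the comments, and `bbr` only shows up as available once the `tcp_bbr` module is loaded:

```python
from pathlib import Path
from typing import Dict

PROC_IPV4 = Path("/proc/sys/net/ipv4")


def parse_bbr_status(available_text: str, current_text: str) -> Dict[str, bool]:
    """Interpret the contents of the two congestion-control sysctls."""
    available = available_text.split()  # e.g. "reno cubic bbr"
    return {
        "available": "bbr" in available,
        "active": current_text.strip() == "bbr",
    }


def bbr_status(proc_dir: Path = PROC_IPV4) -> Dict[str, bool]:
    """Read the live values on a Linux host (Linux-only paths)."""
    available = (proc_dir / "tcp_available_congestion_control").read_text()
    current = (proc_dir / "tcp_congestion_control").read_text()
    return parse_bbr_status(available, current)


# To enable BBR persistently, the commonly used /etc/sysctl.conf lines are:
#   net.core.default_qdisc=fq
#   net.ipv4.tcp_congestion_control=bbr
# followed by `sudo sysctl -p` to load them.
```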

OnionShare: sharing files through a Tor Hidden Service

The OnionShare website describes it like this:

OnionShare is an open source tool that lets you securely and anonymously share a file of any size.

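The Wiki explains it more clearly: it actually runs a Hidden Service on your own machine, and the other side has to download through something like Tor Browser, so the sharer and the downloader stay anonymous to each other: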

OnionShare lets you securely and anonymously share files of any size. It works by starting a web server, making it accessible as a Tor onion service, and generating an unguessable URL to access and download the files. It doesn't require setting up a server on the internet somewhere or using a third party filesharing service. You host the file on your own computer and use a Tor onion service to make it temporarily accessible over the internet. The other user just needs to use Tor Browser to download the file from you.

The usage instructions make it even clearer:

Open OnionShare, drag and drop files and folders you wish to share into it, and click Start Sharing. After a moment, it will show you a .onion URL such as http://asxmi4q6i7pajg2b.onion/egg-cain. This is the secret URL that can be used to download the file you're sharing.
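The "unguessable URL" part can be illustrated with a word-based slug like the `egg-cain` example. A hypothetical sketch: the wordlist is a toy and not OnionShare's, and the entropy helper shows why a real list has to be large for the slug to be hard to guess:

```python
import math
import secrets
from typing import Sequence

# Toy wordlist for illustration only; it is NOT OnionShare's list.
TOY_WORDS = ("egg", "cain", "amber", "delta", "maple", "orbit", "quill", "tundra")


def make_slug(words: Sequence[str] = TOY_WORDS, count: int = 2) -> str:
    """Join `count` words chosen with a cryptographically secure RNG."""
    return "-".join(secrets.choice(words) for _ in range(count))


def slug_entropy_bits(wordlist_size: int, count: int) -> float:
    """Guessing difficulty in bits: count * log2(wordlist_size)."""
    return count * math.log2(wordlist_size)


print(make_slug())                           # random, e.g. "maple-egg"
print(slug_entropy_bits(len(TOY_WORDS), 2))  # 6.0 bits: far too guessable
```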

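If the downloader doesn't care that much about anonymity, a service like Tor2web should also work...

With the features it offers today, it still doesn't feel convenient enough; it's mostly experimental XD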

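Why PS4 downloads are so slow

In "Why PS4 downloads are so slow", the author put a lot of effort into tracking down the cause and found that the slow PS4 downloads are deliberate... The post also covers when downloads slow down and how to avoid it.

If you don't feel like reading the whole thing, jump straight to the Conclusions section. The main cause is that the PS4 adjusts the TCP window size when other applications are running (even an idle application in the background shrinks the TCP window used for downloads), which in turn limits the speed: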

If any applications are running, the PS4 appears to change the settings for PSN store downloads, artificially restricting their speed. Closing the other applications will remove the limit.

Using the TCP window size to throttle speed is a rather "creative" approach...
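Why the window size works as a throttle: a single TCP connection can send at most one window of data per round trip, so throughput is capped at window / RTT. A sketch with hypothetical numbers (the PS4 article's actual window sizes aren't quoted here):

```python
def max_tcp_throughput_bps(window_bytes: int, rtt_s: float) -> float:
    """Upper bound for one TCP connection: at most one window of data per round trip."""
    if rtt_s <= 0:
        raise ValueError("rtt_s must be positive")
    return window_bytes * 8 / rtt_s


# Hypothetical: 100 ms RTT, comparing a 64 KiB window with a 16 KiB one.
for window in (64 * 1024, 16 * 1024):
    print(f"{window} B window -> {max_tcp_throughput_bps(window, 0.1) / 1e6:.2f} Mbit/s")
```

Quartering the window quarters the cap, regardless of how fast the underlying link is.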

Anyway, the fix when you run into this is to fully close every running application; downloads then behave much more normally...

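Over 300 million passwords (hashed)

Troy Hunt has released 306 million SHA-1-hashed passwords for everyone to study: "Introducing 306 Million Freely Downloadable Pwned Passwords".

He cites the recommendation in NIST's new draft guidelines to block passwords known to have been leaked: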

The files can be downloaded from "Have I been pwned? Pwned Passwords".
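Checking a password against the list just means hashing it the same way and looking it up. A sketch assuming the file has one hex SHA-1 per line, optionally followed by `:count`. Loading all 306 million hashes into a Python set would take far too much memory, so for the full file something like binary search over the sorted file would be needed; this is only for illustration:

```python
import hashlib
from typing import Iterable, Set


def sha1_hex_upper(password: str) -> str:
    """Uppercase hex SHA-1 of the UTF-8 encoded password."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest().upper()


def load_hashes(lines: Iterable[str]) -> Set[str]:
    """Collect hashes, keeping only the part before an optional ':count' suffix."""
    hashes = set()
    for line in lines:
        line = line.strip()
        if line:
            hashes.add(line.split(":", 1)[0].upper())
    return hashes


def is_pwned(password: str, known_hashes: Set[str]) -> bool:
    return sha1_hex_upper(password) in known_hashes
```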

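Tails 3.0 is out, and BitTorrent downloads are back...

Tails is a self-contained Tor environment that you can boot directly from USB or run in a virtual machine, so that nothing else on the machine can interfere and leak information.

I just saw the Tails 3.0 release announcement, "Tails 3.0 is out". What stands out is that the BitTorrent download option is back on the download page.

A while ago, in "BitTorrent's plan to improve on SHA-1?", I mentioned that the Tails team, in response to the SHA-1 issue, was discussing whether to keep offering BitTorrent downloads (since BitTorrent uses SHA-1 for many things). The decision at the time was actually to temporarily stop distributing via BitTorrent: "Biterrant attack".

However, a later update says the following, so it looks like BitTorrent downloads are being restored for now: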

After reading this discussion, my current conclusion is that we've totally misunderstood the impact of the attack, and that the security of our bittorrent downloads is still good enough. So I propose we revert to what we did before the 2.12 release, i.e. ship Torrents, for the foreseeable future when 2nd pre-image attacks are not realistic yet.
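For context on why SHA-1 matters here: a BitTorrent v1 .torrent stores the 20-byte SHA-1 of every fixed-size piece, and clients reject pieces whose hash doesn't match. A minimal sketch of that check (piece length and data are hypothetical). An attacker who didn't create the torrent would need a second preimage to slip a modified piece past it, which is why the quoted conclusion hinges on second-preimage attacks:

```python
import hashlib
from typing import List


def piece_hashes(data: bytes, piece_length: int) -> List[bytes]:
    """SHA-1 digest (20 bytes) of each piece; the last piece may be shorter."""
    if piece_length <= 0:
        raise ValueError("piece_length must be positive")
    return [
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    ]


def bad_pieces(data: bytes, piece_length: int, expected: List[bytes]) -> List[int]:
    """Indices of pieces whose SHA-1 does not match the expected hashes."""
    actual = piece_hashes(data, piece_length)
    if len(actual) != len(expected):
        raise ValueError("piece count does not match")
    return [i for i, (got, want) in enumerate(zip(actual, expected)) if got != want]
```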

Back to Tails 3.0 itself: one of the bigger changes is dropping 32-bit support:

Tails 3.0 works on 64-bit computers only and not on 32-bit computers anymore.

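Also, the language setting, which used to require going into the advanced boot menu, is now prompted by default:

Maybe it's time to test again how usable the Zhuyin input method is XD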