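Saw this article on Hacker News, "How They Bypass YouTube Video Download Throttling", which walks through the various ways YouTube hinders downloads and how to get around them.
Fetching the URL returned by the API directly is slow, roughly 40-70 kB/s: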
However, attempting to download from this URL leads to really slow download:
The speed is always limited to around 40-70kB/s.
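This step needs a JavaScript environment to compute the transformed n, which is then sent with subsequent requests to "prove" that you are the official web client: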
Since mid-2021, YouTube has included the query parameter n in the majority of file URLs. This parameter needs to be transformed using a JavaScript algorithm located in the file base.js, which is distributed with the web page. YouTube utilizes this parameter as a challenge to verify that the download originates from an “official” client. If the challenge is not resolved and n is not transformed correctly, YouTube will silently apply throttling to the video download.
The JavaScript algorithm is obfuscated and changes frequently, so it’s not practical to attempt reverse engineering to understand it. The solution is simply to download the JavaScript file, extract the algorithm code, and execute it by passing the n parameter to it. The following code accomplishes this.
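A minimal Python sketch of the URL-rewriting half of this step, assuming you already have some callable that runs the function extracted from base.js (for example through a JavaScript engine, not shown here). The `replace_n_param` name, the `transform` argument, and the example URL are hypothetical placeholders, not the article's code or YouTube's API; the real transform is obfuscated and changes often, so the string reversal below is purely for illustration.

```python
from typing import Callable
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def replace_n_param(url: str, transform: Callable[[str], str]) -> str:
    """Return `url` with its `n` query parameter replaced by transform(n).

    `transform` is a hypothetical stand-in for the obfuscated function
    extracted from base.js; other query parameters are kept in order.
    """
    parts = urlsplit(url)
    # keep_blank_values so parameters such as "y=" are not silently dropped
    query = parse_qsl(parts.query, keep_blank_values=True)
    rewritten = [(key, transform(value) if key == "n" else value) for key, value in query]
    return urlunsplit(parts._replace(query=urlencode(rewritten)))


# Illustration only: reversing the string is NOT the real challenge algorithm.
print(replace_n_param("https://example.com/videoplayback?itag=18&n=abc123", lambda s: s[::-1]))
# prints https://example.com/videoplayback?itag=18&n=321cba
```

Note that `urlencode` re-encodes every value, so other parameters stay equivalent but may not be byte-identical to the original URL.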
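Even with n computed correctly, the download is still throttled: the author measured roughly 4 MB/s, much faster than before but still clearly rate-limited. The main purpose is to stop clients from buffering far ahead and wasting bandwidth: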
With this new URL containing the correctly transformed n parameter, the next step is to download the video. However, YouTube still enforces a throttling rule. This rule imposes a variable download speed limit based on the size and length of the video, aiming to provide a download time that’s approximately half the duration of the video. This aligns with the streaming nature of videos. It would be a massive waste of bandwidth for YouTube to always provide the media file as quickly as possible.
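To get a feel for what "about half the video's duration" means in throughput, here is the arithmetic as a tiny Python helper. The function name, the `ratio` parameter, and the example numbers are made up; the quote only says the limit is variable and aims for roughly half the duration, so this is an approximation, not YouTube's actual rule.

```python
def implied_rate(size_bytes: int, duration_s: float, ratio: float = 0.5) -> float:
    """Bytes per second needed to deliver `size_bytes` in `ratio * duration_s` seconds.

    ratio=0.5 models the quoted "approximately half the duration" target;
    it is an assumption, not a documented YouTube constant.
    """
    if duration_s <= 0 or ratio <= 0:
        raise ValueError("duration_s and ratio must be positive")
    return size_bytes / (duration_s * ratio)


# Hypothetical 10-minute, 300 MB video: target time ~300 s, i.e. ~1 MB/s.
print(implied_rate(300_000_000, 600))  # 1000000.0
```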
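The next step is to use the Range header to split the download into many HTTP requests. Because the buffering algorithm sends data at full speed for a while before the throttle kicks in, every new request gets that full-speed burst, which gets around the limit.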
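A minimal Python sketch of the Range approach using only the standard library. It assumes the total file size is already known (for example from a Content-Length header) and that the server answers Range requests with 206 Partial Content; the function names and the 10 MiB default chunk size are hypothetical, since the source does not say how small each chunk must be to stay within the full-speed burst.

```python
import urllib.request


def byte_ranges(total_size: int, chunk_size: int) -> list[tuple[int, int]]:
    """Split [0, total_size) into inclusive (start, end) pairs for Range headers."""
    if total_size < 0 or chunk_size <= 0:
        raise ValueError("total_size must be >= 0 and chunk_size > 0")
    return [
        (start, min(start + chunk_size, total_size) - 1)
        for start in range(0, total_size, chunk_size)
    ]


def fetch_range(url: str, start: int, end: int, timeout: float = 30.0) -> bytes:
    """Fetch bytes start..end (inclusive) with a single HTTP Range request."""
    request = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        if response.status != 206:
            # A 200 here would mean the server ignored Range and sent the whole file.
            raise RuntimeError(f"expected 206 Partial Content, got {response.status}")
        return response.read()


def download_in_chunks(url: str, total_size: int, chunk_size: int = 10 * 1024 * 1024) -> bytes:
    """Download the file as many short requests, each starting a fresh transfer."""
    return b"".join(fetch_range(url, start, end) for start, end in byte_ranges(total_size, chunk_size))


print(byte_ranges(10, 4))  # [(0, 3), (4, 7), (8, 9)]
```

Requests run sequentially here; fetching chunks in parallel or streaming them to disk instead of joining them in memory are easy variations, but the source does not say which one the author used.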
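Once the extra requests and processing time are factored in, the overall speed comes out to roughly 50-70 MB/s, which is a perfectly acceptable download speed: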
However, the average speeds typically ranged between 50-70 MB/s or 400-560 Mb/s, which is still pretty fast.
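After that the article has some commands for merging the results (YouTube serves video and audio as two separate files), but that part isn't the point here...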
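The article's own merge commands aren't reproduced here, but a common way to combine separate video and audio streams without re-encoding is ffmpeg's stream copy. This Python sketch just builds and runs that command; the file names are hypothetical, and it assumes ffmpeg is installed and that the output container (Matroska here) can hold both codecs.

```python
import subprocess


def build_merge_command(video_path: str, audio_path: str, output_path: str) -> list[str]:
    """Build an ffmpeg command that muxes one video and one audio file without re-encoding."""
    return [
        "ffmpeg",
        "-y",               # overwrite output_path if it already exists
        "-i", video_path,   # input 0: video-only file
        "-i", audio_path,   # input 1: audio-only file
        "-map", "0:v:0",    # first video stream from input 0
        "-map", "1:a:0",    # first audio stream from input 1
        "-c", "copy",       # stream copy: no transcoding
        output_path,
    ]


def merge(video_path: str, audio_path: str, output_path: str) -> None:
    """Run ffmpeg; raises CalledProcessError if ffmpeg exits with an error."""
    subprocess.run(build_merge_command(video_path, audio_path, output_path), check=True)


print(" ".join(build_merge_command("video.webm", "audio.m4a", "output.mkv")))
# prints ffmpeg -y -i video.webm -i audio.m4a -map 0:v:0 -map 1:a:0 -c copy output.mkv
```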