利用 Cloudflare Workers 繞過 Cloudflare 自家的阻擋機制

Hacker News 首頁上看到「How to bypass Cloudflare bot protection (jychp.medium.com)」這則,裡面的文章是「How to bypass CloudFlare bot protection ?」這篇,利用 Cloudflare Workers 繞過 Cloudflare 自家的 CAPTCHA 機制。

這個漏洞有先被送給 Cloudflare,但被認為不是問題,所以作者就決定公開:

Several months ago I submitted what appeared to be a security flaw to CloudFalre’s bugbounty program. According to them, this is not a problem, it’s up to you to make up your own mind.

技術上就是透過 Cloudflare Workers 當作 proxy server,只是看起來 Cloudflare 對自家 IP 有特別處理,在設定妥當後,用 Cloudflare Workers 的 IP address 去連 Cloudflare 的站台,幾乎不會觸發 Cloudflare 的阻擋機制。

不過 free tier 還是有限制,主要就是數量:

The first 100,000 requests each day are free and paid plans start at just $5/10 million requests, making Workers as much as ten-times less expensive than other serverless platforms.

作者也有提到這點:

So let’s enjoy the 100 000 request/day for your free Cloudflare account and go scrape the world !

但這是個有趣的方法,加上信用卡盜刷之類的方式,這整包看起來就很有威力...

蘋果也搞了個 Applebot 掃資料

Hacker News Daily 上翻到的:「About Applebot」,另外 Hacker News 上的討論也蠻有趣的,可以看看:「Applebot (support.apple.com)」。

目前的用途是用在 Siri 之類的 bot:

Applebot is the web crawler for Apple. Products like Siri and Spotlight Suggestions use Applebot.

裡面有提到辨識方式,IP 會使用 17.0.0.0/8,反解會是 *.applebot.apple.com

Traffic coming from Applebot is identified by its user agent, and reverse DNS shows it in the *.applebot.apple.com domain, originating from the 17.0.0.0 net block.

另外 User-agent 也可以看出:

Mozilla/5.0 (Device; OS_version) AppleWebKit/WebKit_version (KHTML, like Gecko) Version/Safari_version Safari/WebKit_version (Applebot/Applebot_version)

後面有提到 search engine 的部份 (About search rankings),這點讓大家在猜蘋果是不是開始在弄 search engine 了,在 Hacker News 上的討論串裡面可以看到不少對於蘋果自己搞 search engine 的猜測。

然後也有些頗有趣的,像是爆料當初開發的過程遇到的問題 XD

jd20 3 days ago [–]

Some fun facts:
- Applebot was originally written in Go (and uncovered a user agent bug on redirects, revealing it's Go origins to the world, which Russ Cox fixed the next day).

- Up until the release of iOS 9, Applebot ran entirely on four Mac Pro's in an office. Those four Mac Pro's could crawl close to 1B web pages a day.

- In it's first week of existence, it nearly took Apple's internal DNS servers offline. It was then modified to do it's own DNS resolution and caching, fond memories...

Source: I worked on the original version.

Cloudflare 改用自己的 CAPTCHA 服務 hCaptcha

CloudflareGooglereCAPTCHA 改用自家的 hCaptcha:「Moving from reCAPTCHA to hCaptcha」。

看起來其實就是錢的問題,reCAPTCHA 要收費了,而以 Cloudflare 的量會太貴:

Earlier this year, Google informed us that they were going to begin charging for reCAPTCHA. That is entirely within their right. Cloudflare, given our volume, no doubt imposed significant costs on the reCAPTCHA service, even for Google.

另外 hCaptcha 有提供免費版本給一般網站用,剛出來這幾天等白老鼠寫心得後,再決定要不要跳進去測試...

cdnjs 轉移到 Cloudflare 負責維護

不確定是 cdnjs 還是 CDNJS,因為官方網站是小寫,但 GitHub 上是大寫...

Anyway,cdnjs 本來由社群維護更新 (實際上是透過 bot 更新,但 bot 本身也需要維護),因為人力時間的因素,轉移給 Cloudflare 負責了:「An Update on CDNJS」。

這次也更新了 cdnjs 的 daily request 數量,可以看到現在大約是每天六十億次:

本來 Cloudflare 是站在贊助頻寬的角色提供服務:

Within Cloudflare’s infrastructure there is a set of machines which are responsible for pulling the latest version of the repo periodically. Those machines then become the origin for cdnjs.cloudflare.com, with Cloudflare’s Global Load Balancer automatically handling failures. Cloudflare’s cache automatically stores copies of many of the projects making it possible for us to deliver them quickly from all 195 of our data centers.

但更新的 bot 本身掛了,而且維護者沒時間修:

Unfortunately approximately thirty days ago one of those bots stopped working, preventing updated projects from appearing in CDNJS. The bot's open-source maintainer was not able to invest the time necessary to keep the bot running. After several weeks we were asked by the community and the CDNJS founders to take over maintenance of the CDNJS repo itself.

所以現在則是 Cloudflare 接手維護了:

This means the Cloudflare engineering team is taking responsibility for keeping the contents of github.com/cdnjs/cdnjs up to date, in addition to ensuring it is correctly served on cdnjs.cloudflare.com.

不過裡面也提到了一個問題,就是現在瀏覽器為了安全性,對於不同的站台會有不同的 cache,本來 cdnjs 的設計目的之一被大幅削弱,現在只剩下省頻寬了:

The future value of CDNJS is now in doubt, as web browsers are beginning to use a separate cache for every website you visit. It is currently used on such a wide swath of the web, however, it is unlikely it will be disappearing any time soon.

用 PoW 當作防機器人的方式

看到「wehatecaptchas」這個服務試著用 PoW (Proof of work) 擋機器人...

這個方式不需要像 GooglereCAPTCHA 那樣蒐集大量行為 (對隱私不利),也不需要解一堆奇怪的圖片問題。

CAPTCHA 最常用的領域,也就是擋 spam 這件事情來說,PoW 這樣的單一方式應該是不夠,但可以當作綜合方法裡面的一種...

AI 版的星海爭霸二將直接透過歐洲區的 Battle.net 匿名與人類對戰

前幾天 Blizzard 公佈的消息,DeepMind 的星海爭霸二 AI (AlphaStar) 將會透過 Blizzard 的 Battle.net 歐洲區伺服器跟人類對戰:「DeepMind Research on Ladder」。

Experimental versions of DeepMind’s StarCraft II agent, AlphaStar, will soon play a small number of games on the competitive ladder in Europe as part of ongoing research into AI.

預設是不會對到的,需要選擇參與:

If you would like the chance to help DeepMind with its research by matching against AlphaStar, you can opt in by clicking the “opt-in” button on the in-game popup window. You can alter your opt-in selection at any time by using the “DeepMind opt-in” button on the 1v1 Versus menu.

但你仍然不會知道對手是人還是 AI,而且如同一般對戰情況,這會影響到你的戰績:

For scientific test purposes, DeepMind will be benchmarking AlphaStar’s performance by playing anonymously during a series of blind trial matches. This means the StarCraft community will not know which matches AlphaStar is playing, to help ensure that all games are played under the same conditions. AlphaStar plays with built-in restrictions that the DeepMind team has defined in consultation with pro players. A win or a loss against AlphaStar will affect your MMR as normal.

okay,這樣大概知道為什麼只開放歐洲區了...

加州從今年七月開始,禁止 AI 偽裝成人類 (前幾天也有一些新聞在報導):「A California law now means chatbots have to disclose they’re not human」,對應的法條在「Bill Text - SB-1001 Bots: disclosure」這邊可以看到:

17941. (a) It shall be unlawful for any person to use a bot to communicate or interact with another person in California online, with the intent to mislead the other person about its artificial identity for the purpose of knowingly deceiving the person about the content of the communication in order to incentivize a purchase or sale of goods or services in a commercial transaction or to influence a vote in an election. A person using a bot shall not be liable under this section if the person discloses that it is a bot.

(b) The disclosure required by this section shall be clear, conspicuous, and reasonably designed to inform persons with whom the bot communicates or interacts that it is a bot.

而加州是 Blizzard Entertainment 的總部...

法條上面對「online platform」有設計排除條款,不過如果只算星海二的人數,有可能不到這個豁免限制... 所以得避開而改用歐洲區來測試?

(c) “Online platform” means any public-facing Internet Web site, Web application, or digital application, including a social network or publication, that has 10,000,000 or more unique monthly United States visitors or users for a majority of months during the preceding 12 months.

(c) This chapter does not impose a duty on service providers of online platforms, including, but not limited to, Web hosting and Internet service providers.

美國軍方應該是超級關注這個議題,相較於 AlphaGo 或是 AlphaZero 是資訊完全透明的遊戲,這次要踏入非對稱資訊的遊戲。

如果在這個領域上有成果的話,可以預期未來的戰爭 (yeah 實體戰爭) 會開始大量採用 AI 了...

reCAPTCHA 與語音辨識:以子之矛,攻子之盾

GooglereCAPTCHA 大概是目前最常用的反制機器人工具了,但因為 accessibility 的原因 (而且應該是有法令要求),還是需要提供盲人可以存取的方式,也就是以語音判斷是否是機器人。

unCaptcha2 就是用這塊,加上 Google 自家的語音辨識 API (也支援其他家 API) 可以直接打穿:

現有的程式碼已經先被 Google 反制,但目的是展示出這樣的概念。

Googlebot 的 Math.random()

Hacker News Daily 上看到「Googlebot’s Javascript random() function is deterministic」這則有趣的發現。作者發現 Googlebot 的 Math.random() 並不隨機,甚至是固定的:

The first time Googlebot calls Math.random() the result will always be 0.14881141134537756, the second call will always be 0.19426893815398216. The script I linked to above simply uses this fact but obfuscates it a little and ‘seed’ it with something that doesn’t look too arbitrary.

需要無法預測的 random number (有安全性需求的) 應該用 RandomSource.getRandomValues() 這類函數,而不是用 Math.random(),所以這點倒是還好...

Googlebot 的 Web rendering service 的細節

在「Polymer 2 and Googlebot」這邊文章裡面才看到 Google 官方在今年八月就有公開 Googlebot 所使用的 Web rendering service (WRS) 的細節:「Rendering on Google Search」。可以想像到是基於 Google Chrome 的修改:

Googlebot uses a web rendering service (WRS) that is based on Chrome 41 (M41). Generally, WRS supports the same web platform features and capabilities that the Chrome version it uses — for a full list refer to chromestatus.com, or use the compare function on caniuse.com.

裡面提到一些值得注意的事情,像是不支援 WebSocket,所以對於考慮 Google 搜尋結果的頁面來說,就要注意錯誤處理了...

Facebook 也參戰參與 StarCraft 的 AI 測試,只是成績不太好...

Facebook 也參與了 StarCraft 的 AI 測試,不過成績不太好:「Facebook Quietly Enters StarCraft War for AI Bots, and Loses」。

不過跟 DeepMind 投入的項目不太一樣,Facebook 投入的是 StarCraft,DeepMind 投入的是 StarCraft II...

比賽有二十八組,Facebook 拿下第六名,但前三名都只是獨立愛好者:

Final results released Sunday indicate Facebook still has a way to go: CherryPi finished sixth in a field of 28; the top three bots were all made by lone, hobbyist coders.

不太像認真要玩...