Curl 的 --socks5 與 --socks5-hostname

SOCKS5 支援用 hostname 連線,而且是透過 SOCKS server 查 IP address,然後查資料的時候翻到「Curl: Re: ��: Why resolve the dns locally when using a socks5 proxy」這篇,發現 Curl--socks5 原來有雷,一般人會想要用的應該是 --socks5-hostname 才對 XDDD

這是 --socks5

Use the specified SOCKS5 proxy - but resolve the host name locally. If the port number is not specified, it is assumed at port 1080.

而這是 --socks5-hostname

Use the specified SOCKS5 proxy (and let the proxy resolve the host name). If the port number is not specified, it is assumed at port 1080.

另外也有 --socks4--socks4a,不過 SOCKS4a 的協定最主要就是加上 hostname 的連線:

SOCKS4a extends the SOCKS4 protocol to allow a client to specify a destination domain name rather than an IP address; this is useful when the client itself cannot resolve the destination host's domain name to an IP address.

然後隔壁棚的 socks5://socks5h:// 也有異曲同工之妙...

透過 SOCKS5 界面連進 WireGuard 網段的軟體 wireproxy

Hacker News 上看到「A userspace WireGuard client that exposes itself as a proxy (github.com/octeep)」看到這個有趣的東西,可以把自己當作是一個 WireGuard client,然後透過 SOCKS5 界面讓使用者使用... 專案則是在 GitHub 上的「Wireguard client that exposes itself as a socks5 proxy」這邊可以看到。

除了軟體本身有支援 SOCKS5 的可以用以外,另外可以搭配透過 LD_PRELOAD 把 TCP 連線都轉進 SOCKS5 服務的套件來用,像是 tsocks 或是 redsocks 這種工具。

然後這包東西是用 Golang 寫的,好像剛好可以拿來練手包 Ubuntu PPA...

MangoDB 改名為 FerretDB (雪貂)

先前提到的 MongoDB 相容方案 MangoDB,透過PostgreSQL 當底層而且維持 open source license 的方案 (參考先前寫的「MangoDB:拿 PostgreSQL 當作後端的 MongoDB 相容層」),正式改名為 FerretDB:「MangoDB has a new name, and the momentum is stronger than ever. Meet FerretDB!」。

依照官方的說明,應該就是收到 C&D notice 了:

Moreover, a representative of MongoDB Inc. asked us to stop using the MangoDB name on our website, GitHub, or anywhere else, due to similarity or potential confusion.

Hacker News 上沒什麼討論,不過這邊還是列一下:「MangoDB has a new name (ferretdb.io)」,裡面有提到另外的「Ferret Database」這個網站,看起來是真的雪貂網站 XDDD

來看 Intel + Varnish 的單機 500Gbps 的 PR 新聞稿

在「Varnish Software Achieves 500Gbps Throughput Per Server for UHD Video Content」這邊看到 PR 稿,由 IntelVarnish 合作,宣稱達到單機 500Gbps 的 throughput 了:

According to Varnish Software, the following were the outcomes of the test:

  • 509.7 Gbps live-linear throughput, using a dual-processor configuration
  • 487.2 Gbps video-on-demand throughput, using a dual-processor configuration

白皮書在「Delivering up to 500 Gbps Throughput for Next-Gen CDNs」這頁可以用個資交換下載,不過用搜尋引擎找一下可以發現 Intel 那邊有放出 PDF (但不確定兩邊給的是不是同一份):「Delivering up to 500 Gbps Throughput for Next-Gen CDNs」。

單 CPU 的伺服器是四個 100Gbps 界面接出來,雙 CPU 的伺服器是八個 (這邊 SUT 是 system under test 的縮寫):

These client systems were connected to the CDN servers using 100 GbE links through a switch; 4x100 GbE connections for the single-processor SUT, and 8x100 GbE for the dualprocessor SUT. Testing was done using Wrk, a widely recognized open-source HTTP(S) benchmarking tool.

不過如果實際看圖會發現伺服器是兩個 100Gbps (單 CPU) 與四個 100Gbps (雙 CPU),然後 wrk 也吃了兩個或是四個 100Gbps:

在白皮書最後面也有提到測試的配置,都是在 Ubuntu 20.04 上面跑,單 CPU 用的是兩張 Intel 的 100Gbps 網卡,雙 CPU 的用的是四張 Mellanox 的 100Gbps 網卡:

3rd generation Intel Xeon Scalable testing done by Intel in September 2021. Single processor SUT configuration was based on the Supermicro SMC 110P-WTR-TNR single socket server based on Intel® Xeon® Platinum 8380 processor (microcode: 0xd000280) with 40 cores operating at 2.3 GHz. The server featured 256 GB of RAM. Intel® Hyper-Threading Technology was enabled, as was Intel® Turbo Boost Technology 2.0. Platform controller hub was the Intel C620. NUMA balancing was enabled. BIOS version was 1.1. Network connectivity was provided by two 100 GbE Intel® Ethernet Network Adapters E810. 1.2 TB of boot storage was available via an Intel SSD. Application storage totaled 3.84TB per drive and was provided by 8 Intel P5510 SSDs. The operating system was Ubuntu Linux release 20.04 LTS with kernel 5.4.0-80 generic. Compiler GCC was version 9.3.0. The workload was wrk/master (April 17, 2019), and the version of Varnish was varnishplus-6.0.8r3. Openssl v1.1.1h was also used. All traffic from clients to SUT was encrypted via TLS.

3rd generation Intel Xeon Scalable testing done by Intel in September 2021. Dual processor SUT configuration was based on the Supermicro SMC 22OU-TNR dual socket server based on Intel® Xeon® Platinum 8380 processor (microcode: 0xd000280) with 40 cores operating at 2.3 GHz. The server featured 256 GB of RAM. Intel® Hyper-Threading Technology was enabled, as was Intel® Turbo Boost Technology 2.0. Platform controller hub was the Intel C620. NUMA balancing was enabled. BIOS version was 1.1. Network connectivity was provided by four 100 GbE Mellanox MCX516A-CDAT adapters. 1.2 TB of boot storage was available via an Intel SSD. Application storage totaled 3.84TB per drive and was provided by 12 Intel P5510 SSDs. The operating system was Ubuntu Linux release 20.04 LTS with kernel 5.4.0-80- generic. Compiler GCC was version 9.3.0. The workload was wrk/master (April 17, 2019), and the version of Varnish was varnish-plus6.0.8r3. Openssl v1.1.1h was also used. All traffic from clients to SUT was encrypted via TLS.

不過馬上就會滿頭問號,四張 100Gbps 是怎麼跑到 500Gbps 的頻寬...

這份 PR 馬上就讓人想到 Netflix 先前放出來的投影片 (先前有在「Netflix 在單機服務 400Gbps 的影音流量」這篇提到),在 Netflix 的投影片裡面有提到他們在 Intel 平台上面受限於記憶體的頻寬,整台機器只能跑到 230Gbps。

另外一種猜測是,如果 Intel 與 Varnish 宣稱的 500Gbps 是算 switch 上的總流量 (有這樣算的嗎,你是 Juniper 嗎...),那這邊的 500Gbps 換算回去差不多就是減半 (還很客氣的沒把 cache 沒中需要去 origin server 拉資料的流量扣掉),跟 Netflix 在 FreeBSD 上跑出來的結果差不多啊...

坐等反駁 XDDD

繞過 Web 上「防機器人」機制的資料

這兩天的 Hacker News 冒出一些討論在講 Web 上「防機器人」機制要怎麼繞過:

第一篇主要是從各種面向都一起討論,從大方向的分類討論 (「Where to begin building undetectable bot?」),另外介紹目前有哪些產品 (在「List of anti-bot software providers」這邊)。

在文章裡有提到一個有意思的工具「puppeteer-extra-plugin-stealth」,主要是在 Node.js 類的環境,查了一下在 Python 上也有 pyppeteer-stealth,不過 Python 版本直接講了不完美 XDDD

Transplanted from puppeteer-extra-plugin-stealth, Not perfect.

第二篇文章在開頭就提到他不是很愛 Proxy,因為 Proxy 很容易偵測。在文章最後面則是提到了兩個方案,第一個是用大量便宜的 Android 手機加上 Data SIM 來跑,另外一個是直接用 Android 模擬器加上 4G 網卡跑。

依照這些想法,好像可以來改善一下手上的 RSS 工具...

MangoDB:拿 PostgreSQL 當作後端的 MongoDB 相容層

Hacker News Daily 上看到「A truly Open Source MongoDB alternative」這個東西,在「MangoDB: An open-source MongoDB alternative (mangodb.io)」的討論也可以翻一翻。

MongoDB 最主要的問題就是 4.0 以及以後的版本變成 SSPL 後就不是 open source license 了。

而 MangoDB 是個 Proxy service,前端提供 MongoDB 的協定,後端則是 PostgreSQL

GitHub 上的「MangoDB-io/MangoDB」這邊可以看到 MangoDB 主要是用 Golang 寫的,軟體授權則是 Apache License 2.0

The core of our solution is a stateless proxy, which converts MongoDB protocol queries to SQL, and uses PostgreSQL as a database engine. This will be compatible with MongoDB drivers, and should work as a drop-in replacement to MongoDB in many cases.

這對於只有提供 MongoDB 當儲存層的軟體來說會是個替代方案。不過如果是自己開發的話,這其實也證明了可以直接用 PostgreSQL,基本上 MongoDB 的功能都可以在 PostgreSQL 上找到方案。

話說這個讓我想到 2019 年 AWS 推出 Amazon DocumentDB 的時候,大家都一直在猜 Amazon DocumentDB 是不是拿 PostgreSQL 前面加上一些東西 (參考「大家在猜 Amazon DocumentDB 的底層是不是 PostgreSQL...」),後來也有人發現再更早之前就有人 PoC 過了:「A proof of concept MongoDB clone built on Postgres (github.com/jerrysievert)」,現在一般的公認應該就是,只是 AWS 沒有 open source 出來。

目前 MangoDB 還在早期的階段,但看起來這波應該會有一些能量進去幫忙...

弄個 whoogle.hasname.com 給大家玩

先前提到的 Whoogle:「自架的 Google Search Proxy 伺服器專案:Whoogle Search」與「改寫「Press "g" to Google (DuckDuckGo)」讓他支援 Whoogle」,後來想一想還是讓沒打算自己架的人可以用好了,指到國外的 latency 還是比較高...

如果你是 Chromium 類的瀏覽器,可以把搜尋引擎改成:

https://whoogle.hasname.com/search?q=%s

如果是我寫的 userscript (「Press "g" to Google (DuckDuckGo)」這個),可以改成:

https://whoogle.hasname.com/search?q=

然後 nginx 這邊先 access_log off; 了,理論上這樣應該是差不多了?

目前機器是放在客廳 (加 UPS),之後可能會丟到台灣的 VPS 上?

自架的 Google Search Proxy 伺服器專案:Whoogle Search

忘記在哪邊看到的連結,自架的 Google Search Proxy 伺服器專案:「Whoogle Search」,對應的 Hacker News 討論串也可以參考:「Whoogle Search: A self-hosted, ad-free, privacy-respecting metasearch engine (github.com/benbusby)」。

GitHub 上的分析可以看出來主要是 PythonFlask 寫的,然後說明就有提到是從 Google Search 撈資料,去掉所有可能可以被追蹤的項目:

Get Google search results, but without any ads, javascript, AMP links, cookies, or IP address tracking. Easily deployable in one click as a Docker app, and customizable with a single config file. Quick and simple to implement as a primary search engine replacement on both desktop and mobile.

目前最新版是 0.5.4,從他列出來的 Public Instances 找了一個是最新版的測試,看起來沒什麼大問題:「gslin - Whoogle Search」。

應該可以自己在台灣架一個起來玩看看?安裝方式看起來很多,因為是 Python-based 的套件,可以用 pipx 或是 Docker 裝起來跑,然後可以改寫「Press "g" to Google (DuckDuckGo)」(press-g-to-google-duckduckgo) 讓他可以設定要轉到哪個 Google Search Proxy...

利用 Cloudflare Workers 繞過 Cloudflare 自家的阻擋機制

Hacker News 首頁上看到「How to bypass Cloudflare bot protection (jychp.medium.com)」這則,裡面的文章是「How to bypass CloudFlare bot protection ?」這篇,利用 Cloudflare Workers 繞過 Cloudflare 自家的 CAPTCHA 機制。

這個漏洞有先被送給 Cloudflare,但被認為不是問題,所以作者就決定公開:

Several months ago I submitted what appeared to be a security flaw to CloudFalre’s bugbounty program. According to them, this is not a problem, it’s up to you to make up your own mind.

技術上就是透過 Cloudflare Workers 當作 proxy server,只是看起來 Cloudflare 對自家 IP 有特別處理,在設定妥當後,用 Cloudflare Workers 的 IP address 去連 Cloudflare 的站台,幾乎不會觸發 Cloudflare 的阻擋機制。

不過 free tier 還是有限制,主要就是數量:

The first 100,000 requests each day are free and paid plans start at just $5/10 million requests, making Workers as much as ten-times less expensive than other serverless platforms.

作者也有提到這點:

So let’s enjoy the 100 000 request/day for your free Cloudflare account and go scrape the world !

但這是個有趣的方法,加上信用卡盜刷之類的方式,這整包看起來就很有威力...

Squid + GnuTLS 不支援 Intermediate Certificate 的問題

先前在「架設 Proxy over TLS」這邊提到了我用 Squid 架 Proxy over TLS 的服務起來用,本來在家裡跑得好好的,但到了公司發現卻不能用,追蹤後發現是目前 Ubuntu 裡面包的 Squid + GnuTLS 沒有辦法支援 intermediate certificate 的問題,而且有人問過了:「[squid-users] HTTPS_PORT AND SSL CERT」。

這邊先講測試的方法,然後後面再講解法。

測試的方式可以用 openssl s_client -connect hostname:port 測,正常的情況會可以看到兩層。

在這邊的例子裡,R3 簽了 home.gslin.orgDST Root CA X3 簽了 R3,而 DST Root CA X3 則在 root certificate 名單中:

$ openssl s_client -connect home.gslin.org:443
depth=2 O = Digital Signature Trust Co., CN = DST Root CA X3
verify return:1
depth=1 C = US, O = Let's Encrypt, CN = R3
verify return:1
depth=0 CN = home.gslin.org
verify return:1
CONNECTED(00000003)
---
Certificate chain
 0 s:CN = home.gslin.org
   i:C = US, O = Let's Encrypt, CN = R3
 1 s:C = US, O = Let's Encrypt, CN = R3
   i:O = Digital Signature Trust Co., CN = DST Root CA X3
---

如果沒有送出 Intermediate Certificate 的話就會導致信任鏈無法建立,像是我故意設計的 nointermediate.gslin.com 這樣,R3 簽了 nointermediate.gslin.com,但 R3 並沒有在 root certificate 的名單中:

$ openssl s_client -connect nointermediate.gslin.com:443
CONNECTED(00000003)
depth=0 CN = nointermediate.gslin.com
verify error:num=20:unable to get local issuer certificate
verify return:1
depth=0 CN = nointermediate.gslin.com
verify error:num=21:unable to verify the first certificate
verify return:1
depth=0 CN = nointermediate.gslin.com
verify return:1
---
Certificate chain
 0 s:CN = nointermediate.gslin.com
   i:C = US, O = Let's Encrypt, CN = R3                                
---

而想到的解法就是重新包一份 Squid 出來用,把本來的 --with-gnutls 改成 --with-openssl

這邊會先裝 Build-Depends 裡面指定的東西,然後加裝 libssl-dev,接著換掉 --with-gnutls 後編譯,最後產生 .deb

sudo apt install -y ed libltdl-dev pkg-config build-essential cdbs debhelper dpkg-dev lsb-release dh-apparmor libcppunit-dev libcap2-dev libdb-dev libecap3-dev libexpat1-dev libgnutls28-dev libkrb5-dev comerr-dev libldap2-dev libnetfilter-conntrack-dev libpam0g-dev libsasl2-dev libxml2-dev nettle-dev libssl-dev
apt-get source squid
cd squid/squid-4.10
sed -i -e 's/--with-gnutls/--with-openssl/' debian/rules
cd ..
dpkg-buildpackage -rfakeroot -uc -b

編好的 .deb 就可以拿到其他機器上裝了,然後就可以吐出 intermediate certificate 了...