英國政府開始定時掃描英國內的網路服務

Hacker News Daily 上看到的,英國政府也建立了自己的掃描資料庫:「NCSC Scanning information」。

看起來是類似於 Shodan 這樣的服務,但是範圍限制在英國內的網路服務:

These activities cover any internet-accessible system that is hosted within the UK and vulnerabilities that are common or particularly important due to their high impact.

不過這個東西各國應該都有做,只是有沒有公開講?

微軟的 Outlook 系統會自動點擊信件內的連結

前幾天在 Hacker News Daily 上翻到的,微軟的 Outlook 系統 (雲端上的系統) 會自動點擊信件內的連結,導致一堆問題:「“Magic links” can end up in Bing search results — rendering them useless.」,在 Hacker News 上的討論也有很多受害者出來抱怨:「“Magic links” can end up in Bing search results, rendering them useless (medium.com/ryanbadger)」。

原文的標題寫的更批評,指控 Outlook 會把這些 link 丟到 Bing 裡面 index,這點還沒有看到確切的證據。

先回到連結被點擊的問題,照文章內引用的資料來看,看起來是 2017 年開始就有的情況:「Do any common email clients pre-fetch links rather than images?」。

As of Feb 2017 Outlook (https://outlook.live.com/) scans emails arriving in your inbox and it sends all found URLs to Bing, to be indexed by Bing crawler.

在 Hacker News 上的討論也提到了像是 one-time login email 的機制也會因此受到影響,被迫要用比較費工夫的方法讓使用者登入 (像是給使用者 one-time code 輸入,而不是點 link 就可以登入)。

先記起來,以後在設計時應該會遇到,要重新思考 threat model...

Google 提供掃描 JAR 檔內是否有中獎的 Log4j 的工具

Hacker News 首頁上看到 Google 提供了一套用 Golang 寫的工具,可以掃描 JAR 檔裡面是否有中獎的 Log4j:「log4jscanner」,對應的討論在「Log4jscanner (github.com/google)」這邊。

看起來是內部工具,放出來前先把 vcs history 清掉了:

We unfortunately had to squash the history when open sourcing. The following contributors were instrumental in this project's development: [...]

另外討論裡也有人提到「OWASP Dependency-Check」這個工具也可以掃,這套就更一般性了:

Dependency-check automatically updates itself using the NVD Data Feeds hosted by NIST.

阻擋網站透過瀏覽器掃 localhost

五月的時候,DuckDuckGoCharlie Belmer 發了一篇關於網站透過瀏覽器掃 localhost 的文章,引起了不少重視:「Why is This Website Port Scanning me?」。

這個月陸陸續續看到一些反制方式了,比較簡單的是透過像 uBlock Origin 這類可以擋特定 url 的方式,像是 EasyPrivacy 裡面把一些大站台的 javascript script 擋下來:「uBlock Origin ad blocker now blocks port scans on most sites」。

在同一篇文章的 comment 處也有人提到 uBlock Origin 可以做的更廣泛:「Block access to 127.0.0.1/localhost and LAN address from the internet #4318」,裡面有人已經整理好丟出來了:「lan-block.txt」,看起來也可以擋一些...

要擋得比較完整的還得考慮 scan.example.com IN A 127.0.0.1 這種方式繞過去的情況?這可能需要用 extension 了...

全自動化「印出來,簽名,掃描再寄回去」的事情

看到「falsisign」這個專案 (FalsiScan: Make it look like a PDF has been hand signed and scanned),完全符合這個 blog 的副標題「幹壞事是進步最大的原動力」的精神,不介紹一下好像說不過去...:

For bureaucratic reasons, a colleague of mine had to print, sign, scan and send by email a high number of pages. To save trees, ink, time, and to stick it to the bureaucrats, I wrote this script.

把「印出來簽名再寄掃描回去」這種事情在電子系統上全自動化:產生出來的 PDF 會把預先嵌好的簽名貼到程式指定的位置上,另外還會稍微把把紙張轉一些角度,並且加上影印時會產生的黑邊...

有超多這種雜事要處理,但是又閃不掉的人,可以研究一下...

Google 推出的 PhotoScan,利用多張照片去除反光

Google 推出的 PhotoScan (AndroidiOS) 可以利用多次不同角度的拍攝來消除反光:「PhotoScan: Taking Glare-Free Pictures of Pictures」。

把這五張照片轉成一張處理過的照片:

不打開書直接掃描內容

MIT Media Lab 弄出個好玩的東西,可以不打開書直接掃描書的內容:「Can computers read through a book page by page without opening it?」,主標題是「Terahertz time-gated spectral imaging for content extraction through layered structures」。

用 100Ghz 到 3Thz 的電磁波掃描:

In our new study we explore a range of frequencies from 100 Gigahertz to 3 Terahertz (THz) which can penetrate through paper and many other materials.

先前也有類似的方法,用 X-ray 或是超音波,但效果都不好:

Can’t X-ray or ultrasound do this? It may seem that X-ray or ultrasound can also image through a book; however, such techniques lack the contrast of our THz approach for submicron pen or pencil layers compared next to blank paper. These methods have additional drawbacks like cost and ionizing radiation. So while you might be able to hardly detect pages of a closed book if you use a CT scan, you will not be able to see the text. Ultrasound does not have the resolution to detect 20 micron gaps in between the pages of a closed book -distinguishing the ink layers from the blank paper is out of the question for ultrasound. Based on the paper absorption spectrum, we believe that far infrared time resolved systems and THz time domain systems might be the only suitable candidates for investigating paper stacks page by page.

不知道可以進展做到什麼程度,目前只是「能看懂」的程度,品質看起來還是不太夠:

PostgreSQL 對 Vacuum 效能的改善

在「No More Full-Table Vacuums」這邊提到了 PostgreSQL 在 vacuum 時效能的大幅改善,尤其是大型資料庫在 vacuum 時需要對整個表格從頭到尾掃一次以確保 transaction id 的正確性:

Current releases of PostgreSQL need to read every page in the database at least once every 2 billion write transactions (less, with default settings) to verify that there are no old transaction IDs on that page which require "freezing".

這動作在資料量大的機器上就會吃大量資源導致各種討厭的現象:

All of a sudden, when the number of transaction IDs that have been consumed crosses some threshold, autovacuum begins processing one or more tables, reading every page. This consumes much more I/O bandwidth, and exerts much more cache pressure on the system, than a standard vacuum, which reads only recently-modified page.

而作者送了 patch 改成只會讀還沒搞定的部份:

Instead of whole-table vacuums, we now have aggressive vacuums, which will read every page in the table that isn't already known to be entirely frozen.

要注意的是,agreesive vacuum 相較於 vacuum 會多吃很多資源,但可以打散掉 (有點像一次大 GC 導致 lag 與多次 minor GC 讓程式反應時間變得比較順暢的比較):

An aggressive vacuum still figures to read more data than a regular vacuum, possibly a lot more. But at least it won't read the data that hasn't been touched since the last aggressive vacuum, and that's a big improvement.

這個功能預定在 PostgreSQL 9.6 出現,不知道會不會變 default...

PostgreSQL 9.5 將會有 Parallel Sequential Scan

在「Parallel Sequential Scan is Committed!」這邊看到 PostgreSQL 9.5 (還沒出) 將會有 Parallel Sequential Scan 的功能。

文章的作者直接拿了一個大家超常用的惡搞來示範,也就是經典的 LIKE '%word%'

rhaas=# \timing
Timing is on.
rhaas=# select * from pgbench_accounts where filler like '%a%';
 aid | bid | abalance | filler
-----+-----+----------+--------
(0 rows)

Time: 743.061 ms
rhaas=# set max_parallel_degree = 4;
SET
Time: 0.270 ms
rhaas=# select * from pgbench_accounts where filler like '%a%';
 aid | bid | abalance | filler
-----+-----+----------+--------
(0 rows)

Time: 213.412 ms

這功能真不錯 XD

Google 的書本掃描服務被認定為「合理使用」

Google 的書本掃描服務被認定為合理使用:「Google's Book-Scanning Project Ruled to Be Legal `Fair Use'」。

“Google’s unauthorized digitizing of copyright-protected works, creation of a search functionality and display of snippets from those works are non-infringing fair uses,” U.S. Circuit Judge Pierre Leval wrote on behalf of the court. “The purpose of the copying is highly transformative, the public display of text is limited and the revelations do not provide a significant market substitute for the protected aspects of the originals.”

看起來是一路打到第二巡迴上訴法院了?(負責紐約地區)