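FBI recommends using an ad blocker to reduce risk while browsing

I saw this news in "Even the FBI says you should use an ad blocker"; the FBI announcement itself is at "Cyber Criminals Impersonating Brands Using Search Engine Advertisement Services to Defraud Users".

The background is that a lot of cyber crime works by buying ads, so that the criminals show up in search engines and lure users into clicking: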

Cyber criminals purchase advertisements that appear within internet search results using a domain that is similar to an actual business or service. When a user searches for that business or service, these advertisements appear at the very top of search results with minimum distinction between an advertisement and an actual search result. These advertisements link to a webpage that looks identical to the impersonated business’s official webpage.

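One approach: a user types in keywords to download a specific piece of software, and the criminals buy ads that lead the user to a fake site, where the download is a copy of the software carrying a backdoor or trojan: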

In instances where a user is searching for a program to download, the fraudulent webpage has a link to download software that is actually malware. The download page looks legitimate and the download itself is named after the program the user intended to download.

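This reminds me of the earlier PuTTY attack attributed to the North Korean government: "Trojanized versions of PuTTY utility being used to spread backdoor".

Among the FBI's recommendations for protecting yourself is an ad blocking extension, which cuts down the channels you can be attacked through: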

Use an ad blocking extension when performing internet searches. Most internet browsers allow a user to add extensions, including extensions that block advertisements. These ad blockers can be turned on and off within a browser to permit advertisements on certain websites while blocking advertisements on others.

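As for which ad blocker to use, I recommend uBlock Origin; it supports both Chromium-based browsers (including Google Chrome) and Firefox.

Kagi can now share search results

Previously Kagi only let registered users search, but with this update results can be shared (see the Changelog):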

Enable sharing of search results pages with people who lack a Kagi account #23 @ivan

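The link here goes to the "Enable sharing of search results pages with people who lack a Kagi account" feature request. It also works as a way to pitch Kagi to other people, which is probably why it got built.

I used incognito mode to simulate not being logged in and tested the keyword "Kalafina", and confirmed that the results page is now visible.

Kagi's URL rewrite

Last week Kagi released a URL rewrite feature that can replace URLs in search results (see the "Changelog"):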

Rewrite rules for domains - ability to e.g. translate "reddit.com" into "old.reddit.com" #158 @TeMPOraL

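This could also be handled in the browser with an extension or a userscript (syncing across machines through the browser's cloud sync), but as far as I know nothing like that exists yet, so you would have to write one yourself.

The example of turning reddit.com into old.reddit.com is a pretty common use in the community (nobody likes the new interface). What I did myself is turn *.m.wikipedia.org into *.wikipedia.org; this rule does a little extra, which the next rule below explains: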

^https://([a-z]+)\.m\.wikipedia\.org/[-a-z]*/(.*)|https://$1.wikipedia.org/wiki/$2

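It's easy to see that the rule takes a regular expression, but the official docs don't seem to say which flavor is supported? (POSIX family or PCRE-like; the classification in "Regular Expressions" on Wikibooks is a rough reference.)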
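To make the rule above concrete, here is a minimal sketch of how such a rule could be applied, written in Python. It is not Kagi's implementation. It assumes the last "|" separates the pattern from the replacement and that `$1`, `$2` refer to capture groups; `parse_rule` and `rewrite` are hypothetical helper names, and Python's `re` flavor may differ from whatever Kagi actually uses.

```python
import re

# The rule from the post, in "pattern|replacement" form.
MOBILE_WIKIPEDIA_RULE = (
    r"^https://([a-z]+)\.m\.wikipedia\.org/[-a-z]*/(.*)"
    r"|https://$1.wikipedia.org/wiki/$2"
)


def parse_rule(rule):
    """Split a rule into (compiled pattern, Python replacement template).

    Assumption: the last "|" is the separator between pattern and replacement.
    """
    pattern, _, replacement = rule.rpartition("|")
    # Translate $1, $2, ... into Python's \g<1>, \g<2>, ... syntax.
    py_replacement = re.sub(r"\$(\d+)", r"\\g<\1>", replacement)
    return re.compile(pattern), py_replacement


def rewrite(url, rules):
    """Apply the first matching rule; return the URL unchanged if none match."""
    for pattern, replacement in rules:
        new_url, count = pattern.subn(replacement, url, count=1)
        if count:
            return new_url
    return url


if __name__ == "__main__":
    rules = [parse_rule(MOBILE_WIKIPEDIA_RULE)]
    for url in (
        "https://en.m.wikipedia.org/wiki/PuTTY",
        "https://zh.m.wikipedia.org/zh-tw/Kalafina",
        "https://en.wikipedia.org/wiki/PuTTY",
    ):
        print(url, "->", rewrite(url, rules))
```

Note that `[-a-z]*` matches both `wiki` and a variant segment such as `zh-tw`, which is why the mobile rule also normalizes the path to `/wiki/`.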

My other rule rewrites Chinese Wikipedia URLs that carry a language variant code, https://zh.wikipedia.org/zh-*/, to https://zh.wikipedia.org/wiki/

^https://zh\.wikipedia\.org/zh-([a-z]*)/(.*)|https://zh.wikipedia.org/wiki/$2

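The downside of doing this is that the same page can show up twice. After the first one is rewritten a small icon appears next to it, and hovering over the icon shows which rule did the rewrite.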

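A fuzzy search library for JavaScript

On Hacker News Daily I saw a Show HN (a post where the author or a main contributor presents their own work) for a fuzzy search library that claims to be very fast and light on resources: "Show HN: uFuzzy.js – A tiny, efficient fuzzy search that doesn't suck (github.com/leeoniya)".

When a field has been around for a long time and someone suddenly shows up saying their thing is super great, then unless there has been a breakthrough in the underlying algorithms, I immediately think of the classic "Three circles model" (I was too lazy to draw in the regions in the middle).

According to his "tests", he claims to be ahead across the board.

But then look at the comments: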

Thank you for this!

I am also quite frustrated with the current state of full text search in the javascript world. All libs I've tried miss the most basic examples and their community seems to ignore it. Will give yours a try but it already looks much better from the comparison page.

Edit: Nope, your lib doesn't seem to handle substitution well (THE most common type of typo), so yep, we are back to square one ...

From fuzzy search I expected that entering "super meet boy" or "super maet boy" will return "Super Meat Boy" but unfortunately currently it doesn't work this way and it's quite disappointing.

https://leeoniya.github.io/uFuzzy/demos/compare.html?libs=uF...

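It looks like this library can't handle the most common fuzzy search case (small typos). Going by what the examples describe, it is more like substring search plus some extra features, which makes it closer to an auto completion library or, put more broadly, an auto suggestion library.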
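The difference between the two behaviors can be sketched in a few lines of Python. This is only an illustration of substring matching versus typo-tolerant matching using the queries from the comment; it says nothing about how uFuzzy works internally. The function names and the `max_distance=2` threshold are assumptions made for the example.

```python
def levenshtein(a, b):
    """Classic edit distance: insertions, deletions, substitutions cost 1 each."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or keep when equal)
            ))
        prev = curr
    return prev[-1]


def substring_match(query, title):
    """Case-insensitive substring search: no tolerance for typos."""
    return query.lower() in title.lower()


def typo_tolerant_match(query, title, max_distance=2):
    """Accept the title when the whole query is within a few edits of it."""
    return levenshtein(query.lower(), title.lower()) <= max_distance


if __name__ == "__main__":
    title = "Super Meat Boy"
    for query in ("super meat", "super meet boy", "super maet boy"):
        print(query, substring_match(query, title), typo_tolerant_match(query, title))
```

Plain Levenshtein counts the swapped letters in "maet" as two substitutions; a variant that treats a transposition as a single edit (Damerau-Levenshtein) would score it as one.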

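For me, though, the real value is the comparison table further down, because it lists the options currently on the market; I can use that list for reference later...

Kagi's progress three months after it started charging

Kagi published an update on its progress three months after it started charging (see my post "Kagi has started charging"): "Kagi status update: First three months".

On the search side (the Kagi product line), there are currently 2,600 paying users; at US$10/mo that works out to roughly US$26K/mo in revenue: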

Kagi search is currently serving ~2,600 paid customers. We have seen steady growth since the launch 3 months ago. Note, this is with zero marketing and fully relying on word of mouth. We prefer to keep things this way for now, as we are still developing the product towards our vision of a user-centric web search experience.

The later part on finances gives a similar number (almost all of it is Kagi subscription revenue):

Between Kagi and Orion, we are currently generating around $26,500 USD in monthly recurring revenue, which incidentally about exactly covers our current API and infrastructure costs.

This revenue roughly covers the current infrastructure costs, but salaries and other operating costs are on the order of US$100K/mo, so there is still a long way to go:

Between Kagi and Orion, we are currently generating around $26,500 USD in monthly recurring revenue, which incidentally about exactly covers our current API and infrastructure costs.

That means that salaries and all other operating costs (order of magnitude of $100K USD/month) remain a challenge and are still paid out of the founders’ pocket (Kagi remains completely bootstrapped).

And it will take roughly ten times the current number of paying users to break even (25K users):

We are planning to reach sustainability at around 25,000 users mark, by further improving the product, introducing new offerings and pricing changes. With the product metrics being as good as they are, we should be able to reach this as our visibility increases.
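As a quick sanity check on the figures above, the arithmetic can be written out in Python. The inputs are the numbers quoted in this post (2,600 paying users, US$10/mo, a 25,000-user target); the variable names are only for illustration.

```python
paid_users = 2_600        # paying customers quoted in the status update
price_per_month = 10      # US$ per month, the current price
target_users = 25_000     # sustainability target quoted in the status update

# Estimated search revenue, assuming every paying user pays the same price.
estimated_revenue = paid_users * price_per_month

# How many times the current paying user base the target represents.
growth_multiple = target_users / paid_users

if __name__ == "__main__":
    print(f"estimated revenue: US${estimated_revenue:,}/mo")
    print(f"growth needed: about {growth_multiple:.1f}x")
```

The estimate lands at US$26,000/mo, close to the US$26,500 figure Kagi reports for Kagi and Orion combined, and the target works out to a little under ten times the current paying user base.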

The better news is that the churn rate is very low:

Product stickiness is also very high, with churn being lower than 3%.

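They also mention that each user makes about 27 queries a day (this includes the free tier), with some users regularly going over 100 a day and the occasional count above 400 in a single day: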

We are currently serving around 70,000 queries a day or around ~27 queries/day/user (this includes free users which are about 10% of total users). There is a lot of variance in use though, with some users regularly searching >100 times a day. Every time we see a search count go >400 times in day we are happy to be an important part of someone’s search experience.

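I checked my own usage; it looks a bit on the high side, but not up to the 100 searches a day he mentions.

They also mention plans for new offerings, including a Teams Plan & Family Plan, while the plans currently on offer will be grouped under Individual Plans.

More importantly, there are plans to raise prices for Individual Plans. The new lineup is set to have three tiers, the main addition being a Kagi Starter tier: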

  • Kagi Unlimited - $19/mo or $180/year ($15/mo) or $288/biennial ($12/mo) - Original Kagi experience, with unlimited searches
  • Kagi Starter ($5/mo; 200 searches) - For casual users who make less than 200 searches per month
  • Free basic - 50 free searches that reset every month

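That's quite an increase, though they do say that users who are already paying before the increase will keep the original price: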

If such change to Individual plans is to occur, we plan to grandfather-in all early adopters (meaning all current and future paid customers, up until this change) allowing them to keep their existing subscription price as long as they don’t cancel it.
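To put the proposed Kagi Unlimited prices in numbers, here is a small Python calculation of the effective monthly cost for each billing period and the increase over the current US$10/mo. The prices are the ones quoted above; the dictionary layout and names are only for illustration.

```python
CURRENT_PRICE = 10  # US$ per month, the current price

# Proposed Kagi Unlimited pricing: (total price in US$, months covered).
unlimited_options = {
    "monthly": (19, 1),
    "yearly": (180, 12),
    "biennial": (288, 24),
}


def effective_monthly(total, months):
    """Price per month for a given billing period."""
    return total / months


def increase_percent(new_price, old_price=CURRENT_PRICE):
    """Relative increase over the old price, in percent."""
    return (new_price - old_price) / old_price * 100


if __name__ == "__main__":
    for name, (total, months) in unlimited_options.items():
        per_month = effective_monthly(total, months)
        print(f"{name}: US${per_month:.2f}/mo, +{increase_percent(per_month):.0f}%")
```

Even the cheapest option, paying two years up front, comes out 20% above the current price, and the month-to-month price is 90% higher.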

I'll keep watching...

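Pen testing tools: assorted search engines

Something I saw on Twitter.

The tweet is an image listing 24 sites.

Lots of .io domains...

Quite a few of these are services I use now and then, and they also work well as groundwork for a pen test, since you can search through all kinds of results that have already been scanned...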

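Google says it is bringing back forced exact-match search with double quotes...

On Hacker News Daily I saw "We're improving search results when you use quotes (blog.google)", which is how I found out it had been removed? (Though Google Search hasn't been my main search engine for a long time...)

The original post is "How we're improving search results when you use quotes", which says: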

For example, if you did a search such as [“google search”], the snippet will show where that exact phrase appears:

[...]

In the past, we didn’t always do this because sometimes the quoted material appears in areas of a document that don’t lend themselves to creating helpful snippets.

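"Google for the exact phrase (and no, quotation marks don't help)" shows that as far back as 2020, double quotes already no longer returned exact results.

I probably won't go back to Google Search, though. For one thing Kagi performs quite well, and for another I'd rather not hand Google any more information...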

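Microsoft's Outlook system automatically clicks links in emails

Something I came across on Hacker News Daily a few days ago: Microsoft's Outlook system (the cloud-hosted one) automatically clicks links inside emails, which causes a bunch of problems: "“Magic links” can end up in Bing search results — rendering them useless.". Plenty of affected people also showed up to complain in the Hacker News discussion: "“Magic links” can end up in Bing search results, rendering them useless (medium.com/ryanbadger)".

The original title goes further and accuses Outlook of feeding these links into Bing to be indexed; I haven't seen solid evidence of that yet.

Back to the problem of links being clicked: going by the material cited in the article, this seems to have been happening since 2017: "Do any common email clients pre-fetch links rather than images?".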

As of Feb 2017 Outlook (https://outlook.live.com/) scans emails arriving in your inbox and it sends all found URLs to Bing, to be indexed by Bing crawler.

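The Hacker News discussion also mentions that mechanisms like one-time login emails are affected, which forces sites into more laborious ways of logging users in (such as giving the user a one-time code to type in, rather than logging them in with a single click on a link).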
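The one-time code alternative mentioned above could look roughly like the following Python sketch. It is an illustration under assumptions, not something taken from the article or the discussion: the `OneTimeCodes` class, the 6-digit format, and the 10-minute lifetime are all hypothetical choices. The point is that the code is typed in by the user, so a scanner that fetches every URL in the email has nothing to consume.

```python
import hashlib
import hmac
import secrets
import time

CODE_TTL_SECONDS = 600  # hypothetical lifetime: 10 minutes


class OneTimeCodes:
    """In-memory store of pending login codes (a sketch, not production code)."""

    def __init__(self, ttl=CODE_TTL_SECONDS, clock=time.monotonic):
        self._ttl = ttl
        self._clock = clock
        self._pending = {}  # email -> (sha256 digest of code, expiry time)

    def issue(self, email):
        """Create a 6-digit code to email to the user; only its hash is stored."""
        code = f"{secrets.randbelow(10**6):06d}"
        digest = hashlib.sha256(code.encode()).digest()
        self._pending[email] = (digest, self._clock() + self._ttl)
        return code

    def verify(self, email, code):
        """Check a submitted code. Any attempt, right or wrong, uses it up."""
        entry = self._pending.pop(email, None)
        if entry is None:
            return False
        digest, expires_at = entry
        if self._clock() > expires_at:
            return False
        submitted = hashlib.sha256(code.encode()).digest()
        # Constant-time comparison to avoid leaking how many bytes matched.
        return hmac.compare_digest(digest, submitted)


if __name__ == "__main__":
    codes = OneTimeCodes()
    code = codes.issue("user@example.com")
    print(codes.verify("user@example.com", code))  # first use
    print(codes.verify("user@example.com", code))  # second use of the same code
```

Discarding the code after any attempt is a deliberate choice here to limit guessing; a real system would more likely allow a few retries with rate limiting.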

Noting this down for now; I will probably run into it in future designs and will need to rethink the threat model...

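A list of alternative search engines

I came across "A look at search engines with their own indexes", which introduces a range of search engines. The author designed a methodology for testing them and also gives plenty of subjective opinions in the article. It is a very useful reference, and the suggestions in it are worth trying.

There is also a Hacker News discussion for reference: "A look at search engines with their own indexes (2021) (seirdy.one)".

The "General indexing search-engines" section at the start of the article first lists the three big search engines, GBY (Google, Bing, Yandex), along with the search engines that use these three as their backend; Bing's shadow is everywhere.

The author then recommends Mojeek as an alternative to GBY: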

Mojeek: Seems privacy-oriented with a large index containing billions of pages. Quality isn’t at GBY’s level, but it’s not bad either. If I had to use Mojeek as my default general search engine, I’d live. Partially powers eTools.ch. At this moment, I think that Mojeek is the best alternative to GBY for general search.

"Smaller indexes or less relevant results" also has some options, such as Right Dao, the first one mentioned in that section, which the author rates fairly well:

Right Dao: very fast, good results. Passes the tests fairly well. It plans on including query-based ads if/when its user base grows.

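The following "Smaller indexes, hit-and-miss" and "Unusable engines, irrelevant results" sections are also worth a skim, to see how the author criticizes them XD

Then the later "Semi-independent indexes" section is where some of the better-known recent ones show up, such as Brave Search and Kagi, which I currently use.

Very well put together...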

Kagi has started charging

"Kagi search and Orion browser enter public beta" has the news about the public beta and the pricing:

We decided to start charging for Kagi search while in beta status because the cost of beta usage has gone up dramatically and we are not able to sustain it. Also, we want to get some kind of a financial “airworthiness” signal and see how we stand with our positioning and outlook for sustainability.

The price is US$10/mo; I just put my credit card on file...