Home » Computer » Network » Archive by category "Spam" (Page 2)

uBlock Origin 支援的 :has()

查資料的時候發現 uBlock Origin 的「Static filter syntax」已經自己實作 :has() 了 (雖然有一些限制)。

這個 CSS4 (draft) 的特性目前還沒有瀏覽器支援,所以 uBlock Origin 決定自己來:

This is a planned CSS4 operator, but no browser supports it yet. I decided to go ahead and implement it so that it can already be used. See The Relational Pseudo-class: :has() in the Selector Level 4/Editor's Draft.

由於效能問題,要求一定要有 hostname,而不能是 global rule:

uBO's implementation is simplified so as to ensure performance. The :has operator must be used with at least one hostname (it must be specific), and must be of the form (example)[.]

這對於 html block 長的幾乎一樣,只有在某個地方多出 Promoted by ... 之類的結構處理起來很方便,可以拿來找出「裡面有廣告 div 的母體 div」然後整包處理掉... (你不會只想要拿掉 Promoted by ...,而是連廣告內容都拿掉)

自建 Mail System 的難度

Hacker News 上的「Ask HN: Is it possible to run your own mail server for personal use?」這篇道出了現在自建 mail system 的難度。作者遇到信件常常被各大 mail 服務歸類成 spam:

The problem is making sure my mail is not marked as spam by the major MTAs out there, gmail and hotmail both mark my mails as spam.

整理一下現在自己建 mail system 要做到哪些事情:

  • 確認 IP (包括 IPv4/IPv6) 沒有列入任何 Open Relay 清單中。
  • 確認 IP 的反解可以查出對應的正解。
  • 確認 SPF 設定。
  • 確認送出去的信件有 DKIM 簽名,而且 DNS 也有設上對應的設定。
  • 確認 TLS 的發送與接收都正常。
  • 確認 DMARC 機制正確運作。

如同「Exercising Software Freedom in the Global Email System」這邊講的,現在要自己搞 mail system 超累...

用 hosts 搞出來的 Adblock...

在「Amalgamated hosts file」這邊看到超大包的 hosts,拿來擋廣告:

This repo consolidates several reputable hosts files and merges them into a single amalgamated hosts file with duplicates removed.

Currently this amalgamated hosts file contains 27,148 unique entries.

一包 hosts 有兩萬七千筆資料會不會太多了點...

話說不知道能不能 import 進 BIND 或是 Unbound 裡面直接讓整個組織用?

追蹤開信的能力

追蹤開信的能力是廣告信經常會用到的能力,為了要看「成效」而要追蹤開信率之類的數字。

在「Email Privacy Tester」這邊看到有趣的工具,介紹了「Email Privacy Tester」這個網站可以拿來測試各種 mail client 上可被追蹤的功能。

依照作者測試,Gmail 完美的擋下目前所有追蹤技巧:

Apple Mail 還很慘,有一堆方式可以被追蹤開信:

不知道其他家 (像是 Yahoo) 的情況...

社交工程演練網站 CuttlePhish

Hacker News Daily 上看到的服務:「CuttlePhish」。

CuttlePhish 是個社交工程演練網站,提供演練測試 (Phishing as a Service),付費的方式也很有趣:

Twenty bucks, for up to 100 users. We charge per-batch of emails, (not per-email or per-click). You only pay if somebody clicks one of the links in our emails.

只有在有人點的時候收 USD$20... XD

Google 發表計算網頁真實性的演算法 (Knowledge-Based Trust)

Slashdot 上看到 Google 發表了計算網頁真實性的演算法,Knowledge-Based Trust (KBT):「Google Wants To Rank Websites Based On Facts Not Links」,原始的論文 PDF 檔案可以在「Knowledge-Based Trust: Estimating the Trustworthiness of Web Sources」這邊取得。

論文本身的原理不難懂 (其實方法相當有趣),主要是給出了三個貢獻。

首先是能夠區分是取出資訊的方法有問題 (extract 的演算法不夠好),或是網站本身就給出錯誤的資訊:

Our main contribution is a more sophisticated probabilistic model, which can distinguish between two main sources of error: incorrect facts on a page, and incorrect extractions made by an extraction system.

第二個則是在效能上的改善:

Our second contribution is a new method to adaptively decide the granularity of sources to work with: if a specific webpage yields too few triples, we may aggregate it with other webpages from the same website. Conversely, if a website has too many triples, we may split it into smaller ones, to avoid computational bottlenecks (Section 4).

第三個則是提出好的分散式演算法,可以螞蟻雄兵計算出來:

The third contribution of this paper is a detailed, large-scale evaluation of the performance of our model.

KBT 並不是要取代 PageRank,而是跟 PageRank 互相配合,可以有效打擊內容農場 (Content farm) 這類網站,畢竟 PageRank 的假設在一般的狀況下是有邏輯的。

在「High PageRank but low KBT (top-left corner)」這段講到了這件事情:

We consider the 15 gossip websites listed in [16]. Among them, 14 have a PageRank among top 15% of the websites, since such websites are often popular. However, for all of them the KBT are in the bottom 50%; in other words, they are considered less trustworthy than half of the websites. Another kind of websites that often get low KBT are forum websites.

再找時間細讀其他類似的演算法...

Archives