在 C 裡 Concurrency 的 Library

看到「libdill: Structured Concurrency for C」這個東西,在 C 裡實作了兩個不同種類的 concurrency,一個是 proc (process-based) 一個是 go (corouting-based)。

支援的 function 算是蠻清晰的,範例也很清楚:

#include <libdill.h>
#include <stdio.h>
#include <stdlib.h>

coroutine int worker(const char *text) {
    while(1) {
        printf("%s\n", text);
        msleep(now() + random() % 500);
    return 0;

int main() {
    msleep(now() + 5000);
    return 0;

也有 channel 的觀念可以用,之後需要寫玩具的話應該是個好東西...

Google 發表計算網頁真實性的演算法 (Knowledge-Based Trust)

Slashdot 上看到 Google 發表了計算網頁真實性的演算法,Knowledge-Based Trust (KBT):「Google Wants To Rank Websites Based On Facts Not Links」,原始的論文 PDF 檔案可以在「Knowledge-Based Trust: Estimating the Trustworthiness of Web Sources」這邊取得。

論文本身的原理不難懂 (其實方法相當有趣),主要是給出了三個貢獻。

首先是能夠區分是取出資訊的方法有問題 (extract 的演算法不夠好),或是網站本身就給出錯誤的資訊:

Our main contribution is a more sophisticated probabilistic model, which can distinguish between two main sources of error: incorrect facts on a page, and incorrect extractions made by an extraction system.


Our second contribution is a new method to adaptively decide the granularity of sources to work with: if a specific webpage yields too few triples, we may aggregate it with other webpages from the same website. Conversely, if a website has too many triples, we may split it into smaller ones, to avoid computational bottlenecks (Section 4).


The third contribution of this paper is a detailed, large-scale evaluation of the performance of our model.

KBT 並不是要取代 PageRank,而是跟 PageRank 互相配合,可以有效打擊內容農場 (Content farm) 這類網站,畢竟 PageRank 的假設在一般的狀況下是有邏輯的。

在「High PageRank but low KBT (top-left corner)」這段講到了這件事情:

We consider the 15 gossip websites listed in [16]. Among them, 14 have a PageRank among top 15% of the websites, since such websites are often popular. However, for all of them the KBT are in the bottom 50%; in other words, they are considered less trustworthy than half of the websites. Another kind of websites that often get low KBT are forum websites.