Receiving Recent Changes from Wikimedia (including Wikipedia)

Wikimedia now has a new streaming protocol that replaces the original RCStream: "Get live updates to Wikimedia projects with EventStreams".

This time the new protocol is built on a standard:

EventStreams is built on the w3c standard Server Sent Events (SSE). SSE is simply a streaming HTTP connection with event data in a particular text format. Client libraries, usually called EventSource, assist with building responsive tools, but because SSE is really just HTTP, you can use any HTTP client (even curl!) to consume it.

Opening the stream directly in a browser works too; you can watch new messages keep popping up...
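Since SSE is just a text format over HTTP, parsing it by hand is short. Below is a minimal sketch of an SSE line parser in Python. It is simplified (it ignores details of the standard such as the persistent last event ID), and the sample payload with its `title` and `wiki` keys is made up for illustration, not copied from a real EventStreams message. A real client would feed it the lines of a streaming HTTP response instead of a list.

```python
import json


def parse_sse(lines):
    """Yield one dict per event from an iterable of SSE text lines."""
    fields = {}
    data = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event; emit it if it has data.
            if data:
                yield {**fields, "data": "\n".join(data)}
            fields, data = {}, []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        name, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space is not part of the value
        if name == "data":
            data.append(value)  # multiple data lines are joined with "\n"
        elif name in ("event", "id", "retry"):
            fields[name] = value


# Hypothetical sample input, for illustration only.
sample = [
    ": keep-alive\n",
    "event: message\n",
    "id: 1\n",
    'data: {"title": "Kalafina",\n',
    'data:  "wiki": "zhwiki"}\n',
    "\n",
]
events = list(parse_sse(sample))
payload = json.loads(events[0]["data"])
print(events[0]["event"], payload["title"])
```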

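Daily PageView data for Wikipedia (starting 2015/07/01)

This covers not just Wikipedia but every Wikimedia Foundation project, at daily granularity.

The API that MediaWiki itself provides is turned off on the Wikimedia Foundation's projects. The main reason is that these projects get so much traffic that a large cache layer sits in front of them, so whatever the backend could report would be meaningless. A separately designed API takes its place.

An introduction to the API is at "Analytics/PageviewAPI", and the complete official API documentation is at "Wikimedia REST API".

Testing shows that the data starts at 2015/07/01 and that the daily updates arrive fairly quickly: right now, while it is still 2016/07/31 in UTC, the data for 2016/07/30 is already available. For example, to pull the July 2016 views by human readers for Kalafina on the Chinese edition: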

https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/zh.wikipedia/all-access/user/Kalafina/daily/20160701/20160731

For the Japanese edition, switch to ja.wikipedia:

https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/ja.wikipedia/all-access/user/Kalafina/daily/20160701/20160731
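The two URLs above follow the same pattern, so they can be generated. This is a small sketch that only builds the URL; the function name and parameter names are my own, the defaults are simply the values seen in the URLs above, and replacing spaces with underscores in the title is an assumption based on the usual MediaWiki title convention.

```python
from urllib.parse import quote

# Prefix taken from the example URLs in this post.
BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def pageviews_url(project, article, start, end,
                  access="all-access", agent="user", granularity="daily"):
    """Build a per-article pageviews URL; start and end are YYYYMMDD strings."""
    # Assumption: titles use underscores for spaces and are percent-encoded.
    title = quote(article.replace(" ", "_"), safe="")
    return f"{BASE}/{project}/{access}/{agent}/{title}/{granularity}/{start}/{end}"


print(pageviews_url("zh.wikipedia", "Kalafina", "20160701", "20160731"))
print(pageviews_url("ja.wikipedia", "Kalafina", "20160701", "20160731"))
```

Fetching the resulting URL with any HTTP client returns the data; the sketch stops at URL construction so it runs without network access.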

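The EmailDiff extension for MediaWiki

Until now, MediaWiki's change notification emails only told you that something had changed, without showing what changed in the email itself, so the reader had to click through to see it... (and ended up too lazy to click)

A while ago I saw that someone wrote an extension that outputs the diff, which solves this problem: "MediaWiki extension EmailDiff: notification emails improved".

The email now contains something like this: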

Version differences:
@@ -846,5 +887,3 @@
 In cattle, temperament can affect production traits such as carcass and meat 
 quality or milk yield as well as affecting the animal's overall health and 
-reproduction. Cattle temperament is defined as "the consistent behavioral and physiological 
-difference observed between individuals in response to a stressor or environmental 
+reproduction. If you succeed in tipping a cow only partway, such that only one 
+of its feet is still on the ground, you have created lean beef. Such a feat is 
+well done. Naturally, being outside, the cow is unstable. When it falls over, 
+it becomes ground beef. Cattle temperament is defined as "the consistent behavioral 
+and physiological difference observed between individuals in response to a stressor or environmental 
 challenge and is used to describe the relatively stable difference in the behavioral 
 predisposition of an animal, which can be related to psychobiological mechanisms.

This is far more readable. I'll try to get someone to install it...
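To show how a diff like the one above can end up in a mail body, here is a rough sketch using Python's standard `difflib`. It is not how the EmailDiff extension is implemented (that is a MediaWiki extension); the function name `email_diff` and the choice of two context lines are assumptions made only for illustration.

```python
import difflib
from itertools import islice


def email_diff(old_text, new_text, context=2):
    """Return a plain-text unified diff suitable for pasting into an email."""
    diff = difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        lineterm="",
        n=context,
    )
    # unified_diff emits the "---" and "+++" file header lines first;
    # skip those two so only the "@@" hunks remain.
    hunks = islice(diff, 2, None)
    return "Version differences:\n" + "\n".join(hunks)


old = "line one\nline two\nline three"
new = "line one\nline 2\nline three"
print(email_diff(old, new))
```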

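Wikimedia performance information

Via "Wikimedia Performance Metrics" I found Wikimedia's performance page, "Metrics - Wikimedia Performance", which focuses mainly on front-end performance. "Grafana" has more information.

"Performance" has an explanation of the performance pages; they look like they are still under development...

Wikipedia introduces a mentor program

In "Get help editing Wikipedia with the new “Co-op” mentorship program" I saw that the English Wikipedia has introduced a mentorship program.

Previously, more senior editors would either fix things directly or leave suggestions on a newcomer's Talk page. Now there is a mentorship program, and it has produced good results.

First, growth in the number of edits: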

Mentored editors were more productive than compared to editors who were not mentored. During the pilot, mentored editors made 7 times as many edits (35 vs. 4.5 in median edits). They also edited more articles during the pilot (10 vs. 3 on average).

Higher engagement:

68% of mentored editors remained active in April 2015, the month after the end of pilot, whereas only 22% of non-mentored editors remained active.

A faster start:

Editors using the Co-op waited far less time for mentorship to begin (12 hours) compared to the only other mentorship space on en.wiki, Adopt-a-user (4 days).

And a longer-term improvement in quality:

Despite being geared toward newer editors, the Co-op was utilized by more experienced editors who reported having positive and constructive experiences through mentorship.

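I wonder when the Chinese edition will adopt it :p

Wikimedia (including Wikipedia) rolls out HSTS (forcing HTTPS)

Wikimedia announced that all of its sites will enable HTTPS and HSTS: "Securing access to Wikimedia sites with HTTPS".

Before this, users could force HTTPS themselves with EFF's HTTPS Everywhere (available for both Firefox and Google Chrome); this time HTTPS is enforced across the board.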
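HSTS works through a `Strict-Transport-Security` response header that tells the browser to use HTTPS only for a period of time. As a rough sketch of what a client reads from that header, the parser below handles `max-age` and `includeSubDomains`, plus the widely used `preload` token. The header value in the example is illustrative and is not claimed to be what Wikimedia actually sends.

```python
def parse_hsts(header_value):
    """Parse a Strict-Transport-Security header value into a small dict."""
    policy = {"max_age": None, "include_subdomains": False, "preload": False}
    for part in header_value.split(";"):
        directive = part.strip()
        if not directive:
            continue
        name, _, value = directive.partition("=")
        name = name.strip().lower()  # directive names are case-insensitive
        if name == "max-age":
            # The value may be given as a quoted string.
            policy["max_age"] = int(value.strip().strip('"'))
        elif name == "includesubdomains":
            policy["include_subdomains"] = True
        elif name == "preload":
            policy["preload"] = True
    return policy


# Illustrative value only, an assumption rather than Wikimedia's real header.
print(parse_hsts("max-age=31536000; includeSubDomains; preload"))
```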

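As more and more people use HTTPS to protect their privacy (and not only confidential data), the next problem is protecting DNS, because DNS query patterns can reveal which kind of page (or which specific page) you are viewing.

The Wikimedia Foundation's August 2014 monthly report

The Wikimedia Foundation released its August monthly report (about three months late, it seems?): "Wikimedia Foundation Report, August 2014". A more condensed version is at "Wikimedia Highlights, August 2014".

The report provides some PV-related numbers, including comScore's figures and statistics computed from the Foundation's own server logs. It also covers the financial situation.

The technical part is taken from the "Wikimedia Engineering/Report/2014/August" page. Since this is August data, I also peeked at September and October: "Wikimedia Engineering/Report/2014/September" and "Wikimedia Engineering/Report/2014/October".

You can see the plan to test HHVM, and so far it looks good: "[Wikitech-l] [Engineering] Migrating test.wikipedia.org to HHVM". They used test.wikipedia.org for the tests, and the speed test shows a large improvement: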

1) Speed test: measure the time taken to request the page 1000 times over just 10 concurrent connections:

                        HHVM    Zend    diff
Mean time (ms):         233     441     -47%
99th percentile (ms):   370     869     -57%
Request/s:              43      22.6    +90%

The load test results are even better:

2) Load test: measure how much thoughput we obtain when hogging the appserver with 50 concurrent requests for a grand total of 10000 requests. What I wanted to test in this case was the performance degradation and the systems resource consumption

                        HHVM    Zend    diff
Mean time (ms):         355     906     -61%
99th percentile (ms):   791     1453    -45%
Request/s:              141     55.1    +156%
Network (Mbytes/s)      17      7       +142%
RAM used (GBs):         5(1)    11(4)
CPU usage (%):          90(75)  100(90)
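The diff column in both tables appears to be the relative change of HHVM against Zend. A quick sketch to recompute it from the numbers quoted above; the function name is my own, and the results agree with the quoted diff column to within about one percentage point, depending on how the original author rounded.

```python
def relative_change(hhvm, zend):
    """Relative change of the HHVM figure against the Zend figure, in percent."""
    return (hhvm - zend) / zend * 100.0


# (label, HHVM, Zend) taken from the two tables quoted above.
rows = [
    ("speed: mean time (ms)", 233, 441),
    ("speed: 99th percentile (ms)", 370, 869),
    ("speed: request/s", 43, 22.6),
    ("load: mean time (ms)", 355, 906),
    ("load: 99th percentile (ms)", 791, 1453),
    ("load: request/s", 141, 55.1),
    ("load: network (Mbytes/s)", 17, 7),
]

for label, hhvm, zend in rows:
    print(f"{label:30s} {relative_change(hhvm, zend):+.1f}%")
```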

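Wikipedia did not run into many problems mainly because the software it uses is open source and large enough that it became part of HHVM's own testing: "Compatibility Update".

For now, though, it looks like they are still running PHP; I have not seen a plan to switch everything over.

On the other hand, replacing the search engine did not go as smoothly. Moving to Elasticsearch improved things a lot, but the August report says: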

Started deploying Cirrus as the primary search back-end to more of the remaining wikis and we found what looks like our biggest open performance bottleneck. Next month's goal is to fix it and deploy to more wikis (probably not all). We're also working on getting more hardware.

By September they were saying there is no silver bullet and that they would have to throw hardware at it:

In September we worked to mitigate the performance bottleneck that we found in August. We found there to be no silver bullet but used the information we learned to pick and order appropriate hardware to handle the remaining wikis. We also implemented out significantly improved wikitext Regular Expression search. In October we've begun rolling out the wikitext Regular Expression search and received some of the hardware we need to finish cutting over the remaining wikis. We believe we'll get it all installed in October and cut the remaining wikis over in November.

In October they got the machines:

In October we prepared for November in which we deployed Cirrus to all the remaining wikis by installing new servers installing new versions of Elasticsearch and our plugins. We also fixed up regex search which had caused a search outage.

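The reports behind these links contain some comments that would never show up in an outward-facing press release... XD

The Wikimedia Foundation picks Phabricator as its collaboration platform

The Wikimedia Foundation has chosen Phabricator as its collaboration platform: "Welcome to Phabricator, Wikimedia’s new collaboration platform".

The Wikimedia Foundation already uses Bugzilla to manage many things; the article explains a number of benefits of adopting Phabricator as well as the known problems.

It looks like some existing tools will be migrated over in the coming months.

Also, because it is an open system, anyone who wants to study how Phabricator works can go take a look?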