Home » Posts tagged "wikipedia"

維基百科各語言與各地區的綜合資訊

維基百科推出了新版的介面:「Just how many people are reading Wikipedia in your country, and what language are they using?」。

We recently released a new interactive visualization of Wikipedia traffic by country and language. Called WiViVi, which stands for Wikipedia Views Visualized, the new visualization shows the geographic distribution of pageviews to any or all Wikipedias from two different perspectives[.]

這個介面可以看到每個版本在每個地區佔的比率,像是中文維基百科的情況:

不過翻牆 VPN 的不知道怎麼算...

收 Wikimedia (包括維基百科) 的 Recent Changes

所以有新的 streaming protocol 取代本來的 RCStream:「Get live updates to Wikimedia projects with EventStreams」。

這次新的 protocol 是走標準協定:

EventStreams is built on the w3c standard Server Sent Events (SSE). SSE is simply a streaming HTTP connection with event data in a particular text format. Client libraries, usually called EventSource, assist with building responsive tools, but because SSE is really just HTTP, you can use any HTTP client (even curl!) to consume it.

直接用瀏覽器打開也可以看到一直冒出來新的訊息...

V8 對 for-in 的最佳化

V8 引擎的人對 for-in 的最佳化寫了一篇解釋「Fast For-In in V8」,比較直接的結果就是維基百科Facebook 都變快了:

For example, in early 2016 Facebook spent roughly 7% of its total JavaScript time during startup in the implementation of for-in itself. On Wikipedia this number was even higher at around 8%.

可以看得出來是挑比較大的來改,而下一版的 Google Chrome (57) 將會對 for-in 會到另外一個極致:

The most important for-in helpers are at position 5 and 17, accounting for an average of 0.7% percent of the total time spent in scripting on a website. In Chrome 57 ForInEnumerate has dropped to 0.2% of the total time and ForInFilter is below the measuring threshold due to a fast path written in assembler.

主要是因為 spec 對 for-in 的定義寫得很模糊,所以就有很多實作的空間可以調整:

When we look at the spec-text of for-in, it’s written in an unexpectedly fuzzy way,which is observable across different implementations.

維基百科每天的 PageView 數據 (2015/07/01 開始)

不只是維基百科,還包括所以維基基金會的專案都可以查到,精確度可以到每日。

MediaWiki 系統提供的 API 在維基基金會上的專案都關掉了。主要是因為維基基金會的專案量太大,前方有大量的 cache 擋住,後端能提供的資料其實沒有意義。取而代之的是另外規劃出來的 API。

API 的介紹說明在「Analytics/PageviewAPI」這邊可以看到,官方所提供的完整 API 說明文件則可以在「Wikimedia REST API」這邊查到。

實際測試發現資料從 2015/07/01 開始,每日更新的速度還不錯,像是 UTC 還是 2016/07/31 的現在可以取到 2016/07/30 的資料了。舉例來說,想要拉中文版 Kalafina 在 2016 七月由人閱覽的資料:

https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/zh.wikipedia/all-access/user/Kalafina/daily/20160701/20160731

如果是想拉日文版的就換成 ja.wikipedia

https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/ja.wikipedia/all-access/user/Kalafina/daily/20160701/20160731

維基百科的 User Agent 公開資料

Nuzzel 上看到的東西...

維基百科不掛 Google Analytics 之類的第三方服務,而是透過 Piwik 蒐集後自己分析:「Dashboards and Data Downloads for Wikimedia Projects」。

主要有兩個資料可以看,一個是「Browser Statistics」,另外一個是「Readers: Pageviews and Unique Devices」。

不過翻了一下,Piwik 好像還是沒有寫到 NoSQL 之類的方案,出自「How do I use another database like Postgresql, SQLite, Oracle? Will you support Nosql databases like Hadoop, Mongodb?」:

Piwik only works on Mysql, where all the development and testing is done. Supporting multiple databases is a long term objective for Piwik, but not our current focus.

不知道維基百科是怎麼 scale 的...

Wikipedia 引入 Mentor 制度

在「Get help editing Wikipedia with the new “Co-op” mentorship program」這邊看到英文版維基百科引入了導師的制度。

以往比較資深的編輯都是直接修正,或是到新手的 Talk 頁上提出建議,現在則是引入了導師的制度,從而得到了不錯的成果。

首先是編輯次數的成長:

Mentored editors were more productive than compared to editors who were not mentored. During the pilot, mentored editors made 7 times as many edits (35 vs. 4.5 in median edits). They also edited more articles during the pilot (10 vs. 3 on average).

以及參與度的提昇:

68% of mentored editors remained active in April 2015, the month after the end of pilot, whereas only 22% of non-mentored editors remained active.

更快速的參與:

Editors using the Co-op waited far less time for mentorship to begin (12 hours) compared to the only other mentorship space on en.wiki, Adopt-a-user (4 days).

以及長遠的品質提昇:

Despite being geared toward newer editors, the Co-op was utilized by more experienced editors who reported having positive and constructive experiences through mentorship.

不知道中文版什麼時候可以導入 :p

Wikimedia (包括維基百科) 推出 HSTS (強制使用 HTTPS)

Wikimeda 宣佈所有旗下的網站都會啟用 HTTPS 與 HSTS:「Securing access to Wikimedia sites with HTTPS」。

在這之前,使用者可以用 EFFHTTPS Everywhere 強制使用 HTTPS (在 FirefoxGoogle Chrome 都有上架),而這次則是全面強制使用了。

愈來愈多人使用 HTTPS 來保護隱私後 (而不僅僅是保護機密資料),接下來的問題就是要想辦法在 DNS 上保護了。也就是可以利用 DNS query pattern 知道你在看哪種 (或是哪一個) 頁面。

Archives