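Wikipedia data center move (from Florida to Virginia)

The official Wikimedia blog reports that Wikimedia's primary data center will move from Tampa, Florida to Ashburn, Virginia (Wikipedia included, of course): "Wikimedia sites to move to primary data center in Ashburn, Virginia".

The reason the data center was in Florida in the first place... Jimmy Wales lived nearby XDDD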

A major reason for choosing Tampa, Florida as the location of the primary data center in 2004 was its proximity to founder Jimmy Wales' home, at a time when he was much more involved in the technical operations of the site.

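Besides a more stable network, the move to Virginia is also motivated by the weather (fewer hurricanes).

In November 2011, bits.wikimedia.org (mainly CSS and JavaScript) was already being served from the new data center. In February 2012, read-only pages were successfully split off onto the cache servers, and in April of the same year upload.wikimedia.org (media files, including user uploads) was also pointed at the new data center.

With these changes, only 10% of the traffic cannot be cached and is passed to the Apache and MySQL backend. This move is about shifting that remaining 10% from Florida to Virginia.

The end of the post also gives the current server count and page views (PV):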

The Wikimedia Foundation currently operates a total of about 885 servers, and serves about 20 billion page views a month, on a non-profit budget that relies almost entirely on donations from readers.

The sixth-largest website in the world, at roughly 600 million page views a day, currently runs on just 885 servers :p
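A quick back-of-the-envelope check of those numbers, as a rough sketch only. The server count and monthly page views come from the quoted post. The 30-day month is my own assumption, and dividing evenly across all 885 servers ignores that they play different roles (caches, application servers, databases and so on), so the per-server figure is just an average for scale.

```python
# Figures quoted from the Wikimedia post.
SERVERS = 885
PAGE_VIEWS_PER_MONTH = 20_000_000_000

# Assumption: a 30-day month, used only to get a rough daily figure.
DAYS_PER_MONTH = 30
SECONDS_PER_DAY = 86_400

views_per_day = PAGE_VIEWS_PER_MONTH / DAYS_PER_MONTH
# Naive average: treats every server as if it served page views directly.
views_per_server_per_day = views_per_day / SERVERS
views_per_server_per_second = views_per_server_per_day / SECONDS_PER_DAY

print(f"{views_per_day / 1e6:.0f} million page views per day")
print(f"{views_per_server_per_day:,.0f} page views per server per day")
print(f"{views_per_server_per_second:.1f} page views per server per second")
```

Under these assumptions the daily figure comes out a bit above 600 million, which is in line with the rough number above.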

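Wikipedia switches one slave of the English-language database from MySQL 5.1 to MariaDB 5.5...

News posted to the Wikipedia mailing list: a slave server of the English Wikipedia database is now running MariaDB 5.5: "mariadb 5.5 in production for english wikipedia".

The previous build was MySQL 5.1 with the Facebook patchset. Overall, MariaDB is about 8% faster: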

Taking the times of 100% of all queries over regular sample windows, the average query time across all enwiki slave queries is about 8% faster with MariaDB vs. our production build of 5.1-fb. Some queries types are 10-15% faster, some are 3% slower, and nothing looks aberrant beyond those bounds. Overall throughput as measured by qps has generally been improved by 2-10%. I wouldn't draw any conclusions from this data yet, more is needed to filter out noise, but it's positive.

The plan is to keep observing for the next month or two, then switch everything over if no problems turn up:

MariaDB has some nice performance improvements that our workload doesn't really hit (better query optimization and index usage during joins, much better sub query support) but there are also some things, such as full utilization of the primary key embedded on the right of every secondary index that we can take advantage of (and improve our schema around) once prod is fully upgraded, hopefully over the next 1-2 months.

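Performance is not the main consideration. The reason is political: the official line is support for the open source community (the unspoken part being "we don't really trust Oracle..."):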

The main goal of migrating to MariaDB is not performance driven. More so, I think it's in WMF's and the open source communities interest to coalesce around the MariaDB Foundation as the best route to ensuring a truly open and well supported future for mysql derived database technology. Performance gains along the way are icing on the cake.

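See also: "on wikipedia and mariadb".

Wikimedia is about to do something stupid...

In "What are readers looking for? Wikipedia search data now available" I saw that Wikipedia plans to release its search data. Isn't this the same stupid thing someone has already done before...

It reminds me of the 2006 "AOL search data leak". AOL wanted to contribute to academic research, so it anonymized three months of search data and released it. It then turned out that no matter how the data was anonymized, individual people could still be identified from their searches. AOL was sued over it, in a class action seeking at least USD $5,000 for every person included in the data. Then again, because AOL did something this stupid, it remains one of the few real search data sets ever made public.

Looks like Wikimedia is about to do it all over again?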

Wikipedia fully supports HTTPS (SSL)

Wikipedia announced on its official blog that all of its services now support HTTPS (SSL): "Native HTTPS support enabled for all Wikimedia Foundation wikis". In other words, URLs such as "https://zh.wikipedia.org/wiki/Wikipedia:首页" now work.

Besides *.wikipedia.org, *.wikimedia.org is supported as well, so hosts like upload.wikimedia.org can also be reached over HTTPS: (image from File:Minori-Chihara-Animelo-Summer-Live-2011-08-27-21-41.jpg)

Of course, some scripts still hard-code http. Those should get fixed over time...
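To illustrate what fixing a hard-coded http URL could look like, here is a minimal sketch. The announcement does not say how Wikimedia actually fixes these scripts, so this is not their method: `upgrade_to_https` is a hypothetical helper of mine, and limiting the rewrite to the wikipedia.org and wikimedia.org families is an assumption based on the two domains mentioned above.

```python
import re

# Assumption: only these two domain families are rewritten, because they are
# the ones mentioned above as supporting HTTPS. The trailing lookahead keeps
# look-alike hosts such as "wikipedia.org.evil.example" from matching.
_HTTP_WIKIMEDIA_URL = re.compile(
    r"http://((?:[a-z0-9-]+\.)*(?:wikipedia|wikimedia)\.org)(?![a-z0-9.-])",
    re.IGNORECASE,
)


def upgrade_to_https(text: str) -> str:
    """Rewrite hard-coded http:// Wikimedia URLs in `text` to https://.

    Hypothetical helper for illustration; URLs on other hosts and URLs
    that already use https are left untouched.
    """
    return _HTTP_WIKIMEDIA_URL.sub(r"https://\1", text)


snippet = '<img src="http://upload.wikimedia.org/example.jpg">'
print(upgrade_to_https(snippet))
```

Another option would be protocol-relative URLs (starting with `//`), which follow whatever scheme the page itself was loaded with.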

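Ads everywhere!

This extension will really make you spit out your food laughing XDDD

TechCrunch covered the "Jimmy Wales" Google Chrome extension: "Chrome Extension Lets You Just Add Jimmy Wales".

Wikipedia has always refused to run ads for revenue (see: "An appeal from Wikipedia founder Jimmy Wales"), but rather ironically, around this time every year a super annoying banner shows up asking you to donate to Wikimedia...

I wrote a Greasemonkey script to deal with this: "Wikipedia AD remover".

This extension is really obnoxious XDDD (screenshot of the result after installing it, taken from TechCrunch XDDD)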