維基百科各語言與各地區的綜合資訊

維基百科推出了新版的介面:「Just how many people are reading Wikipedia in your country, and what language are they using?」。

We recently released a new interactive visualization of Wikipedia traffic by country and language. Called WiViVi, which stands for Wikipedia Views Visualized, the new visualization shows the geographic distribution of pageviews to any or all Wikipedias from two different perspectives[.]

這個介面可以看到每個版本在每個地區佔的比率,像是中文維基百科的情況:

不過翻牆 VPN 的不知道怎麼算...

HHVM 的後續

官方對於 HHVM 的未來提出了說明:「The Future of HHVM」。重點就是他們不打算以 PHP7 為目標,打算關起來自己玩...:

Consequently, HHVM will not aim to target PHP7. The HHVM team believes that we have a clear path toward making Hack a fantastic language for web development, untethered from its PHP origins.

如果以 Packagist 上的資料來看 (PHP Versions Stats - 2017.1 Edition),HHVM 的數量應該是沒人了:

And because a few people have asked me this recently, while HHVM usage is not included above in the graph it is at 0.36% which is a third of PHP 5.3 usage and really hardly significant. I personally think it's fine to support it still in libraries if it just works, or if the fixes involved are minor. If not then it's probably not worth the time investment.

Comment 的地方有註明這是扣掉 CI 的量:

@ocramius: These numbers ignore Travis CI and other CI systems that set the "CI" env var in their workers. Without excluding those HHVM is around 0.95% so it's still low but those .36% is probably actual usage.

這樣就放心可以完全不用管 HHVM 了 XDDD

Tails 3.0 出了,然後又開始提供 BitTorrent 下載了...

Tails 是個 Tor 的獨立環境,可以直接用 USB 開機或是透過虛擬機上線,避免受到其他干擾而洩漏資訊。

剛剛看到了 Tails 發佈 3.0 版的消息:「Tails 3.0 is out」,比較特別的是在下載頁面發現 BitTorrent 的下載方式又被放回去了。

前陣子本來在「BitTorrent 對 SHA-1 的改善計畫?」這邊有提到 Tails 的團隊因應 SHA-1 的問題,在討論是否要繼續提供 BitTorrent 的問題 (因為 BitTorrent 裡使用 SHA-1 做很多事情),當時的決議其實是暫時使用 BitTorrent 發佈:「Biterrant attack」。

不過後續的更新這樣寫,所以看起來暫時會先恢復 BitTorrent 下載的方式:

After reading this discussion, my current conclusion is that we've totally misunderstood the impact of the attack, and that the security of our bittorrent downloads is still good enough. So I propose we revert to what we did before the 2.12 release, i.e. ship Torrents, for the foreseeable future when 2nd pre-image attacks are not realistic yet.

回到 Tails 3.0 本身,其中比較大的改變是放棄了 32bits 的支援:

Tails 3.0 works on 64-bit computers only and not on 32-bit computers anymore.

然後以前要在開機進到進階選單才有的語言設定,現在變成預設就會提醒了:

也許該再來測試看看注音輸入法好不好用的問題了 XD

Tor 在考慮使用 Rust 改寫

不過也不確定是不是愚人節消息就是了:「[tor-dev] Tor in a safer language: Network team update from Amsterdam」。

Tor 考慮使用 Rust 改寫,目前已經完成的部份,以及接下來的規劃:

What has already been done:
- Rust in Tor build
- Putting together environment setup instructions and a (very small) initial draft for coding standards
- Initial work to identify good candidates for migration (not tightly interdependent)

What we think are next steps:
- Define conventions for the API boundary between Rust and C
- Add a non-trivial Rust API and deploy with a flag to optionally use (to test support with a safe fallback)
- Learn from similar projects
- Add automated tooling for Rust, such as linting and testing

目前看到後續的討論只有「[tor-dev] Tor in a safer language: Network team update from Amsterdam」這篇,也許等全世界的 4/1 都過了之後再回來確認吧...

PHP 的 Unquoted Strings 將在 PHP 8 被移除

Twitter 上看到 PHP 的 unquoted string 被視為字串的功能將被移除:「PHP RFC: Deprecate and Remove Bareword (Unquoted) Strings」。

常見的情境是 $_GET[bar] 這樣的用法被視為與 $_GET["bar"] 相同... 超奇怪的功能,而現在這個功能已經投票通過,將會在 7.2 被列為 deprecated,到 8 就會拿掉。

這個功能本來是標示 E_NOTICE,但比較特別的是,雖然是列為 deprecated,PHP 7.2 預定會標示為 E_WARNING 而不是 E_DEPRECATED。主要是這次的兩個目標互相衝突 (可以參考原文),取比較有效的那個 (因為 PHP 8 就不會有這個問題了,所以 PHP 7.2 的過渡期以比較容易提醒使用者的那個為主)。

投票是 100% 通過 (41/41)。

用程式自動同步字幕與聲音

Hacker News 上看到的專案,readbeyond/aeneas

aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment).

馬上想到的是... 這根本就是字幕組的福音 XDDD

支援的語言:

Confirmed working on 38 languages: AFR, ARA, BUL, CAT, CYM, CES, DAN, DEU, ELL, ENG, EPO, EST, FAS, FIN, FRA, GLE, GRC, HRV, HUN, ISL, ITA, JPN, LAT, LAV, LIT, NLD, NOR, RON, RUS, POL, POR, SLK, SPA, SRP, SWA, SWE, TUR, UKR

除了 ENG 以外,有 JPN... XD

「從 X 語言轉換到 Y 語言」的研究

算是個有趣的研究,看看就好:「The eigenvector of "Why we moved from language X to language Y"」。

作者先用 Google 搜尋拉出數量:

但這個方法少了保留在原語言的資料,所以後面看看就好 XD

接下來是轉換成 stochastic matrix (真是讓人懷念的東西...):

最高的幾個語言是 GolangCJava,不過如同前面講的,因為少了留在原語言的資料,看看就好不用當真...

Ruby 2.4 中 Hash Table 的效能改善

前幾天 Ruby 推出了 2.4.0 (Ruby 2.4.0 Released),其中特別被拿出來提的:「Introduce hash table improvement (by Vladimir Makarov)」。

討論串很長而且歷時很久,但可以看出來方向是提高 CPU cache 效率:

Modern processors have several levels of cache. Usually,the CPU reads one or a few lines of the cache from memory (or another level of cache). So CPU is much faster at reading data stored close to each other. The current implementation of Ruby hash tables does not fit well to modern processor cache organization, which requires better data locality for faster program speed.

中間還有拿 Redmine 當作測試項目... XD

Amazon Polly 與 Amazon Lex:人機介面中的語音處理

AWS 這次推出的這兩個服務剛好成對:「Amazon Polly – Text to Speech in 47 Voices and 24 Languages」、「Amazon Lex – Build Conversational Voice & Text Interfaces」。

Amazon Polly 負責把文字唸出來變成語音,而 Amazon Lex 則是將語音辨識回文字,不過目前都還不支援中文... 但畢竟讓 user interface 這塊變得更親民了,算是基礎建設中服務,讓 startup 專心在產品本身上。

PHP 的 JIT 翻修計畫

在「JIT for PHP project」這邊看到 PHP 新的翻修計畫:

I'm glad to say that we have started a new JIT for PHP project and hope to deliver some useful results for the next PHP version (probably 8.0).

We are very early in the process and for now there isn't any real performance improvement yet. So far we spent just 2 weeks mainly working on JIT infrastructure for x86/x86_64 Linux (machine code generation, disassembling, debugging, profiling, etc), and we especially made the JIT code-generator as minimal and simple as possible. The current state, is going to be used as a starting point for research of different JIT approaches and their usability for PHP.

不知道會帶來多少效能提昇 :o