PyPy 5.9 now supports Pandas and NumPy

PyPy 5.9 now supports Pandas and NumPy, both widely used in machine learning: "PyPy v5.9 Released, Now Supports Pandas, NumPy". The release covers both the 2.7-compatible and the 3.5-compatible versions:

The PyPy team is proud to release both PyPy3.5 v5.9 (a beta-quality interpreter for Python 3.5 syntax) and PyPy2.7 v5.9 (an interpreter supporting Python 2.7 syntax).

For people who do heavy computation in Python, now is the time to jump in and start testing XD

Netflix raises prices in the US

Okay, so Netflix has raised its prices in the US: "Netflix Is Increasing Its Monthly Subscription Cost".

The basic plan stays the same:

The basic streaming plan, which only allows for streaming on only one screen at a time, will remain at $7.99 a month.

The two-screen plan goes up by $1 to $10.99:

The company’s most popular plan — which allows customers to stream titles on two devices with a single account — will increase by $1 to $10.99 a month.

The four-screen plan goes up by $2 to $13.99:

The premium plan — which allows customers to use the same account for four screens at once — will increase by $2 to $13.99 a month.

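According to The New York Times, the reason is the rising cost of original productions. If that is the case, other regions may well see price hikes too when the opportunity comes up?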

The price hike arrives as Netflix continues to increase production of original TV and movie content. The company said that by the end of the year it will have spent around $6 billion for original content in 2017. In August, Ted Sarandos, Netflix’s chief content officer, told Variety that the company planned to spend about $7 billion for original programming next year.

PostgreSQL 10 released

PostgreSQL 10 has been released, with quite a few important features (improvements): "PostgreSQL 10 Released".

The first one mentioned is Logical Replication:

Logical Replication - A publish/subscribe framework for distributing data

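Until now, the built-in replication worked at the block level (it ships the contents of whichever blocks changed), which is painful between different PostgreSQL versions. So before 10, anyone who had to deal with mismatched PostgreSQL versions relied on a third-party package (a common scenario being a database version upgrade). With Logical Replication built into 10, no extra package is needed: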

Logical replication extends the current replication features of PostgreSQL with the ability to send modifications on a per-database and per-table level to different PostgreSQL databases. Users can now fine-tune the data replicated to various database clusters and will have the ability to perform zero-downtime upgrades to future major PostgreSQL versions.

This makes zero-downtime upgrades possible, which is a significant step forward from a business operations standpoint.
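To make the upgrade scenario a bit more concrete, here is a rough sketch of the statements involved, written as small Python helpers that only build SQL strings. CREATE PUBLICATION and CREATE SUBSCRIPTION are the PostgreSQL 10 commands; every name used here (upgrade_pub, upgrade_sub, the table names, the connection string) is made up for illustration, and the helpers do no identifier quoting.

```python
# Sketch only: builds the SQL for a PostgreSQL 10 logical replication setup.
# All names (publication, subscription, tables, connection string) are
# hypothetical, and identifiers are NOT quoted or validated here.


def publication_sql(pub_name, tables):
    """SQL to run on the publisher (for an upgrade: the old server)."""
    return "CREATE PUBLICATION {} FOR TABLE {};".format(
        pub_name, ", ".join(tables)
    )


def subscription_sql(sub_name, conninfo, pub_name):
    """SQL to run on the subscriber (for an upgrade: the new server)."""
    # The connection string is a SQL string literal, so double any quotes.
    escaped = conninfo.replace("'", "''")
    return "CREATE SUBSCRIPTION {} CONNECTION '{}' PUBLICATION {};".format(
        sub_name, escaped, pub_name
    )


if __name__ == "__main__":
    print(publication_sql("upgrade_pub", ["users", "orders"]))
    print(subscription_sql("upgrade_sub", "host=old-db dbname=app", "upgrade_pub"))
```

Two things to keep in mind with this sketch: the publisher has to run with wal_level = logical, and logical replication does not copy table definitions, so the tables must already exist on the subscriber.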

Another one is Improved Query Parallelism (present since 9.6 and now improved further): parallelizable CPU-bound SQL queries can use multiple CPUs for a big speedup, which is something I have not seen in MySQL so far:

PostgreSQL 10 provides better support for parallelized queries by allowing more parts of the query execution process to be parallelized. Improvements include additional types of data scans that are parallelized as well as optimizations when the data is recombined, such as pre-sorting. These enhancements allow results to be returned more quickly.

For some use cases, the two features above actually complement each other.

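A lot of reporting and analytics work consists of parallelizable CPU-bound SQL queries, but an RDBMS could not run them in parallel before, so many organizations would dump the data into other kinds of databases for the computation (among today's popular products, things like Amazon Redshift, Amazon Athena, or BigQuery, or even throwing it into ELK). But with PostgreSQL the pain was that there was no convenient way to sync the data out... (so people would compromise a bit and dump the data daily with a cron job)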

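These two features in 10 happen to address the problem from two sides: people who are just starting out with PostgreSQL can get by on PostgreSQL alone for longer, because the SQL queries behind their reports are much faster; and people who currently dump data daily with a cron job now have a syncing option (using replication to sync to other systems).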

Next is Quorum Commit for Synchronous Replication, which builds in the low-level mechanism that distributed architectures need for "correctness":

PostgreSQL 10 introduces quorum commit for synchronous replication, which allows for flexibility in how a primary database receives acknowledgement that changes were successfully written to remote replicas. An administrator can now specify that if any number of replicas has acknowledged that a change to the database has been made, then the data can be considered safely written.
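The rule in that quote boils down to "count the acknowledgements and compare against a threshold". Below is a toy model of it in Python, not PostgreSQL's actual implementation. The standby names s1, s2, s3 are hypothetical; the string the second helper produces follows the ANY n (...) form that PostgreSQL 10 accepts for the synchronous_standby_names setting.

```python
# Toy model of quorum commit, NOT PostgreSQL's real code path.
# Standby names and the helper functions are hypothetical.


def is_durably_committed(acks, candidates, quorum):
    """True once at least `quorum` of the candidate standbys have acknowledged."""
    acked = set(acks) & set(candidates)  # ignore acks from unknown standbys
    return len(acked) >= quorum


def synchronous_standby_names(quorum, candidates):
    """Build the value for the synchronous_standby_names setting (ANY form)."""
    return "ANY {} ({})".format(quorum, ", ".join(candidates))


if __name__ == "__main__":
    standbys = ["s1", "s2", "s3"]
    print(synchronous_standby_names(2, standbys))
    print(is_durably_committed({"s1"}, standbys, 2))
    print(is_durably_committed({"s1", "s3"}, standbys, 2))
```

With a quorum of 2 out of 3, any single slow or dead standby no longer blocks commits, which is the flexibility the release notes are describing.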

Overall, PostgreSQL 10 brings a lot of improvements, and they are all very helpful from a business operations standpoint...

The official StarCraft II AI Workshop

Blizzard announced that it will hold a StarCraft II AI Workshop in early November: "Announcing the StarCraft II AI Workshop".

On November 3 and 4, Blizzard and DeepMind will co-host the StarCraft II AI Workshop at the Hilton Anaheim hotel, next to the Anaheim Convention Center.

Blizzard staff (along with the DeepMind team) will also be on hand to talk about SC2LE (StarCraft II Learning Environment) and SC2API (StarCraft II API):

Engineers and researchers from Blizzard and DeepMind will also be on-hand to meet with attendees and answers questions about the SC2LE and SC2API.

The dates overlap with BlizzCon 2017 (for now it looks like it falls on the last two days), and the tickets are not interchangeable:

While this event takes place during BlizzCon 2017, it is considered a separate event and is not part of the official BlizzCon program – therefore BlizzCon badges will not grant access to the AI workshop. However, we will be providing a limited pool of shareable BlizzCon badges that attendees of the AI workshop can use to check out BlizzCon and catch the StarCraft II Global Finals for inspiration on how to build superior AIs!

There should be plenty of news coming out of this... Has the DeepMind team's work progressed far enough to compete against top players?

Cross-platform Microsoft Edge makes web developers explode again...

Luckily, probably not that many people will use it (?)

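Microsoft announced Microsoft Edge for iOS, Android, and Microsoft's own systems: "Announcing Microsoft Edge for iOS and Android, Microsoft Launcher". They also very "thoughtfully" put together an article on the differences between platforms (especially iOS and Android): "Microsoft Edge for iOS and Android: What developers need to know".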

But Kyle Pflug, a Microsoft employee, put it more simply on Twitter:

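It makes the point super clearly, and incidentally leaves you with a WTF feeling XDDD (it is basically launching three browsers with different behavior all at once!)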

Designers will surely be praying that it never catches on XDDD

Performance tests of MySQL 8.0-rc

Oracle's Dimitri KRAVTCHUK (dim) ran tests and put together a performance comparison between MySQL 8.0-rc and other versions: "MySQL Performance : 2.1M QPS on 8.0-rc".

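Setting MySQL 8.0-rc aside for a moment, these tests also show that MySQL's performance on high-end servers (machines with a large number of CPU cores) improved a great deal going from 5.6 to 5.7: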

Not to mention 5.5 (or even the no-longer-supported 5.0 and 5.1), where the gap is bigger still...

GCC warning options

"Useful GCC warning options not enabled by -Wall -Wextra" points out that GCC would produce too many false positives if these options were folded into -Wall and -Wextra, but some of them are still very useful.

However, quite a few of the options covered there require GCC 6+ or even GCC 7+, and I checked: Ubuntu 16.04 ships GCC 5.4.

Here is a summary of the options the article covers:

-Wduplicated-cond -Wduplicated-branches -Wlogical-op -Wrestrict -Wnull-dereference -Wold-style-cast -Wuseless-cast -Wjump-misses-init -Wdouble-promotion -Wshadow -Wformat=2

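A separate, somewhat notable issue is that more and more projects are moving to clang; the shift in momentum over the past few years has been quite obvious...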

Yahoo!'s data breach exceeds the previously announced one billion accounts and rises to three billion

Oath (Y!'s new owner, held by Verizon) issued a new notice, and the number of leaked accounts jumps straight to 3 billion: "Yahoo provides notice to additional users affected by previously disclosed 2013 data theft".

In other words, every user at the time was affected:

Subsequent to Yahoo's acquisition by Verizon, and during integration, the company recently obtained new intelligence and now believes, following an investigation with the assistance of outside forensic experts, that all Yahoo user accounts were affected by the August 2013 theft.

The report "Yahoo says all 3 billion user accounts were impacted by 2013 security breach" is clearer, digging up the user count from that time:

Yahoo today announced that the huge data breach in August 2013 affected every user on its service — that’s all three billion user accounts and up from the initial one billion figure Yahoo initially reported.

The 2013 batch used MD5 hashes, which, given today's computing power, you can treat as not hashed at all...:

The stolen user account information may have included names, email addresses, telephone numbers, dates of birth, hashed passwords (using MD5) and, in some cases, encrypted or unencrypted security questions and answers.
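To show why a fast hash like MD5 offers so little protection, here is a minimal dictionary-attack sketch using Python's hashlib. It assumes the hashes are plain unsalted MD5, which the quoted notice does not actually specify, and the candidate list is a tiny made-up stand-in for a real wordlist.

```python
import hashlib

# Sketch only: assumes plain, unsalted MD5 (an assumption, not confirmed by
# the notice). The candidate list is a hypothetical stand-in for a wordlist.


def md5_hex(password):
    """Hex MD5 digest of a password string."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()


def dictionary_attack(target_hash, candidates):
    """Return the first candidate whose MD5 matches target_hash, else None."""
    for candidate in candidates:
        if md5_hex(candidate) == target_hash:
            return candidate
    return None


if __name__ == "__main__":
    leaked = md5_hex("letmein")  # pretend this came out of the dump
    print(dictionary_attack(leaked, ["123456", "password", "letmein"]))
```

Each guess costs a single MD5 computation, so trying a huge wordlist against a leaked hash is cheap; that is the sense in which it can be treated as "not hashed".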

It is already "all", so the only way to get a bigger batch next time is a different subject altogether...

The type Facebook uses to store time in MySQL

A post from MySQL at Facebook mentions that Facebook internally uses INT UNSIGNED to store time:

Which gets us to the point that it is no different than storing INT (hello 2038?) or UNSIGNED INT (a bit later) or BIGINT (till the end of time) and possibly passing binary values in efficient protocols eventually.

If you got that far of this post, your likes in Facebook graph are stored with 'INT UNSIGNED' time field.

By the way, INT has the year 2038 problem, and INT UNSIGNED has the year 2106 problem.
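Those two years fall straight out of the arithmetic: a signed 32-bit counter of seconds since the Unix epoch tops out at 2^31 - 1, an unsigned one at 2^32 - 1. A quick check in Python (the helper name is just for this sketch):

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

INT_MAX = 2**31 - 1            # largest value of a signed 32-bit INT
INT_UNSIGNED_MAX = 2**32 - 1   # largest value of a 32-bit INT UNSIGNED


def last_representable(max_seconds):
    """UTC moment of the largest Unix timestamp that fits in the column."""
    return EPOCH + timedelta(seconds=max_seconds)


if __name__ == "__main__":
    print(last_representable(INT_MAX).isoformat())           # 2038-01-19T03:14:07+00:00
    print(last_representable(INT_UNSIGNED_MAX).isoformat())  # 2106-02-07T06:28:15+00:00
```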

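The reason Facebook chooses not to use DATETIME or TIMESTAMP in MySQL actually has little to do with technology; the main reason is that MySQL has no intention of fixing it at all XDDD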

It is my favorite MySQL bug, simply because it forces any reasonable mind not to use TIMESTAMP, and MySQL is never going to fix it (nor will ever understand time). I lost my temper a bit on that bug: https://bugs.mysql.com/bug.php?id=38455

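My guess is that it is already a tangled mess, and everyone has a workaround (well, that is the INT UNSIGNED approach Facebook recommends). Add to that the pile of existing programs that lean heavily on side effects to get correct results, and it is easier to just not fix this kind of thankless thing XDDD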

On the other hand, timezone information changes often, so MySQL's timezone database needs frequent updates (not a pleasant thing for operations):

There're few ways around that. One of them is side-load and maintain timezone data inside MySQL itself - it has support for internal timezone database and tracks obscure time shifts like ones for "Pacific War Time" and "Pacific Peace Time". That is operationally feasible (you have to remind yourself to update the database whenever time rules change, and they do change a lot, if you consider every timezone in the world), but has limited value.

This is why everyone recommends INT UNSIGNED when it comes to MySQL...

See also my post from three years ago, "Ways to store time in MySQL...", which quotes Baron Schwartz:

All date and time columns shall be INT UNSIGNED NOT NULL, and shall store a Unix timestamp in UTC.

This is really already a best practice...
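On the application side, the practice looks roughly like the sketch below: convert to a Unix timestamp before writing, convert back to an aware UTC datetime after reading, and only apply a display timezone at the edge. The function names are made up for this example, and the range check assumes a 32-bit INT UNSIGNED column.

```python
from datetime import datetime, timedelta, timezone

INT_UNSIGNED_MAX = 2**32 - 1  # assumes a 32-bit INT UNSIGNED column


def to_unix_utc(dt):
    """Aware datetime -> integer Unix timestamp suitable for INT UNSIGNED."""
    if dt.tzinfo is None:
        raise ValueError("naive datetime: attach a timezone first")
    ts = int(dt.timestamp())
    if not 0 <= ts <= INT_UNSIGNED_MAX:
        raise ValueError("timestamp does not fit in INT UNSIGNED")
    return ts


def from_unix_utc(ts):
    """Integer Unix timestamp -> aware datetime in UTC."""
    return datetime.fromtimestamp(ts, tz=timezone.utc)


if __name__ == "__main__":
    taipei = timezone(timedelta(hours=8))
    local = datetime(2017, 10, 5, 12, 0, 0, tzinfo=taipei)
    stored = to_unix_utc(local)                 # what goes into the column
    print(from_unix_utc(stored).isoformat())    # 2017-10-05T04:00:00+00:00
```

The database never sees a timezone this way, so there is no timezone table to keep updated on the MySQL side.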

Security vulnerabilities in Dnsmasq

Google hit hard with this one: "Behind the Masq: Yet more DNS, and DHCP, vulnerabilities".

When DNS has to be handled on the client side, quite a few developers choose Dnsmasq. This time seven CVEs were disclosed at once (CVE-2017-13704 and CVE-2017-{14491-14496}), three of which are RCE:

I took a look, and Ubuntu has already pushed updates, so there are a bunch of machines whose Dnsmasq needs upgrading... :o