AWS 提供 Machine Learning 能力的自動 Code Review 服務

AWS 推出了 Code Review 服務 Amazon CodeGuru,使用 machine learning 提供建議:「AWS announces Amazon CodeGuru for automated code reviews and application performance recommendations」。

從界面就可以看出來同時支援 GitHub 與自家的 CodeCommit,看起來可以給不少建議,但網站上沒有提到 security 這塊,本來以為產品的定位不在這邊:

不過 FAQ 裡還是有提到常見的 security issue:

Q: What type of issues are detected by Amazon CodeGuru Reviewer?

Amazon CodeGuru Reviewer checks for concurrency issues, potential race conditions, un-sanitized inputs, inappropriate handling of sensitive data such as credentials, resource leaks, and also detects race conditions in concurrent code.

然後 FAQ 裡提到目前只支援 Java:

Amazon CodeGuru Reviewer currently supports Java code stored in GitHub and AWS CodeCommit repositories.

服務的價位是使用行數計算,不過那個 per month 沒看懂是什麼意思:

Code scan (pull requests)$0.75 per 100 lines of code scanned per month

另外推出的 Amazon CodeGuru Profiler 則是 APM 類的東西,這塊目前市場上產品也很多,看起來也要被 AWS 進來蹂躪...

超快速的 Base64 encoding/decoding 實做

看到「Base64 encoding and decoding at almost the speed of a memory copy」這個,可以超級快速編解碼 Base64 的資料。

實做上是透過 IntelAVX-512 加速,在資料夠大的情況下 (超過 L1 cache 的大小),可以達到接近字串複製的速度 (這邊提到的 memcpy()):

We show how we can encode and decode base64 data at nearly the speed of a memory copy (memcpy) on recent Intel processors, as long as the data does not fit in the first-level (L1) cache. We use the SIMD (Single Instruction Multiple Data) instruction set AVX-512 available on commodity processors. Our implementation generates several times fewer instructions than previous SIMD-accelerated base64 codecs.

不過這樣 AMD 暫時要哭哭...

用程式解數學邏輯問題...

Hacker News Daily 上看到的數學邏輯問題:「“Which answer in this list is the correct answer to this question?”」。

問題是這樣:

Which answer in this list is the correct answer to this question?

  • All of the below.
  • None of the below.
  • All of the above.
  • One of the above.
  • None of the above.
  • None of the above.

accepted 的那個是推演的答案,但最高分的那個是寫程式窮舉 XDDD (不得不說大家都很愛這味...)

Google 與 Cloudflare 測試 Post-Quantum 演算法的成果

這幾年量子電腦的進展不斷有突破,雖然到對於攻擊現有的密碼學看起來還有一段時間,但總是得先開始研究對量子電腦有抵抗性的演算法...

其中 Google Chrome 的團隊與 Cloudflare 的團隊手上都有夠大的產品,兩個團隊合作測試的結果在學界與業界都還蠻重視的:「Real-world measurements of structured-lattices and supersingular isogenies in TLS」、「The TLS Post-Quantum Experiment」。

Google Chrome 這邊是使用了 Canary 與 Dev 兩個 channel,有控制組與兩個新的演算法:

Google Chrome installs, on Dev and Canary channels, and on all platforms except iOS, were randomly assigned to one of three groups: control (30%), CECPQ2 (30%), or CECPQ2b (30%). (A random ten percent of installs did not take part in the experiment so the numbers only add up to 90.)

這兩個演算法有優點也有缺點。一個是 key 比較小,但運算起來比較慢 (SIKE,CECPQ2b);另外一個是 key 比較大,但是運算比較快 (HRSS,CECPQ2):

For our experiment, we chose two algorithms: isogeny-based SIKE and lattice-based HRSS. The former has short key sizes (~330 bytes) but has a high computational cost; the latter has larger key sizes (~1100 bytes), but is a few orders of magnitude faster.

We enabled both CECPQ2 (HRSS + X25519) and CECPQ2b (SIKE/p434 + X25519) key-agreement algorithms on all TLS-terminating edge servers.

感覺還是會繼續嘗試,因為這兩個演算法的缺點都還是有點致命...

Git 裡面禁用的函式...

Hacker News Daily 上看到 Git 裡面禁用的函式 (透過 .h 在編譯階段就擋下來):「banned.h」,在開頭有說明原因:

/*
 * This header lists functions that have been banned from our code base,
 * because they're too easy to misuse (and even if used correctly,
 * complicate audits). Including this header turns them into compile-time
 * errors.
 */

就算用的正確也增加了稽核的難度... 這些被禁用的函式包括了:

  • strcpy()
  • strcat()
  • strncpy()
  • strncat()
  • sprintf()
  • vsprintf()

所以應該是用 strlcpy()strlcat() 這些函式來處理字串,另外用 snprintf()vsnprintf() 來處理格式...

Perl 6 的名字被拿出來談...

在「Is Perl 6 Being Renamed?」這邊看到提到 Perl 6 名字的問題,主要是因為 Perl 6 跟現有 Perl 5 已經是不同的東西 (有點類似於當初 Python 2 到 Python 3 的計畫,但是差異比 Python 那邊多很多),而導致被提出來討論是否還要繼續使用 Perl 這個名字了:「"Perl" in the name "Perl 6" is confusing and irritating」。

When Perl 6 was announced, it was seen the way that Perl 2, Perl 3, Perl 4, and Perl 5 were seen: replacements for "$VERSION - 1". Over time, it became clear that though Perl 6 was in the same family as Perl 5, a straightforward migration path was unlikely. One only needs to look at the problems with Python 2 and Python 3 and the upgrade obstacles with their minor syntactic differences to understand that an upgrade from Perl 5 to Perl 6 isn't trivial.

如果把 Perl 5 與 Perl 6 當作不同的程式語言來看,這個問題就變成非技術性的問題了 (甚至是政治問題)。

接下來應該會是一連串混亂的討論,但解決問題的第一步永遠是先面對問題,至少這個問題被拿到檯面上「討論」了...

Facebook 修正錯字的新演算法

先前 Facebook 已經先發表過 fastText 了,在這個月的月初又發表了另外一個演算法 Misspelling Oblivious Embeddings (MOE),是搭著本來的 fastText 而得到的改善:「A new model for word embeddings that are resilient to misspellings」。

Facebook 的說明提到在 user-generated text 的內容上,MOE 的效果比 fastText 好:

We checked the effectiveness of this approach considering different intrinsic and extrinsic tasks, and found that MOE outperforms fastText for user-generated text.

論文發表在 arXiv 上:「Misspelling Oblivious Word Embeddings」。

依照介紹,fastText 的重點在於 semantic loss,而 MOE 則多了 spell correction loss:

The loss function of fastText aims to more closely embed words that occur in the same context. We call this semantic loss. In addition to the semantic loss, MOE also considers an additional supervisedloss that we call spell correction loss. The spell correction loss aims to embed misspellings close to their correct versions by minimizing the weighted sum of semantic loss and spell correction loss.

不過目前 GitHub 上的 facebookresearch/moe 只有放 dataset,沒有 open source 出來讓人直接用,可能得自己刻...

Bitbucket 放棄 Mercurial

Bitbucket 放棄對 Mercurial 的支援:「Sunsetting Mercurial support in Bitbucket」。

兩個時間點,一個是明年二月不能再新增,另外一個是明年六月完全停用:

February 1, 2020: users will no longer be able to create new Mercurial repositories
June 1, 2020: users will not be able to use Mercurial features in Bitbucket or via its API and all Mercurial repositories will be removed.

在 Mercurial 網站上的 wiki 也更新了:「Mercurial Hosting」,對於不想要搬到 Git 的人可以在這份列表裡找替代方案。

Backblaze B2 的 Copy File API 終於開放

BackblazeB2 算是我還蠻愛用來丟一些東西的地方 (配合他們與 Cloudflare 合作的免費頻寬)。

先前 B2 一直沒有複製檔案的功能,如果要有同樣檔案,變成得自己再上傳一次,這對於網路沒有很快的使用者會很痛苦,現在總算是提供 API 可以直接複製了:「B2 Copy File is Now Public」。

這個功能主要的文件在「b2_copy_file」。

另外這次也推出了「b2_copy_part」,針對檔案的合併所提供的 API。