The Cloudflare Outage Caused by a Regular Expression

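Cloudflare had already explained that the July 2 outage was caused by a regular expression (ReDoS), but the article published yesterday is more complete and even gives the regular expression that blew things up: 「Details of the Cloudflare outage on July 2, 2019」.

ReDoS is not a new problem, but it is hard to avoid, because it usually takes an experienced engineer (one who has been bitten before) to spot which regular expressions are risky... Engineers who have spent time studying regular expression matching algorithms also find it easier to steer clear.

Because of this, ReDoS is a debt everyone has been paying off over the past decade, and many frameworks have rewritten quite a few regular expressions because of it.

The key point this time is that the following expression caused the ReDoS: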

(?:(?:\"|'|\]|\}|\\|\d|(?:nan|infinity|true|false|null|undefined|symbol|math)|\`|\-|\+)+[)]*;?((?:\s|-|~|!|{}|\|\||\+)*.*(?:.*=.*)))

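The usual trouble spots are where unrestricted characters meet back-to-back * and + quantifiers. The tail end, )*.*(?:.*=.*))), does not look good, and sure enough the analysis later in the article says so: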

The critical part is .*(?:.*=.*).

I believe I first learned this in a Formal Languages course; such courses actually cover a lot of the theory behind tools commonly used in industry...
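To get a feel for why .*(?:.*=.*) is the critical part, here is a minimal sketch in Python. It is not Cloudflare's engine or any real regex engine: count_attempts is a hypothetical helper that models how a naive backtracking matcher would try the simplified pattern .*.*=.* from the start of the input, counting how many ways it splits the text between the two greedy .* before it finds an = or gives up.

```python
from typing import Tuple


def count_attempts(text: str) -> Tuple[bool, int]:
    """Model a naive backtracking match of `.*.*=.*` anchored at index 0.

    Returns (matched, attempts), where attempts is the number of ways the
    two greedy `.*` were split before an "=" was found or the matcher gave up.
    Assumes text has no newlines, so "." matches every character.
    """
    n = len(text)
    attempts = 0
    for first in range(n, -1, -1):               # first .* : longest first
        for second in range(n - first, -1, -1):  # second .* : longest first
            attempts += 1
            pos = first + second
            if pos < n and text[pos] == "=":
                return True, attempts            # the trailing .* always matches
    return False, attempts


if __name__ == "__main__":
    # Inputs without "=" force the matcher to exhaust every split.
    for n in (10, 20, 40, 80):
        print(n, count_attempts("x" * n))
```

In this model an input of length n with no = costs (n+1)(n+2)/2 attempts, so doubling the input roughly quadruples the work, and an unanchored search that retries from every start position multiplies that again.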

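The US Government's Requirements for Written Documents

I think I saw this on Twitter, but I can't find who tweeted it at the moment...

The 「Plain Writing Act of 2010」, signed in the US in 2010, requires government documents to be written in plain language, and there is even an official website, 「Home | plainlanguage.gov」, that explains it...

The site's 「Use simple words and phrases」 page gives a fairly long mapping table for replacing obscure legal jargon with everyday words...

The Before & After example on Wikipedia is quite good; in the more extreme cases, the result really is much easier to read: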

(Before) The amount of expenses reimbursed to a claimant under this subpart shall be reduced by any amount that the claimant receives from a collateral source. In cases in which a claimant receives reimbursement under this subpart for expenses that also will or may be reimbursed from another source, the claimant shall subrogate the United States to the claim for payment from the collateral source up to the amount for which the claimant was reimbursed under this subpart.

(After) If you get a payment from a collateral source, we will reduce our payment by the amount you get. If you get payments from us and from a collateral source for the same expenses, you must pay us back the amount we paid you.

macOS Plans to Remove Perl/Python/Ruby

The beta release notes for macOS 10.15 show that macOS plans to stop shipping Perl, Python, and Ruby by default in a future version: 「macOS 10.15 Beta Release Notes」.

Scripting language runtimes such as Python, Ruby, and Perl are included in macOS for compatibility with legacy software. Future versions of macOS won’t include scripting language runtimes by default, and might require you to install additional packages. If your software depends on scripting languages, it’s recommended that you bundle the runtime within the app. (49764202)

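It looks like they will be shipped as a separate package... That does mean fewer built-in tools, although shell script is Turing-complete... :o

Switching to vim-polyglot...

In 「A guide to setting up Vim for JavaScript development」 I saw that sheerun/vim-polyglot can handle many languages: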

Vim supports basic syntax highlighting for JavaScript but I found it suboptimal especially when it comes to modern ES2015+ syntax, and it doesn’t support JSX when working with React. I found that vim-javascript and vim-jsx solved my problems in both instances.

However, I later replaced those two with vim-polyglot which is a plugin that bundles several other syntax plugins for over 100 languages, and loads them on demand so that performance is not affected.

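Previously, every new language meant hunting for another plugin to add; now it looks like one package can cover everything (as long as it keeps getting updated). After switching I tested it briefly and found no major problems...

Word2Vec: Guessing the Meaning of Other Words Through Vectors

Back in 2013, in 「Automatic Translation Without Dictionaries」, I read about a self-learning approach to machine translation. It mentioned the report 「How Google Converted Language Translation Into a Problem of Vector Space Mathematics」, and the paper that report referred to is 「Exploiting Similarities among Languages for Machine Translation」, published by Google on arXiv.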

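Recently I came across 「The Illustrated Word2vec」 and cross-referenced it with my notes from more than five years ago... The formula shown there gives everyone the basic idea, which is to explain the meaning of words through arithmetic on their vectors.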

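Once such relationships are extracted, there is a chance to learn new words... and in turn to apply this to translation into other languages.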
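The classic illustration of this idea in the word2vec literature is king - man + woman ≈ queen. The following is a minimal sketch of that arithmetic in Python, under loud assumptions: the four-dimensional vectors are made up by hand for illustration (real embeddings are learned from text and have hundreds of dimensions), and analogy is a hypothetical helper, not part of any word2vec library.

```python
import math
from typing import Dict, List

# Hand-made toy vectors; the dimensions loosely mean
# (royalty, male, female, fruit). These numbers are NOT trained embeddings.
VECTORS: Dict[str, List[float]] = {
    "king":  [0.9, 0.8, 0.1, 0.0],
    "queen": [0.9, 0.1, 0.8, 0.0],
    "man":   [0.1, 0.9, 0.1, 0.0],
    "woman": [0.1, 0.1, 0.9, 0.0],
    "apple": [0.0, 0.1, 0.1, 0.9],
}


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def analogy(a: str, b: str, c: str) -> str:
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding the inputs."""
    target = [x - y + z for x, y, z in zip(VECTORS[a], VECTORS[b], VECTORS[c])]
    candidates = [w for w in VECTORS if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, VECTORS[w]))


if __name__ == "__main__":
    print(analogy("king", "man", "woman"))
```

The translation idea builds on the same geometry: if two languages arrange their words in similar shapes, a mapping learned from known word pairs can be applied to words whose translation is not yet known.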

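AWS Launches Fully Automated Subtitling for Live Streams

AWS has launched automatic subtitling during live streaming: 「Introducing Live Streaming with Automated Multi-Language Subtitling」. It is really just existing services wired together: 「Live Streaming with Automated Multi-Language Subtitling」.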

The solution deploys Live Streaming on AWS which includes AWS Elemental MediaLive, MediaPackage, Amazon CloudFront. The solution also deploys AWS Lambda, Amazon Simple Storage Service, Amazon Transcribe, and Amazon Translate.

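For situations where translation quality is not that critical, it might be worth playing with...?

AWS's OpenJDK 11 (Amazon Corretto 11) Reaches General Availability

Following 「AWS Decides to Put Effort into Supporting OpenJDK (the Corretto Project)」 and 「Amazon's OpenJDK 8 Build Reaches GA」, the next step is a corresponding Amazon Corretto 11 for OpenJDK 11: 「Amazon Corretto 11 is Now Generally Available」.

This version will be supported until at least August 2024, which is a support period of roughly five years: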

Long-term support (LTS) for Corretto includes performance enhancements and security updates for Corretto 8 until at least June 2023 at no cost. Updates are planned to be released quarterly. Amazon will provide LTS for Corretto 11 with quarterly updates until at least August 2024.

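However, earlier testing showed that some software does not run on OpenJDK 11, so for now that software still has to be kept on an OpenJDK 8 build...

A Step-by-Step Breakdown of JavaScript's == Comparison

This comes from 「7.2.14 Abstract Equality Comparison」 in the specification. I just look it up whenever I run into it, but anyone who wants to understand and memorize it can refer to: 「JavaScript "loose" comparison step by step」.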

The comparison x == y, where x and y are values, produces true or false. Such a comparison is performed as follows:

1. If Type(x) is the same as Type(y), then
   a. Return the result of performing Strict Equality Comparison x === y.
2. If x is null and y is undefined, return true.
3. If x is undefined and y is null, return true.
4. If Type(x) is Number and Type(y) is String, return the result of the comparison x == ! ToNumber(y).
5. If Type(x) is String and Type(y) is Number, return the result of the comparison ! ToNumber(x) == y.
6. If Type(x) is Boolean, return the result of the comparison ! ToNumber(x) == y.
7. If Type(y) is Boolean, return the result of the comparison x == ! ToNumber(y).
8. If Type(x) is either String, Number, or Symbol and Type(y) is Object, return the result of the comparison x == ToPrimitive(y).
9. If Type(x) is Object and Type(y) is either String, Number, or Symbol, return the result of the comparison ToPrimitive(x) == y.
10. Return false.

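The handy part is that the page is interactive: when you have stared at a comparison for ages without understanding the result, you can see which rule differs from your own understanding...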
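The quoted rules can also be traced by hand with a small model. Below is a rough sketch in Python, not a faithful JavaScript implementation. Assumptions: None stands in for null, the UNDEFINED sentinel is a hypothetical stand-in for undefined, the two ToPrimitive rules for Object are left out entirely, and to_number only understands plain decimal strings (no hex, no "Infinity").

```python
import math
import re

UNDEFINED = object()  # hypothetical stand-in for JS undefined; None plays null

# Simplified decimal literal; real ToNumber accepts more forms than this.
_DECIMAL = re.compile(r"[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?", re.ASCII)


def js_type(value):
    if value is None:
        return "Null"
    if value is UNDEFINED:
        return "Undefined"
    if isinstance(value, bool):          # check bool before int/float
        return "Boolean"
    if isinstance(value, (int, float)):
        return "Number"
    if isinstance(value, str):
        return "String"
    raise TypeError("Object and Symbol are outside this sketch")


def to_number(value):
    """Simplified ToNumber for Boolean, Number, and decimal String values."""
    if isinstance(value, bool):
        return 1.0 if value else 0.0
    if isinstance(value, str):
        text = value.strip()
        if text == "":
            return 0.0
        return float(text) if _DECIMAL.fullmatch(text) else math.nan
    return float(value)


def loose_equals(x, y):
    tx, ty = js_type(x), js_type(y)
    if tx == ty:                                   # same type: strict equality
        return x == y
    if {tx, ty} == {"Null", "Undefined"}:          # null vs undefined
        return True
    if tx == "Number" and ty == "String":
        return loose_equals(x, to_number(y))
    if tx == "String" and ty == "Number":
        return loose_equals(to_number(x), y)
    if tx == "Boolean":
        return loose_equals(to_number(x), y)
    if ty == "Boolean":
        return loose_equals(x, to_number(y))
    return False                                   # nothing else applies
```

For example, loose_equals(True, "1") first turns True into 1, then turns "1" into 1, and only then lands on the same-type rule, while loose_equals(None, 0) falls straight through to the final return false.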

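The Various Java Distributions

I saw this tweet about the state of Java support:

Mainly, it includes a picture listing the choices currently on the market, which can serve as search keywords:

At the moment it looks like there are only three options for Java 8, only two of which have passed the TCK, so Amazon Corretto seems like a fairly good choice?

A PHP Proposal for Number and String Comparison

In 「Links: February 2019」 I saw a proposal from the PHP community that aims to improve the results of comparing numbers with strings: 「PHP RFC: Saner string to number comparisons」.

It gives a classic example: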

$validValues = ["foo", "bar", "baz"];
$value = 0;
var_dump(in_array($value, $validValues));
// bool(true) WTF???

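The reason is that in_array() compares with == rather than ===, so it blows up... Having read the proposal, I still don't think it is good enough; let's see how it ends up being changed :o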
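To make the difference concrete, here is a rough model in Python of the two comparison rules as I read them; it is a sketch, not PHP's actual implementation. Assumptions: only integer values compared against strings are handled, the helper names (legacy_equals, proposed_equals, and the two in_array variants) are hypothetical, and edge cases such as trailing whitespace are ignored.

```python
import re
from typing import List

# A decimal number, optionally preceded by whitespace (simplified).
_NUMERIC = re.compile(r"\s*[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?", re.ASCII)


def legacy_equals(number: int, text: str) -> bool:
    """Old rule: cast the string to a number, using its numeric prefix or 0."""
    prefix = _NUMERIC.match(text)
    return number == (float(prefix.group()) if prefix else 0.0)


def proposed_equals(number: int, text: str) -> bool:
    """Proposed rule: compare numerically only if the whole string is numeric;
    otherwise cast the number to a string and compare as strings."""
    if _NUMERIC.fullmatch(text):
        return number == float(text)
    return str(number) == text


def legacy_in_array(value: int, haystack: List[str]) -> bool:
    return any(legacy_equals(value, item) for item in haystack)


def proposed_in_array(value: int, haystack: List[str]) -> bool:
    return any(proposed_equals(value, item) for item in haystack)


if __name__ == "__main__":
    valid_values = ["foo", "bar", "baz"]
    print(legacy_in_array(0, valid_values))    # "foo" is cast to 0, so 0 matches
    print(proposed_in_array(0, valid_values))  # "0" is compared with "foo"
```

In PHP itself, passing true as the third argument, in_array($value, $validValues, true), switches to strict === comparison and avoids the problem regardless of which rule is in effect.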