Facebook's new algorithm for coping with misspellings

Facebook had already released fastText a while back, and at the start of this month it published another algorithm, Misspelling Oblivious Embeddings (MOE), an improvement built on top of the original fastText: "A new model for word embeddings that are resilient to misspellings".

Facebook's write-up says that MOE does better than fastText on user-generated text:

We checked the effectiveness of this approach considering different intrinsic and extrinsic tasks, and found that MOE outperforms fastText for user-generated text.

The paper is on arXiv: "Misspelling Oblivious Word Embeddings".

According to the write-up, fastText focuses on semantic loss, while MOE adds a spell correction loss on top of it:

The loss function of fastText aims to more closely embed words that occur in the same context. We call this semantic loss. In addition to the semantic loss, MOE also considers an additional supervised loss that we call spell correction loss. The spell correction loss aims to embed misspellings close to their correct versions by minimizing the weighted sum of semantic loss and spell correction loss.
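To make the "weighted sum" idea a bit more concrete, here is a minimal Python sketch. It is not Facebook's implementation (none has been released). The function names, the `alpha` mixing weight, the 3-to-6 n-gram range, and the example scores are all assumptions for illustration. It only shows that a misspelling shares many character n-grams with the correct word, and how two loss terms could be mixed.

```python
import math


def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams of '<word>' with boundary markers, fastText-style."""
    marked = f"<{word}>"
    return [
        marked[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(marked) - n + 1)
    ]


def logistic_loss(score, label):
    """Binary logistic loss on a similarity score (label 1 = pair should be close)."""
    prob = 1.0 / (1.0 + math.exp(-score))
    return -math.log(prob) if label == 1 else -math.log(1.0 - prob)


def combined_loss(semantic_loss, spell_correction_loss, alpha):
    """Weighted sum of the two terms; alpha is a hypothetical mixing weight in [0, 1]."""
    return (1.0 - alpha) * semantic_loss + alpha * spell_correction_loss


if __name__ == "__main__":
    # Subword units shared by a misspelling and its correction (illustrative pair).
    shared = set(char_ngrams("embeding")) & set(char_ngrams("embedding"))
    print(len(shared), "shared n-grams")

    # Made-up scores, only to show how the two terms are mixed.
    semantic = logistic_loss(score=1.2, label=1)
    spelling = logistic_loss(score=0.3, label=1)
    print(combined_loss(semantic, spelling, alpha=0.5))
```

With `alpha = 0` this collapses to the semantic term alone, and raising `alpha` puts more weight on pulling misspellings toward their corrections. The exact weighting used in the paper may differ from this simplified form.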

For now, though, facebookresearch/moe on GitHub only contains the dataset. The code hasn't been open sourced for people to use directly, so you may have to build it yourself...

NASA sends the "Mona Lisa" to the Moon by laser...

NASA used a laser to send a grayscale "Mona Lisa" to LOLA in lunar orbit (Lunar Orbiter Laser Altimeter, which looks like an instrument for mapping the surface?): "NASA Beams Mona Lisa to Lunar Reconnaissance Orbiter at the Moon".

Lunar Orbiter Laser Altimeter

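The transfer rate was 300 bits/sec (a number that really moves you XD), carrying a 152x200 image with 4096 gray levels. One purpose of the test was to see how Earth's atmosphere affects the optical signal. You can see the raw data on the left and the result after correction with a 2/3 RS code on the right: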

To clean up transmission errors introduced by Earth's atmosphere (left), Goddard scientists applied Reed-Solomon error correction (right), which is commonly used in CDs and DVDs. Typical errors include missing pixels (white) and false signals (black). The white stripe indicates a brief period when transmission was paused.
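For a feel of what 300 bits/sec means for this image, here is a quick back-of-the-envelope calculation in Python. It assumes that 4096 gray levels means 12 bits per pixel and that the whole 300 bits/sec goes to pixel data, with no overhead for Reed-Solomon parity or framing. The real transmission would therefore take longer than this estimate.

```python
WIDTH, HEIGHT = 152, 200   # image size from the post
GRAY_LEVELS = 4096         # gray levels from the post
BITS_PER_SECOND = 300      # transfer rate from the post


def bits_per_pixel(levels):
    """Smallest number of bits that can represent `levels` distinct values."""
    return (levels - 1).bit_length()


def payload_bits(width, height, levels):
    """Raw image size in bits, with no coding or framing overhead (an assumption)."""
    return width * height * bits_per_pixel(levels)


def transfer_seconds(bits, bits_per_second):
    """Time to send `bits` at a constant rate."""
    return bits / bits_per_second


if __name__ == "__main__":
    bits = payload_bits(WIDTH, HEIGHT, GRAY_LEVELS)
    seconds = transfer_seconds(bits, BITS_PER_SECOND)
    print(f"{bits} bits, {seconds:.0f} s, about {seconds / 60:.1f} min")
```

Under those assumptions the raw image is 364,800 bits, which works out to roughly 20 minutes for one picture.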

They don't seem to say how big a laser it takes to hit the Moon, though... (a laser cannon?)