Mozilla 推出在本地端直接翻譯的 Firefox Translations

Mozilla 推出了「Firefox Translations」這個 Firefox 上的套件。

主打的就是 offline 這件事情,保有隱私性:

Firefox Translations provides automated translation of web content. Unlike cloud-based alternatives, translation is done locally, on the client-side, so that the text being translated does not leave your machine.

Hacker News 上的討論「Firefox Translations: Translate websites in your browser without using the cloud (addons.mozilla.org)」可以看到有些人有提到效果,雖然沒有像雲端服務的準確,但算是可用:

I've just installed it, and I'm impressed so far. I've only run it against some sample German Wikipedia articles (https://de.wikipedia.org/wiki/Clan_of_Xymox), but it produces surprisingly readable text. I also particularly like the "highlight potential errors" option to flag stuff that even the translation service thinks might be a bit off.

It's not nearly as speedy as Google Translate, but I'll take that happily if it means keeping it local.

從頁面上列出的支援語言可以看出還是以歐美用到的語系為主,然後下方也有說明這個專案是包括其他計畫的贊助累積出來的:

Firefox Translations was developed with The Bergamot Project Consortium, coordinated by the University of Edinburgh with partners Charles University in Prague, the University of Sheffield, University of Tartu, and Mozilla. This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 825303.

不過比較好奇的是在頁面上有提到 CPU 需要 SSE4.1 能力... 這樣就有兩個問題了,第一個是 browser extension 可以直接跑 SSE4.1 指令集?另外一個疑問就是,AppleARM 架構就無法支援嗎 (應該也有類似的加速指令集),現在是 x86 限定?

A CPU that supports SSE4.1 extensions is required for this addon to function properly. If it doesn't, an error will be displayed when the translation is being started.

就算現在限制很多,看起來還是個很有前途的計畫,也許有機會移植到其他瀏覽器上?

Amazon Transcribe 可以自動偵測語言了

Amazon Transcribe 可以將聲音轉成文字,先前都需要自己指定語言,而這幾天發表新的功能,可以自動偵測語言:「Amazon Transcribe Now Supports Automatic Language Identification」。

不過系統要求最少要有 30 秒的資料,跟人類比起來還是有點差距,但比起之前好用不少:

With a minimum of 30 seconds of audio, Amazon Transcribe can efficiently generate transcripts in the spoken language without wasting time and resources on manual tagging.

沒有額外的費用,主要就是照著本來的價錢在走:

There is no additional charge on top of the existing pricing.

翻了一下價錢,好像可以來測一些東西...

Amazon Transcribe 增加許多客製化功能

Amazon TranscribeAWS 提供的語音辨識服務,最近發表了不少可以客製化的功能:「Amazon Transcribe enhances custom vocabulary with custom pronunciations and display forms」。

一個是可以增加字彙,包括他的發音 (不過得透過 IPA 標,這邊會需要學一些東西):

You can give Amazon Transcribe more information about how to process speech in your input audio or video file by creating a custom vocabulary. A custom vocabulary is a list of specific words that you want Amazon Transcribe to recognize in your audio input. These are generally domain-specific words and phrases, words that Amazon Transcribe isn't recognizing, or proper nouns.

Now, with the use of characters from the International Phonetic Alphabet (IPA), you can enhance each custom terminology with corresponding custom pronunciations. Alternatively, you can also use the standard orthography of the language to mimic the way that the word or phrase sounds.

另外是定義詞彙的標示方法:

Additionally, you can now designate exactly how a customer terminology should be displayed when it is transcribed (e.g. “Street” as “St.” versus “ST”).

這對於專有名詞的部份應該是很好用?像是人名...

Word2Vec:透過向量猜測其他詞彙的意思

2013 年時在「Automatic Translation Without Dictionaries」這邊看到關於機器翻譯時的自我學習方式,裡面提到了「How Google Converted Language Translation Into a Problem of Vector Space Mathematics」這篇報導,而裡面提到的論文則是 Google 發表在 arXiv 上的「Exploiting Similarities among Languages for Machine Translation」這篇。

最近看到「The Illustrated Word2vec」這篇,把五年多前的記錄交叉拉出來看... 這個算式算是給了大家基本的想法,透過公式來解釋文字的意義:

拉出這樣的關係後,就有機會學習新的詞彙... 進而用在其他語言的翻譯上。

Tor Browser 繁體中文版

Twitter 上看到 Tor Browser 在宣傳繁體中文版,不過這篇的翻譯有點...:

不過這算是個里程碑啦...

OS X El Capitan (10.11) 軟體授權文件的白話翻譯

有人把 OS X El Capitan 的軟體授權文件翻譯成白話的 22 條英文說明:「OS X El Capitan License: in Plain English」,也就是這份文件:

其實有不少有趣 (?) 的設計...