Mozilla Implements Deep Speech, the Speech-to-Text Engine Published by Baidu

I saw on Hacker News that Mozilla has a project on GitHub, mozilla/DeepSpeech, which implements Baidu's paper "Deep Speech: Scaling up end-to-end speech recognition" in TensorFlow:

A TensorFlow implementation of Baidu's DeepSpeech architecture

It's a speech-to-text solution, and Mozilla has started a project to actually implement it...

You need to install Git Large File Storage to download the code in full, including the part that contains the training data:

Manually install Git Large File Storage, then clone the repository normally:
git clone https://github.com/mozilla/DeepSpeech
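
As a rough illustration of the step quoted above, here is a minimal Python sketch that checks for Git LFS before cloning. The function names are made up for this example, and running `git lfs install` first is an assumption on my part about a typical LFS setup; the README itself only says to install Git LFS and then clone normally.

```python
import shutil
import subprocess
from typing import List

REPO_URL = "https://github.com/mozilla/DeepSpeech"  # URL quoted in the post


def git_lfs_installed() -> bool:
    """Return True if a `git-lfs` executable can be found on PATH."""
    return shutil.which("git-lfs") is not None


def clone_commands(repo_url: str = REPO_URL) -> List[List[str]]:
    """Commands to run in order: set up the LFS hooks, then clone normally."""
    return [
        ["git", "lfs", "install"],  # assumption: one-time LFS hook setup
        ["git", "clone", repo_url],
    ]


def clone_with_lfs(repo_url: str = REPO_URL) -> None:
    """Refuse to clone without Git LFS, since the large files would be missing."""
    if not git_lfs_installed():
        raise RuntimeError("Git LFS not found; install it before cloning")
    for cmd in clone_commands(repo_url):
        subprocess.run(cmd, check=True)  # raises if a command fails
```

Calling `clone_with_lfs()` would then clone the repository into the current directory, or fail early if Git LFS is missing.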

The data available so far comes from another Mozilla project, "Common Voice":

The Common Voice project is Mozilla's initiative to help teach machines how real people speak.

Common Voice currently supports English only, and you can take part in the validation process right on the website...


The author used OpenCV to learn his boss's face, and when the boss walks over, the screen switches to a screenshot of someone hard at work XDDD: "Deep Learning Enables You to Hide Screen when Your Boss is Approaching".

“My boss left his seat and he was approaching to my seat.”

“OpenCV has detected the face and input the image into the learned model.”

“The screen has switched by recognizing him! ヽ(‘ ∇‘ )ノ ワーイ”

The author is Japanese (should I say I'm not surprised? XDDD), and the code for this software is at "Hironsan/BossSensor" XDDD

What a colossal waste of talent XDDD
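
The flow described in the captions above (detect a face, feed it to a learned model, switch the screen) might be sketched roughly like this. This is not the BossSensor code: `watch`, `is_boss` and `hide_screen` are hypothetical names, the classifier is left as a callable you would supply, and the detector assumes the Haar cascade bundled with the `opencv-python` package.

```python
from typing import Any, Callable, Iterable, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) of a detected face


def make_opencv_face_detector() -> Callable[[Any], List[Box]]:
    """Build a face detector from OpenCV's bundled Haar cascade."""
    import cv2  # imported lazily so the rest of the sketch runs without OpenCV

    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    )

    def detect(frame: Any) -> List[Box]:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        boxes = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        return [tuple(int(v) for v in box) for box in boxes]

    return detect


def watch(
    frames: Iterable[Any],
    detect_faces: Callable[[Any], List[Box]],
    is_boss: Callable[[Any, Box], bool],  # hypothetical learned model
    hide_screen: Callable[[], None],      # hypothetical screen switcher
) -> bool:
    """Hide the screen on the first frame containing the boss's face."""
    for frame in frames:
        if any(is_boss(frame, box) for box in detect_faces(frame)):
            hide_screen()
            return True
    return False
```

Keeping the detector, classifier and screen switcher as separate callables means each piece can be swapped out or tested without a webcam.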

Facebook's Facial Recognition System Heads to Court to Determine Whether It Is Legal

"Lawsuit challenging Facebook's facial recognition system moves forward" reports that Facebook's facial recognition system is heading to court to determine whether it is legal.

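Facebook first argued that its Terms of Service make it subject only to California and federal law, not Illinois law. The judge rejected that argument and ruled that the plaintiffs have a valid claim under the Illinois biometrics law, so the case can proceed: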

Today's decision focused on the question of whether the Illinois law is applicable to Facebook, one of the major legal hurdles facing the plaintiffs. Facebook's Terms of Service maintains that the company is only bound by California and federal laws, and the company had moved to dismiss the case on those grounds. But today, the judge ruled that the terms-of-service clause isn't sufficient to nullify the Illinois law. As a result, the plaintiffs have a valid claim under the Illinois biometrics law and the case can proceed.


Text Recognition on Images: Project Naptha

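This one packages text recognition for images as a Google Chrome extension. By default the text is already recognized, so you can select it directly: "Project Naptha".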


An image containing text gets OCR'd on the spot, and the text becomes selectable.

The official website says this is client-side JavaScript:

One of the more impressive things about this project is the fact that it's almost entirely written in client side javascript. That means that it's pretty much totally functional without access to a remote server.


By default, when you begin selecting text, it sends a secure HTTPS request which lacks any kind of identifiable information to the Project Naptha cached remote OCR and Translation service. This allows you to recognize text from an image with much more accuracy than otherwise possible. However, this can be disabled simply by checking the "Disable Lookup" item under the Options menu.
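
To make the "Disable Lookup" behavior a bit more concrete, here is a minimal sketch of the decision it implies. The real extension is written in JavaScript and its internals are not shown on the site, so every name here is hypothetical, and the ordering (try the remote cache first, fall back to local OCR) is my assumption based on the two quotes above.

```python
from typing import Callable, Optional


def recognize_text(
    image_id: str,
    local_ocr: Callable[[str], str],                # hypothetical in-browser OCR
    remote_lookup: Callable[[str], Optional[str]],  # hypothetical cached remote service
    disable_lookup: bool = False,
) -> str:
    """Prefer the remote cached result unless lookup is disabled."""
    if not disable_lookup:
        cached = remote_lookup(image_id)
        if cached is not None:
            return cached  # assumed to be the more accurate result
    # Either lookup is disabled or the cache had nothing: stay client-side.
    return local_ocr(image_id)
```

With `disable_lookup=True` the remote service is never contacted, which matches the privacy trade-off the quote describes.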