
Mozilla implements Deep Speech, the speech-to-text engine published by Baidu

Via Hacker News I came across Mozilla's mozilla/DeepSpeech project on GitHub, which uses TensorFlow to implement Baidu's paper "Deep Speech: Scaling up end-to-end speech recognition":

A TensorFlow implementation of Baidu's DeepSpeech architecture

It is a speech-to-text solution, and Mozilla has started a project to actually implement it...

You need to install Git Large File Storage to download the code in full, including the part that contains the training data:

Manually install Git Large File Storage, then clone the repository normally:
git clone https://github.com/mozilla/DeepSpeech

The data available so far comes from another Mozilla project, "Common Voice":

The Common Voice project is Mozilla's initiative to help teach machines how real people speak.

Common Voice currently covers English only, and you can take part in the validation process right on the website...

AWS offers a Deep Learning AMI for Windows

Some Windows-based things can now be launched and run directly: "Announcing New AWS Deep Learning AMI for Microsoft Windows".

Windows Server 2012 R2 and 2016 are supported at the moment:

Amazon Web Services now offers an AWS Deep Learning AMI for Microsoft Windows Server 2012 R2 and 2016.

The drivers and the commonly used tools are bundled in as well:

The AMIs also include popular deep learning frameworks such as Apache MXNet, Caffe and Tensorflow, as well as packages that enable easy integration with AWS, including launch configuration tools and many popular AWS libraries and tools. The AMIs come prepackaged with Nvidia CUDA 9, cuDNN 7, and Nvidia 385.54 drivers, and contain the Anaconda platform (supports Python versions 2.7 and 3.5).


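The author uses OpenCV to detect faces and a trained model to recognize his boss, then switches the screen to a hard-at-work screenshot when the boss walks over XD: "Deep Learning Enables You to Hide Screen when Your Boss is Approaching".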

“My boss left his seat and he was approaching to my seat.”

“OpenCV has detected the face and input the image into the learned model.”

“The screen has switched by recognizing him! ヽ(‘ ∇‘ )ノ ワーイ”

The author is Japanese (should I say I'm not surprised? XD), and the code for this software lives at "Hironsan/BossSensor" XD

What a spectacular waste of talent XD
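The flow in the captions above (detect a face, feed it to a learned model, switch the screen) can be sketched roughly as below. This is only my own minimal illustration, not the code from Hironsan/BossSensor: `watch_for_boss`, `detect_faces`, `is_boss` and `hide_screen` are all hypothetical names, and the camera, the face detector and the trained model are replaced with toy stand-ins so that it runs anywhere.

```python
from typing import Any, Callable, Iterable, List


def watch_for_boss(
    frames: Iterable[Any],
    detect_faces: Callable[[Any], List[Any]],
    is_boss: Callable[[Any], bool],
    hide_screen: Callable[[], None],
) -> int:
    """Scan frames in order and hide the screen at the first boss sighting.

    Returns the index of the frame that triggered the switch, or -1 if the
    boss never showed up. All four parameters are hypothetical hooks.
    """
    for index, frame in enumerate(frames):
        for face in detect_faces(frame):   # step 1: find faces in the frame
            if is_boss(face):              # step 2: ask the learned model
                hide_screen()              # step 3: switch the screen
                return index
    return -1


# Toy stand-ins so the sketch runs without a camera or a trained model.
# Each "frame" is just a list of the people visible in it.
fake_frames = [[], ["coworker"], ["coworker", "boss"], ["boss"]]
events: List[str] = []

triggered_at = watch_for_boss(
    fake_frames,
    detect_faces=lambda frame: frame,
    is_boss=lambda face: face == "boss",
    hide_screen=lambda: events.append("switched to work screenshot"),
)
print(triggered_at, events)
```

In a real setup the three hooks would presumably wrap a webcam capture loop, a face detector and a classifier; keeping them as plain callables makes the decision logic easy to try out on its own.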

Amazon EC2's P2 instances

Amazon EC2 has launched the P2 type for GPU workloads: "New P2 Instance Type for Amazon EC2 – Up to 16 GPUs".

The P2 family has specs like these (the smallest size, p2.xlarge, gets a single GPU):

This new instance type incorporates up to 8 NVIDIA Tesla K80 Accelerators, each running a pair of NVIDIA GK210 GPUs. Each GPU provides 12 GB of memory (accessible via 240 GB/second of memory bandwidth), and 2,496 parallel processing cores.

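The largest size, p2.16xlarge, is simply 16 times the smallest one (16 GPUs)... Its hourly price also breaks the previous record of $13.338/hr (us-east-1) held by x1.32xlarge, coming in at $14.4/hr...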
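As a quick sanity check on those numbers, here is a small sketch that multiplies out the per-GPU figures from the quote and divides the hourly price by the GPU count. The constants come straight from the quote and the prices mentioned above; the function and variable names are just mine, and I am assuming the $14.4/hr figure is for us-east-1 like the x1.32xlarge one.

```python
# Figures taken from the AWS quote and the prices mentioned in this post.
GPUS_PER_K80 = 2                    # "each running a pair of NVIDIA GK210 GPUs"
MAX_K80_ACCELERATORS = 8            # "up to 8 NVIDIA Tesla K80 Accelerators"
MEMORY_PER_GPU_GB = 12
CORES_PER_GPU = 2496
P2_16XLARGE_USD_PER_HOUR = 14.4     # assumed to be the us-east-1 price
X1_32XLARGE_USD_PER_HOUR = 13.338   # us-east-1


def p2_totals(k80_count: int = MAX_K80_ACCELERATORS) -> dict:
    """Multiply the per-GPU figures out for a given number of K80 boards."""
    gpus = k80_count * GPUS_PER_K80
    return {
        "gpus": gpus,
        "memory_gb": gpus * MEMORY_PER_GPU_GB,
        "cores": gpus * CORES_PER_GPU,
    }


totals = p2_totals()
usd_per_gpu_hour = P2_16XLARGE_USD_PER_HOUR / totals["gpus"]
premium_over_x1 = P2_16XLARGE_USD_PER_HOUR - X1_32XLARGE_USD_PER_HOUR

print(totals)
print(f"${usd_per_gpu_hour:.2f} per GPU-hour, "
      f"${premium_over_x1:.3f}/hr more than x1.32xlarge")
```

So the fully loaded machine works out to 16 GPUs, 192 GB of GPU memory and 39,936 cores, at roughly $0.90 per GPU-hour.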

A deep learning AMI was launched alongside it, preloaded with a bunch of common ML frameworks that support GPUs:

In order to help you to make great use of one or more P2 instances, we are launching a Deep Learning AMI today.

Identifying mosaicked faces with deep learning


  • Photos of the forty people who fit these backgrounds.
  • A news photo in which the face has been mosaicked.

The question now is how to work out which person is in the news photo: "Defeating Image Obfuscation with Deep Learning".

In an experiment along these lines, picking the right person out of 40 candidates, the accuracy was 50%:

50% may not be good enough for practical use yet, but it shows that Big Brother's technology is already under development...
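To put that 50% in context, here is a tiny sketch comparing it with blind guessing. The one assumption (mine, not the paper's) is that each of the 40 candidates is equally likely, so a random guess is right 1 time in 40; the function name is made up for illustration.

```python
def chance_accuracy(num_candidates: int) -> float:
    """Accuracy of picking uniformly at random among equally likely candidates."""
    if num_candidates < 1:
        raise ValueError("need at least one candidate")
    return 1.0 / num_candidates


NUM_CANDIDATES = 40
REPORTED_ACCURACY = 0.50   # the figure quoted in this post

baseline = chance_accuracy(NUM_CANDIDATES)
lift = REPORTED_ACCURACY / baseline

print(f"random guessing: {baseline:.1%}, reported: {REPORTED_ACCURACY:.0%}, "
      f"about {lift:.0f}x better than chance")
```

Under that assumption random guessing lands at 2.5%, so the reported result is about 20 times better than chance, which is why a "mere" 50% is already worrying.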


I came across the "Neural Doodle" project, which can turn doodles into pictures with oil-painting brushstrokes:


Use a deep neural network to borrow the skills of real artists and turn your two-bit doodles into masterpieces! This project is an implementation of Semantic Style Transfer (Champandard, 2016), based on the Neural Patches algorithm (Li, 2016).


The program can run on CPU alone or on a GPU; either way it eats a lot of memory XD