voice – Page 2 – Gea-Suan Lin's BLOG

Analyzing a voice to imitate how other people talk...

This kind of black-magic tech keeps getting more mature: "Lyrebird - An API to copy the voice of anyone".

Record 1 minute from someone's voice and Lyrebird can compress her/his voice's DNA into a unique key. Use this key to generate anything with its corresponding voice.

The demo section goes straight for these three people as a gag: (is that really okay? XDDD)

Please note that those are artificial voices and they do not convey the opinions of Donald Trump, Barack Obama and Hillary Clinton.

And it is fast enough to do the conversion in real time:

Our GPU clusters generate 1000 sentences in less than half a second.

Using Google's Speech Recognition API to break Google's reCAPTCHA

This is the "turn your own spear against your own shield" idea, using the Speech Recognition API to break reCAPTCHA: "ReBreakCaptcha: Breaking Google’s ReCaptcha v2 using.. Google".

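Even if Google adds a watermark to the reCAPTCHA audio so that its own Speech Recognition API refuses to analyze it, other vendors' services can still be used (such as Amazon Lex or the Bing Speech API), so that would not be much of a fix.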

Slack supports Video Call/Conference

Slack now supports video calls and video conferences: "Slack Calls: Now with 100% more video".

The Voice feature launched in June (Calls come to Slack), and the Video feature followed half a year later. A pretty decent development pace?

Amazon Polly and Amazon Lex: speech processing in human-machine interfaces

The two services AWS launched this time happen to form a pair: "Amazon Polly – Text to Speech in 47 Voices and 24 Languages" and "Amazon Lex – Build Conversational Voice & Text Interfaces".

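Amazon Polly reads text aloud as speech, while Amazon Lex recognizes speech back into text, though neither supports Chinese yet... Still, this makes the user interface side more approachable. It counts as an infrastructure-level service that lets startups focus on the product itself.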
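To get a feel for the Polly half, here is a minimal sketch in Python, assuming boto3 is installed and AWS credentials and a region are already configured. `synthesize_speech` is the boto3 Polly call; the helper names, the "Joanna" voice choice, and the output path are just picked for illustration.

```python
def build_polly_request(text, voice_id="Joanna", output_format="mp3"):
    """Build the keyword arguments for Polly's synthesize_speech call.

    The helper name and the default voice are illustrative choices.
    """
    if not text:
        raise ValueError("text must not be empty")
    return {"Text": text, "VoiceId": voice_id, "OutputFormat": output_format}


def synthesize_to_file(text, path, voice_id="Joanna"):
    """Ask Polly to speak `text` and save the returned audio to `path`."""
    import boto3  # third-party; imported here so the builder works without it

    polly = boto3.client("polly")  # assumes credentials and region are set up
    response = polly.synthesize_speech(**build_polly_request(text, voice_id))
    # The audio comes back as a streaming body under "AudioStream".
    with open(path, "wb") as audio_file:
        audio_file.write(response["AudioStream"].read())
```

Calling `synthesize_to_file("Hello from Polly", "hello.mp3")` should leave an MP3 on disk; the request builder is kept separate so it can be checked without touching AWS.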

Slack starts testing voice calls

Slack has started testing voice calls: "Making voice calls in Slack". It is currently in beta:

Keep in mind: Calls (beta) is currently voice only and desktop only. Video, screen sharing, and mobile support will come in the future.

It covers one-to-one calls (available on all plans) and group calls (available on paid plans).

The troubleshooting notes mention some technical details, and you can read a few things out of them:

If Slack is having trouble establishing a call connection, check the following settings, or ask your IT admin to do so:

Set your network to allow outbound UDP connections to port 22466.

Make sure your network is allowing incoming traffic from UDP 22466.
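A rough way to poke at the outbound half of this from a desktop is to try sending a UDP datagram to port 22466. This is only a sketch: the function name is made up, and the target host is whatever you choose to test against (Slack's actual media servers are not named in the notes above). Because UDP is connectionless, a successful send only shows that the local machine did not refuse the packet; a firewall further along can still drop it silently.

```python
import socket

SLACK_CALLS_UDP_PORT = 22466  # the port quoted in the troubleshooting notes


def can_send_udp(host, port=SLACK_CALLS_UDP_PORT, payload=b"ping"):
    """Return True if the local network stack accepts an outbound UDP datagram.

    This does not prove the datagram arrived; it only catches local failures
    such as name-resolution errors or an unreachable network.
    """
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.sendto(payload, (host, port))
        return True
    except OSError:
        return False
```

Confirming that replies on UDP 22466 get back in would need a cooperating server on the other side, which is why the notes say to ask the IT admin.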

The feature set is getting more and more complete...

Nexmo's Voice API can now take WAV and MP3 files...

I only just saw Nexmo's announcement: Nexmo's Voice API used to accept only text, which it would then speak, but now it can take a WAV or MP3 file and play it directly: "Nexmo’s Voice API Now Supports .wav and .mp3 Format".

They even give API examples directly, which looks like fun... XD

Then, reading the docs carefully, I found that the Speech part supports Chinese! Male and female voices under the code zh-cn!
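As a sketch of what a zh-cn request might look like, the snippet below only builds the request URL and sends nothing. The endpoint and the parameter names (`api_key`, `api_secret`, `to`, `text`, `lg`, `voice`) are my recollection of Nexmo's text-to-speech API and should be treated as assumptions to check against the docs; the credentials and phone number are placeholders.

```python
from urllib.parse import urlencode

# Assumed endpoint for Nexmo's text-to-speech API; verify against the docs.
NEXMO_TTS_ENDPOINT = "https://api.nexmo.com/tts/json"


def build_tts_url(api_key, api_secret, to, text, lg="zh-cn", voice="female"):
    """Build a text-to-speech request URL (parameter names are assumptions)."""
    if voice not in ("male", "female"):
        raise ValueError("voice must be 'male' or 'female'")
    params = {
        "api_key": api_key,
        "api_secret": api_secret,
        "to": to,        # callee's phone number
        "text": text,    # urlencode percent-encodes this as UTF-8
        "lg": lg,        # language code, e.g. zh-cn
        "voice": voice,
    }
    return NEXMO_TTS_ENDPOINT + "?" + urlencode(params)
```

For example, `build_tts_url("KEY", "SECRET", "886900000000", "你好", voice="male")` yields a URL carrying `lg=zh-cn` and `voice=male`, with the Chinese text percent-encoded.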