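Tacotron is a TTS technology published by Google (i.e. you type in text and the computer reads it aloud). Recordings from the previous version of Tacotron are available at "Audio samples from 'Tacotron: Towards End-to-End Speech Synthesis'", and the paper can be found at "Tacotron: Towards End-to-End Speech Synthesis".
As for the new version, I saw someone mention it on Twitter: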
Wow! I can no longer distinguish between a computer generated voice and recording of a person. #TTS #generative #DeepLearning
Try the samples then the Turing test: https://t.co/8LNcaCGfLR pic.twitter.com/tp8lofN1As— Alex J. Champandard (@alexjc) December 20, 2017
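This version is called Tacotron 2. Recordings are available at "Audio samples from 'Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions'", and the paper is at "Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions".
This time, the bottom of the audio samples page offers a blind test (recordings of a human versus recordings from Tacotron 2), and I basically can no longer tell which one is the real person...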