We introduce a language modeling approach for text-to-speech synthesis (TTS). Specifically, we train a neural codec language model (called Vall-E) using discrete codes derived from an off-the-shelf ...