E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS

By Sefik Emre Eskimez and others
This paper introduces Embarrassingly Easy Text-to-Speech (E2 TTS), a fully non-autoregressive zero-shot text-to-speech system that offers human-level naturalness and state-of-the-art speaker similarity and intelligibility. In the E2 TTS framework, the text input is converted into a character sequence with filler tokens. The flow-matching-based mel spectrogram generator is then trained based...
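As a rough illustration of the text-conversion step described above, the sketch below splits the input into characters and appends filler tokens. The filler symbol `<F>`, the function name `text_to_padded_chars`, and the idea of padding up to a caller-supplied target length are all assumptions made for illustration; the abstract only states that the character sequence includes filler tokens.

```python
FILLER = "<F>"  # hypothetical filler symbol; the abstract does not name one


def text_to_padded_chars(text: str, target_len: int, filler: str = FILLER) -> list[str]:
    """Split text into characters and pad with filler tokens up to target_len.

    target_len is an assumed parameter (for example, a number of
    spectrogram frames); it is not specified in the abstract.
    """
    chars = list(text)
    if len(chars) > target_len:
        raise ValueError("target_len must be at least the number of characters")
    # Append filler tokens so the sequence has exactly target_len entries.
    return chars + [filler] * (target_len - len(chars))


padded = text_to_padded_chars("hi", 5)
print(padded)
```

In this sketch, an input longer than the target length is rejected rather than truncated, so no characters are silently dropped.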
September 12, 2024