Controllable Emphasis with zero data for text-to-speech

By Arnaud Joly and others
We present a scalable method to produce high-quality emphasis for text-to-speech (TTS) that does not require recordings or annotations. Many TTS models include a phoneme duration model. A simple but effective way to achieve emphasized speech is to increase the predicted duration of the emphasized word. We show that...
July 13, 2023
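
The duration-based idea in the abstract can be illustrated with a minimal sketch. It assumes a duration model has already produced one duration per phoneme, along with a word index for each phoneme. The function name `emphasize_durations`, the `word_ids` alignment, and the default `scale` value are hypothetical choices made for illustration; the abstract does not specify the scaling factor or the interface the authors use.

```python
from typing import List, Sequence


def emphasize_durations(
    durations: Sequence[float],
    word_ids: Sequence[int],
    target_word: int,
    scale: float = 1.25,  # hypothetical factor, not taken from the paper
) -> List[float]:
    """Return a copy of `durations` with the target word's phonemes lengthened.

    durations:   predicted duration of each phoneme (seconds or frames).
    word_ids:    index of the word each phoneme belongs to (same length).
    target_word: index of the word to emphasize.
    scale:       multiplicative factor applied to the target word's phonemes.
    """
    if len(durations) != len(word_ids):
        raise ValueError("durations and word_ids must have the same length")
    if scale <= 0:
        raise ValueError("scale must be positive")
    # Phonemes outside the target word keep their predicted duration.
    return [
        d * scale if w == target_word else d
        for d, w in zip(durations, word_ids)
    ]


if __name__ == "__main__":
    # Two words: phonemes 0-1 belong to word 0, phonemes 2-3 to word 1.
    predicted = [0.1, 0.2, 0.1, 0.3]
    alignment = [0, 0, 1, 1]
    print(emphasize_durations(predicted, alignment, target_word=1, scale=2.0))
```

In a real system the modified durations would be passed on to the acoustic model in place of the original predictions; how that hand-off works depends on the TTS architecture and is not covered here.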