t-SOT FNT: Streaming Multi-talker ASR with Text-only Domain Adaptation Capability

By Jian Wu and others
Token-level serialized output training (t-SOT) was recently proposed to address the challenge of streaming multi-talker automatic speech recognition (ASR). t-SOT effectively handles overlapped speech by representing multi-talker transcriptions as a single token stream with ⟨cc⟩ symbols interspersed. However, the use of a naive neural transducer architecture significantly constrained its...
September 15, 2023
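
The single-stream representation mentioned in the abstract can be illustrated with a small sketch. It assumes two virtual channels, that tokens are ordered by emission time, and that the first token falls on channel 0. The function names, the `(time, channel, token)` input format, and the example timestamps are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

CC = "<cc>"  # channel-change symbol, written as ⟨cc⟩ in the abstract


def serialize_tsot(tokens: List[Tuple[float, int, str]]) -> List[str]:
    """Merge (time, channel, token) triples into one token stream.

    Assumption: tokens are ordered by time, and CC is inserted whenever
    two consecutive tokens come from different channels.
    """
    stream: List[str] = []
    prev_channel = None
    for _, channel, tok in sorted(tokens, key=lambda t: t[0]):
        if prev_channel is not None and channel != prev_channel:
            stream.append(CC)
        stream.append(tok)
        prev_channel = channel
    return stream


def deserialize_tsot(stream: List[str]) -> List[List[str]]:
    """Split a serialized stream back into two per-channel token lists.

    Assumption: decoding starts on channel 0 and each CC toggles the channel.
    """
    channels: List[List[str]] = [[], []]
    current = 0
    for tok in stream:
        if tok == CC:
            current = 1 - current
        else:
            channels[current].append(tok)
    return channels


# Hypothetical overlapped utterances: channel 0 says "hello world",
# channel 1 says "good morning", with interleaved token times.
example = [
    (0.5, 0, "hello"),
    (0.9, 1, "good"),
    (1.2, 0, "world"),
    (1.5, 1, "morning"),
]
stream = serialize_tsot(example)
channels = deserialize_tsot(stream)
```

Here `stream` interleaves the two speakers' tokens with `<cc>` at each switch, and `deserialize_tsot` recovers the per-channel transcriptions by toggling the active channel at every `<cc>`.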