Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings

By Naoyuki Kanda and others
This paper presents a streaming speaker-attributed automatic speech recognition (SA-ASR) model that can recognize "who spoke what" with low latency even when multiple people are speaking simultaneously. Our model is based on token-level serialized output training (t-SOT), which was recently proposed to transcribe multi-talker speech in a streaming fashion. To...
July 14, 2022