
i-Code V2: An Autoregressive Generation Framework over Vision, Language, and Speech Data

By Ziyi Yang and others
The convergence of text, visual, and audio data is a key step towards human-like artificial intelligence; however, the current Vision-Language-Speech landscape is dominated by encoder-only models, which lack generative abilities. We propose closing this gap with i-Code V2, the first model capable of generating natural language from any combination of...
May 21, 2023