MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding

By Xinyu Fang and others
The advent of large vision-language models (LVLMs) has spurred research into their applications in multi-modal contexts, particularly in video understanding. Traditional VideoQA benchmarks, despite providing quantitative metrics, often fail to encompass the full spectrum of video content and inadequately assess models' temporal comprehension. To address these limitations, we introduce MMBench-Video,...
October 30, 2024