
Zero-Shot Visual Reasoning by Vision-Language Models: Benchmarking and Analysis

By Aishik Nagar and others
Vision-language models (VLMs) have shown impressive zero- and few-shot performance on real-world visual question answering (VQA) benchmarks, alluding to their capabilities as visual reasoning engines. However, the benchmarks being used conflate "pure" visual reasoning with world knowledge, and also have questions that involve a limited number of reasoning steps. Thus, ...
August 27, 2024