Exploring Scalability of Self-Training for Open-Vocabulary Temporal Action Localization

By Jeongseok Hyun and others
The vocabulary size in temporal action localization (TAL) is constrained by the scarcity of large-scale annotated datasets. To address this, recent works incorporate powerful pre-trained vision-language models (VLMs), such as CLIP, to perform open-vocabulary TAL (OV-TAL). However, unlike VLMs trained on extensive image/video-text pairs, existing OV-TAL methods still rely on…
July 9, 2024