We introduce VidLPRO, a novel video-language (VL) pre-training framework designed specifically for robotic and laparoscopic surgery. While existing surgical VL models primarily rely on contrastive learning, we propose a more comprehensive approach to capture the intricate temporal dynamics and align video with language. VidLPRO integrates video-text contrastive learning, video-text matching,... Show more