Test-time Distribution Learning Adapter for Cross-modal Visual Reasoning