How Do Vision-Language Models Process Conflicting Information Across Modalities?