Aligning Modalities in Vision Large Language Models via Preference Fine-tuning