HEAR: Hearing Enhanced Audio Response for Video-grounded Dialogue