Probing Evaluation Awareness of Language Models