A Survey on Data Selection for Language Models