Evaluating Clinical Competencies of Large Language Models with a General Practice Benchmark