Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge

By Tianhao Wu and others
Large Language Models (LLMs) are rapidly surpassing human knowledge in many domains. While improving these models traditionally relies on costly human data, recent self-rewarding mechanisms (Yuan et al., 2024) have shown that LLMs can improve by judging their own responses instead of relying on human labelers. However, existing methods have...
July 30, 2024