
Towards Robust Alignment of Language Models: Distributionally Robustifying Direct Preference Optimization

By Junkang Wu and others
This study addresses the challenge of noise in training datasets for Direct Preference Optimization (DPO), a method for aligning Large Language Models (LLMs) with human preferences. We categorize noise into pointwise noise, which includes low-quality data points, and pairwise noise, which encompasses erroneous data pair associations that affect preference rankings. …
July 10, 2024
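For context, the standard DPO objective that this work builds on scores each preference pair by the margin between policy and reference log-probabilities. The sketch below is a minimal, illustrative implementation of that baseline loss, not the paper's robustified method; the function name `dpo_loss` and the `beta` default are assumptions for illustration. It also shows why pairwise noise is harmful: a flipped preference label swaps the chosen and rejected terms, driving the gradient in the wrong direction.

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO loss over per-sequence log-probabilities.

    Each argument is a 1-D tensor of log pi(y|x) summed over tokens
    for the chosen (y_w) or rejected (y_l) response; `beta` controls
    how strongly the policy is kept close to the reference model.
    """
    # Implicit rewards: log-ratio of policy to reference, scaled by beta.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Pairwise noise (a mislabeled pair) exchanges the two reward terms,
    # so the loss then pushes the margin in the opposite direction.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```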