Segmenting Text and Learning Their Rewards for Improved RLHF in Language Model
Abstract
Reinforcement learning from human feedback (RLHF) has been widely adopted to align language models (LMs) with human preferences. Prior RLHF works typically adopt a bandit formulation, which, though intuitive, ignores the sequential nature of LM generation and can suffer from the sparse reward issue. While recent works propose dense token-level RLHF, treating each token as an action may be too fine-grained for proper reward assignment. In this paper, we seek to get the best of both by training and utilizing a segment-level reward model, which assigns a reward to each semantically complete text segment spanning a short sequence of tokens. For reward learning, our method allows dynamic text segmentation and is compatible with standard sequence-preference datasets. For effective RL-based LM training against segment rewards, we generalize the classical scalar bandit reward normalizers into location-aware normalizer functions and interpolate the segment rewards for further densification. With these designs, our method performs competitively on three popular RLHF benchmarks for the LM policy: AlpacaEval 2.0, Arena-Hard, and MT-Bench. Ablation studies further demonstrate the effectiveness of our method.
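The abstract's pipeline can be illustrated with a minimal sketch: split a generated sequence into segments, normalize each segment's reward with a position-dependent (location-aware) baseline instead of a single scalar, and spread each segment reward over its tokens to densify the signal. All function names and the entropy-threshold segmentation heuristic below are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of segment-level reward shaping for RLHF.
# Segmentation, normalization, and densification are all simplified stand-ins.

def segment_by_entropy(token_entropies, threshold=2.0):
    """Cut a new segment wherever predictive entropy spikes, as a rough
    proxy for boundaries of semantically complete text segments."""
    boundaries = [0]
    for i, h in enumerate(token_entropies[1:], start=1):
        if h > threshold:
            boundaries.append(i)
    boundaries.append(len(token_entropies))
    # (start, end) index pairs, one per segment
    return list(zip(boundaries[:-1], boundaries[1:]))

def normalize_rewards(seg_rewards, seg_positions, mean_fn, std_fn):
    """Location-aware normalization: rather than one scalar mean/std
    as in bandit RLHF, use functions of each segment's position."""
    return [(r - mean_fn(p)) / max(std_fn(p), 1e-6)
            for r, p in zip(seg_rewards, seg_positions)]

def densify(segments, seg_rewards):
    """Spread each segment's reward uniformly over its tokens,
    a simple form of within-segment interpolation."""
    dense = []
    for (s, e), r in zip(segments, seg_rewards):
        n = e - s
        dense.extend([r / n] * n)
    return dense

# Example: 6 tokens, entropy spikes at positions 1 and 4 -> 3 segments.
segments = segment_by_entropy([0.5, 3.0, 0.4, 0.2, 2.5, 0.1])
dense = densify(segments, seg_rewards=[1.0, 3.0, 2.0])
```

The key property is that densification preserves each segment's total reward (each token gets `r / n` over `n` tokens), so the per-token credit used by the RL trainer sums back to the segment-level signal.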
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Approximated Variational Bayesian Inverse Reinforcement Learning for Large Language Model Alignment (2024)
- Self-Generated Critiques Boost Reward Modeling for Language Models (2024)
- T-REG: Preference Optimization with Token-Level Reward Regularization (2024)
- Reinforcement Learning Enhanced LLMs: A Survey (2024)
- Multimodal Preference Data Synthetic Alignment with Reward Model (2024)
- RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment (2024)
- Preference-Oriented Supervised Fine-Tuning: Favoring Target Model Over Aligned Large Language Models (2024)
Models citing this paper: 17