Oxford-HIPlab's Collections

Reward Models Inherit Value Biases from Pretraining (ICLR 2026)

Reward models and log-probabilities (logprobs) accompanying the paper by Christian et al., "Reward Models Inherit Value Biases from Pretraining" (ICLR 2026).