PPO Learning to Rank
Collection: reward models and five-level graded explanation-ranking datasets for PPO Learning to Rank experiments.
This Llama 3 reward-model checkpoint is packaged with PyTorchModelHubMixin. Access to the gated meta-llama/Meta-Llama-3-8B-Instruct base model is required.
Install the dependencies and clone Tankiit/PPO_Learning_to_rank, then run on a CUDA machine:
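# Run from inside the cloned repository so this module is importable.
from llm_judge_listnet_finetune import LLMJudgeRewardModel

# Download the checkpoint from the Hub and load its weights onto the GPU.
model = LLMJudgeRewardModel.from_pretrained(
    "thomasbllx/ppo-ltr-run_ranknet_seed7",
    map_location="cuda",
)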
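Before running the snippet above, it can help to confirm that the two prerequisites are in place: the cloned repository and credentials for the gated base model. The following is a minimal sketch using only the standard library and is not part of the repository. The helper name `missing_prerequisites` is hypothetical, and it assumes that `llm_judge_listnet_finetune` sits at the root of the clone as either a file or a package. `HF_TOKEN` is the environment variable that `huggingface_hub` reads for credentials, but a cached login works too, so a missing token is reported only as a hint.

```python
import os
import sys
from pathlib import Path


def missing_prerequisites(repo_dir, env=None):
    """Return a list of human-readable problems; an empty list means nothing was flagged."""
    env = os.environ if env is None else env
    repo = Path(repo_dir)
    problems = []

    # The import in the loading snippet needs this module to be importable.
    # Assumption: it lives at the repository root as a .py file or a package.
    module = "llm_judge_listnet_finetune"
    has_file = (repo / f"{module}.py").is_file()
    has_package = (repo / module / "__init__.py").is_file()
    if not (has_file or has_package):
        problems.append(f"{module} not found in {repo}")

    # The gated base model requires an authenticated Hugging Face account.
    # HF_TOKEN is one way to supply credentials; a prior `huggingface-cli login`
    # also works, so this is a hint rather than a hard failure.
    if not env.get("HF_TOKEN"):
        problems.append("HF_TOKEN is not set (fine if you already logged in via the CLI)")

    return problems


if __name__ == "__main__":
    # Default to the directory name that a plain `git clone` would create.
    clone = sys.argv[1] if len(sys.argv) > 1 else "PPO_Learning_to_rank"
    for problem in missing_prerequisites(clone):
        print("warning:", problem)
```

The check does not verify that access to the gated model has actually been granted; that only shows up when the base model is first downloaded.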
The evaluation metrics are included for reproducibility.