LLM judge reward model

This Llama 3 reward-model checkpoint is packaged with PyTorchModelHubMixin. Access to the gated meta-llama/Meta-Llama-3-8B-Instruct base model is required.

Usage

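Clone the Tankiit/PPO_Learning_to_rank repository, install its dependencies, and make sure its llm_judge_listnet_finetune module is importable. Then run the following on a machine with a CUDA GPU: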

from llm_judge_listnet_finetune import LLMJudgeRewardModel

model = LLMJudgeRewardModel.from_pretrained(
    "thomasbllx/ppo-ltr-run_ranknet_seed7",
    map_location="cuda",  # load the weights onto the GPU
)

The evaluation metrics are included for reproducibility.


Format: Safetensors. Model size: 4B params. Tensor types: F32, BF16.


This model is part of the PPO Learning to Rank collection: reward models and five-level graded explanation-ranking datasets for PPO Learning to Rank experiments (20 items).