LLM judge reward model

This Llama 3 reward-model checkpoint is packaged with PyTorchModelHubMixin. Access to the gated meta-llama/Meta-Llama-3-8B-Instruct base model is required.
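Because the base model is gated, it can help to confirm access before loading the checkpoint. The following is a minimal sketch, not part of this repository: the helper names `resolve_token` and `has_base_model_access` are hypothetical, and it assumes that `huggingface_hub` is installed, that the base repository contains a `config.json`, and that a denied request surfaces as `GatedRepoError`.

```python
import os

BASE_MODEL = "meta-llama/Meta-Llama-3-8B-Instruct"


def resolve_token(env=None):
    """Return the token stored in HF_TOKEN, or None if it is unset or blank.

    Hypothetical helper: returning None lets huggingface_hub fall back to
    the token cached by a previous login.
    """
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    return token or None


def has_base_model_access(token=None):
    """Try to fetch a small file from the gated base model.

    Assumption: the base repository exposes config.json, and a request
    without approved access raises GatedRepoError.
    """
    # Imported lazily so resolve_token() works without huggingface_hub.
    from huggingface_hub import hf_hub_download
    from huggingface_hub.utils import GatedRepoError

    try:
        hf_hub_download(BASE_MODEL, "config.json", token=token)
    except GatedRepoError:
        return False
    return True
```

Calling `has_base_model_access(resolve_token())` performs a network request and returns `False` when access to the gated repository has not been granted; other failures, such as a missing network connection, are left to propagate.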

Usage

Install the dependencies, clone Tankiit/PPO_Learning_to_rank, and then run the following on a CUDA machine:

from llm_judge_listnet_finetune import LLMJudgeRewardModel

model = LLMJudgeRewardModel.from_pretrained(
    "thomasbllx/ppo-ltr-run_approxndcg_seed7",
    map_location="cuda",
)
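The import above only resolves if Python can find `llm_judge_listnet_finetune`. One way to arrange that, sketched here under the assumption that the module sits at the top level of the cloned repository, is to put the checkout on `sys.path` before importing. The helper name `add_repo_to_path` and the checkout location are hypothetical.

```python
import sys
from pathlib import Path


def add_repo_to_path(repo_dir):
    """Make a local checkout importable by prepending it to sys.path.

    Hypothetical helper: repo_dir is wherever the repository was cloned.
    """
    repo = Path(repo_dir).expanduser().resolve()
    if not repo.is_dir():
        raise FileNotFoundError(f"repository checkout not found: {repo}")
    # Avoid adding the same entry twice on repeated calls.
    if str(repo) not in sys.path:
        sys.path.insert(0, str(repo))
    return repo
```

For example, `add_repo_to_path("PPO_Learning_to_rank")` would be called before the `from llm_judge_listnet_finetune import ...` line, assuming the repository was cloned into the current working directory under that name.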

The evaluation metrics are included for reproducibility.

Safetensors model size: 4B params. Tensor types: F32, BF16, U8.