DIR Llama-3.1-8B Reward Model

This repository contains the debiased reward model checkpoint used in the DIR reproduction workspace.

  • Base model: Meta-Llama-3.1-8B-Instruct
  • Training data: Skywork-Reward-Preference-80K-v0.2
  • DIR checkpoint: checkpoint-601
  • RM-Bench during training: eval_RMBench_total ~= 0.68493
  • Original server path: /data03/shibingkang/DIR/reward_models/my_outputs/Meta-Llama-3.1-8B-Instruct_DB_Difference-1_from-SK-v0.2_Debias-Tasklength_difference-1.0_len4096_fulltrain_2e-06_dataSkywork-Reward-Preference-80K-v0.2/checkpoint-601

The upload intentionally excludes DeepSpeed optimizer/model-state recovery files under global_step601/. Those files are very large and are not needed for loading the reward model for evaluation or PPO reward scoring.

Associated code and reproduction notes: https://github.com/BingkangShi/DIR-reproduction

Downloads last month
19
Safetensors
Model size
8B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for SilverStRock/DIR-Llama3.1-8B-RM

Finetuned
(2773)
this model