This is a reward model fine-tuned from Llemma-34b. To score the steps of a solution, pass the encoded text (the question followed by the solution) as input.

rewards = model(text).mean(dim=-1).sigmoid()[index]

Here, index holds the positions of the special end tokens that close each step, so the result contains one reward per step.
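
As a rough illustration of what the scoring expression computes, the sketch below reproduces the same arithmetic in plain Python on toy data. It assumes the model output for a single, unbatched sequence is one row of values per token position. The names `step_rewards` and `end_positions`, the end token id `99`, and the toy numbers are all hypothetical and not part of this model's API; check the tokenizer for the actual end-of-step token.

```python
import math

def end_positions(token_ids, end_token_id):
    """Return the positions where the (assumed) end-of-step token occurs."""
    return [i for i, t in enumerate(token_ids) if t == end_token_id]

def step_rewards(logits, index):
    """Mirror `model(text).mean(dim=-1).sigmoid()[index]` on nested lists.

    logits: one row of floats per token position (stand-in for model output).
    index:  positions of the end-of-step tokens.
    """
    # Average each position's row over the last dimension.
    means = [sum(row) / len(row) for row in logits]
    # Squash each mean into (0, 1) with the logistic sigmoid.
    probs = [1.0 / (1.0 + math.exp(-m)) for m in means]
    # Keep only the scores at the end-of-step positions.
    return [probs[i] for i in index]

# Toy stand-ins: 99 is a hypothetical end-of-step token id.
token_ids = [5, 8, 99, 7, 3, 99]
logits = [
    [0.0, 0.0],
    [1.0, -1.0],
    [2.0, 0.0],    # end of step 1
    [0.5, 0.5],
    [-1.0, -1.0],
    [-2.0, 0.0],   # end of step 2
]

index = end_positions(token_ids, end_token_id=99)
rewards = step_rewards(logits, index)
print(index, rewards)
```

With the real model, the same three operations (mean over the last dimension, sigmoid, indexing at the end-token positions) would be applied to the model's output tensor instead of nested lists.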

Model size: 33.7B params (Safetensors)
Tensor type: BF16