This is a reward model fine-tuned from Llemma-34b. To score the steps of a solution, pass the encoded text (the question followed by the solution) as input.

rewards = model(text).mean(dim=-1).sigmoid()[index]

Here, index holds the positions of the special end token that closes each step, so rewards contains one score in (0, 1) per step.
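The scoring expression above can be illustrated with a minimal, dependency-free sketch. It is not the model's actual inference code: it assumes a single sequence whose per-token outputs form a (seq_len, d) array, and the step-end token id `STEP_END_ID`, the helper names, and the toy values are all hypothetical. The real step-end token is not specified here and would need to be taken from the tokenizer.

```python
import math

STEP_END_ID = 99  # hypothetical id of the special step-end token


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def step_end_positions(token_ids, step_end_id):
    """Positions at which the step-end token occurs (the `index` above)."""
    return [i for i, tok in enumerate(token_ids) if tok == step_end_id]


def step_rewards(outputs, index):
    """Plain-Python analogue of outputs.mean(dim=-1).sigmoid()[index].

    outputs: list of per-token output vectors, shape (seq_len, d)
    index:   positions of the step-end tokens
    """
    # Average over the last dimension, then squash to (0, 1).
    scores = [sigmoid(sum(row) / len(row)) for row in outputs]
    # Keep only the scores at the end of each step.
    return [scores[i] for i in index]


# Toy stand-ins for the encoded text and the model outputs.
token_ids = [5, 8, 99, 7, 3, 99]
outputs = [
    [0.0, 0.0],
    [1.0, -1.0],
    [2.0, 0.0],
    [0.5, 0.5],
    [-3.0, 1.0],
    [-1.0, -1.0],
]

index = step_end_positions(token_ids, STEP_END_ID)
rewards = step_rewards(outputs, index)
print(index, rewards)
```

With real tensors, the same three operations (mean over the last dimension, sigmoid, selection at the step-end positions) are what the one-line expression performs.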

Format: Safetensors
Model size: 33.7B params
Tensor type: BF16


Model tree for tkitsers/Llemma-reward-model
