TTC Dense Verifier Qwen2.5-7B Reward Model Adapter

This public repository contains the trained reward-model adapter used as the verifier in the TTC Dense Verifier project.

Intended use

The model is a verifier (reward-model) adapter that scores technical answers about code and systems programming. It was used to rerank candidate answers generated by Qwen2.5-32B-Instruct during test-time compute (TTC) experiments.
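
To illustrate what reranking with a verifier means in practice, here is a minimal sketch of score-based reranking. It is not the project's beam procedure: `rerank` and `toy_score` are hypothetical names, and `toy_score` is a stand-in for a call into the real reward model.

```python
from typing import Callable, List, Tuple


def rerank(
    prompt: str,
    candidates: List[str],
    score_fn: Callable[[str, str], float],
) -> List[Tuple[float, str]]:
    """Return (score, candidate) pairs sorted from highest to lowest score."""
    scored = [(score_fn(prompt, candidate), candidate) for candidate in candidates]
    # Sort on the score only; the sort is stable, so ties keep their input order.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


def toy_score(prompt: str, answer: str) -> float:
    # Hypothetical stand-in for the verifier: counts mentions of "free".
    # A real scorer would run the base model, adapter, and value head.
    return float(answer.count("free"))


if __name__ == "__main__":
    ranked = rerank("How do I release memory?", ["a", "free free", "free"], toy_score)
    best_score, best_answer = ranked[0]
    print(best_score, best_answer)
```

Keeping the scorer behind a plain callable means the same reranking code can be exercised with a stub before the 7B verifier is loaded.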

Base model

Base model: Qwen/Qwen2.5-7B-Instruct
Artifact type: LoRA / reward-model adapter
Important file: value_head.safetensors must be loaded with the adapter for reward scoring.
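
The value head matters because the adapter alone only produces hidden states, not a reward. The toy sketch below shows one common design, assumed here rather than confirmed by this repository: a single linear projection applied to the final token's hidden state. The name `scalar_reward` and the plain-list representation are illustrative only.

```python
from typing import Sequence


def scalar_reward(
    hidden_states: Sequence[Sequence[float]],
    weight: Sequence[float],
    bias: float = 0.0,
) -> float:
    """Project the last token's hidden state to a single scalar reward.

    Assumption: the value head is one linear layer (weight vector plus bias).
    The actual layout of value_head.safetensors is not documented here.
    """
    last_token = hidden_states[-1]
    if len(last_token) != len(weight):
        raise ValueError("hidden size and weight size must match")
    return sum(h * w for h, w in zip(last_token, weight)) + bias


if __name__ == "__main__":
    # Two tokens with hidden size 2; only the last token contributes.
    print(scalar_reward([[1.0, 2.0], [3.0, 4.0]], [0.5, -1.0], bias=0.25))
```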

Experiment summary

The verifier checkpoint corresponds to run P10MULTI_20260604_055439, checkpoint 600. In the final four-way evaluation, P15 TTC beam reranking achieved the highest calibrated quality score of the four systems compared: raw 32B, SFT 32B, P15 TTC, and P19 penalty TTC.

See:

training_config.yaml
verifier_qwen7b.yaml
evaluation_standard.md
p20_four_way_eval_decision.md
model_manifest.json

Limitations

This is not a standalone base model. It requires the Qwen2.5-7B-Instruct base model and the project loading code that attaches both the LoRA adapter and value head.
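
The project loading code is not reproduced here. As a rough sketch of the steps it has to perform, the function below loads the base model, attaches the LoRA adapter with PEFT, and reads the value-head tensors. The function name `load_verifier` is hypothetical, the sketch assumes the adapter directory follows the standard PEFT layout, and it returns the raw value-head tensors instead of attaching them, because their key names are not documented in this card.

```python
from pathlib import Path


def load_verifier(adapter_dir, base_id="Qwen/Qwen2.5-7B-Instruct"):
    """Load tokenizer, base model plus LoRA adapter, and value-head tensors.

    Hypothetical helper: the project's own loader may differ.
    """
    adapter_dir = Path(adapter_dir)
    head_path = adapter_dir / "value_head.safetensors"
    if not head_path.is_file():
        # Fail early: without the value head there is no reward score.
        raise FileNotFoundError(f"missing value head: {head_path}")

    # Heavy imports are deferred so the file check runs without them.
    from peft import PeftModel
    from safetensors.torch import load_file
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    # dtype and device placement are left at library defaults in this sketch.
    base = AutoModelForCausalLM.from_pretrained(base_id)
    model = PeftModel.from_pretrained(base, str(adapter_dir))

    # Dict of tensor name -> tensor; inspect the keys before wiring a head.
    value_head_tensors = load_file(str(head_path))
    return tokenizer, model, value_head_tensors
```

Printing `value_head_tensors.keys()` after loading is a cheap way to confirm the head's shape before building a scoring module around it.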

