TTC Dense Verifier Qwen2.5-7B Reward Model Adapter

This public repository contains the trained reward-model adapter used as the verifier in the TTC Dense Verifier project.

Intended use

The model is a verifier (reward) adapter for technical answers about code and systems programming. It was used to rerank candidate answers generated by Qwen2.5-32B-Instruct during test-time compute (TTC) experiments.
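Reranking with a verifier can be sketched as scoring each candidate and sorting by score. The snippet below is a minimal illustration under stated assumptions, not the project's beam procedure: `rerank` and `score_fn` are hypothetical names, and `score_fn` stands in for whatever function returns the verifier's scalar reward for a (prompt, answer) pair.

```python
from typing import Callable, List, Sequence, Tuple


def rerank(
    prompt: str,
    candidates: Sequence[str],
    score_fn: Callable[[str, str], float],
) -> List[Tuple[float, str]]:
    """Return (score, candidate) pairs sorted from highest to lowest score.

    score_fn is a placeholder for the reward model: it takes the prompt and
    one candidate answer and returns a scalar reward (higher is better).
    """
    scored = [(score_fn(prompt, candidate), candidate) for candidate in candidates]
    # sorted() is stable, so candidates with equal scores keep their input order.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


def toy_score(prompt: str, answer: str) -> float:
    """Toy scorer for illustration only; a real run would call the verifier."""
    return float(len(answer))


ranked = rerank("How do I free a buffer?", ["a", "abc", "ab"], toy_score)
best_score, best_answer = ranked[0]
```

In practice `score_fn` would wrap the base model, the adapter, and the value head; the toy scorer only exists so the sketch runs without downloading weights.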

Base model

  • Base model: Qwen/Qwen2.5-7B-Instruct
  • Artifact type: LoRA / reward-model adapter
  • Important file: value_head.safetensors must be loaded with the adapter for reward scoring.
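Because reward scoring depends on value_head.safetensors being present next to the adapter, a quick pre-flight check can catch an incomplete download. This is a sketch under assumptions: `missing_artifacts` is a hypothetical helper, and the two adapter filenames are the defaults PEFT writes when saving a LoRA adapter, which this repository is assumed (not confirmed) to follow.

```python
from pathlib import Path
from typing import List

# value_head.safetensors is named in this model card.
# The two adapter_* names are PEFT's default filenames and are an assumption here.
REQUIRED_FILES = (
    "adapter_config.json",
    "adapter_model.safetensors",
    "value_head.safetensors",
)


def missing_artifacts(adapter_dir: str) -> List[str]:
    """Return the required filenames that are not present in adapter_dir."""
    root = Path(adapter_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

An empty list means all three files were found; anything else names what still needs to be downloaded before scoring.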

Experiment summary

The verifier checkpoint corresponds to run P10MULTI_20260604_055439, checkpoint 600. In the final evaluation, P15 TTC beam reranking achieved the strongest calibrated quality score of the four systems compared: raw 32B, SFT 32B, P15 TTC, and P19 penalty TTC.

See:

  • training_config.yaml
  • verifier_qwen7b.yaml
  • evaluation_standard.md
  • p20_four_way_eval_decision.md
  • model_manifest.json

Limitations

This is not a standalone base model. It requires the Qwen2.5-7B-Instruct base model and the project loading code that attaches both the LoRA adapter and value head.
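The project loading code is not reproduced in this card. As a rough sketch of the pieces involved, assuming the adapter is stored in standard PEFT format and that `transformers`, `peft`, and `safetensors` are installed, the three components could be loaded as follows. `load_verifier_parts` is a hypothetical name, and the sketch deliberately stops short of attaching the value head, because the tensor names and head architecture are defined by the project's code and are not stated here.

```python
import os


def load_verifier_parts(adapter_dir: str, base_model: str = "Qwen/Qwen2.5-7B-Instruct"):
    """Load the tokenizer, the LoRA-wrapped base model, and the raw value-head tensors.

    Assumes adapter_dir holds a PEFT-format adapter plus value_head.safetensors.
    Heavy imports are deferred so that defining this function needs no ML libraries.
    """
    from peft import PeftModel
    from safetensors.torch import load_file
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_model)
    base = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype="auto")
    # Wrap the base model with the LoRA adapter weights.
    model = PeftModel.from_pretrained(base, adapter_dir)
    # Raw value-head tensors as a dict mapping tensor name to tensor.
    # Attaching them to a scoring head is left to the project loading code.
    value_head = load_file(os.path.join(adapter_dir, "value_head.safetensors"))
    return tokenizer, model, value_head
```

Calling the function downloads the 7B base model if it is not already cached, so it is best run once and the result reused across scoring calls.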
