TTC Dense Verifier Qwen2.5-7B Reward Model Adapter
This public repository contains the trained reward-model adapter used as the verifier in the TTC Dense Verifier project.
Intended use
The model is a verifier/reward adapter that scores technical answers about code and systems programming. It was used to rerank candidate answers generated by Qwen2.5-32B-Instruct during test-time compute (TTC) experiments.
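As a rough illustration of what reranking means here, the sketch below sorts candidate answers by a reward score and keeps the best one. It is not the project's reranking code: `rerank` and `toy_score` are hypothetical names, and `toy_score` is a placeholder that only counts prompt words echoed in the answer. In real use, `score_fn` would wrap a forward pass through the base model with the adapter and value head attached.

```python
from typing import Callable, List, Sequence, Tuple


def rerank(
    prompt: str,
    candidates: Sequence[str],
    score_fn: Callable[[str, str], float],
) -> List[Tuple[float, str]]:
    """Return (score, candidate) pairs sorted from highest to lowest score."""
    scored = [(score_fn(prompt, candidate), candidate) for candidate in candidates]
    # sorted() is stable, so tied candidates keep their generation order.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


def toy_score(prompt: str, answer: str) -> float:
    """Placeholder scorer, NOT the verifier: counts answer words found in the prompt."""
    prompt_words = set(prompt.lower().split())
    return float(sum(1 for word in answer.lower().split() if word in prompt_words))


if __name__ == "__main__":
    ranked = rerank(
        "how does a mutex guard shared state",
        ["use a mutex", "mutex guards shared state", "no idea"],
        toy_score,
    )
    best_score, best_answer = ranked[0]
    print(best_score, best_answer)
```

Passing the scorer in as a callable keeps the selection logic independent of how the reward is computed, so the placeholder can be swapped for the real verifier without touching `rerank`.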
Base model
- Base model: Qwen/Qwen2.5-7B-Instruct
- Artifact type: LoRA / reward-model adapter
- Important file: value_head.safetensors must be loaded with the adapter for reward scoring.
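Because reward scoring depends on the value head being present, a quick pre-flight check on a downloaded copy can catch an incomplete checkpoint before any model is loaded. The sketch below is only an illustration: `missing_files` is a hypothetical helper, and the two `adapter_*` file names are assumed from the defaults that PEFT's `save_pretrained()` writes, not confirmed against this repository's layout.

```python
from pathlib import Path
from typing import List, Sequence

# value_head.safetensors is the file named in this README.
# The two adapter_* names are PEFT's default output names (an assumption here).
REQUIRED_FILES = (
    "adapter_config.json",
    "adapter_model.safetensors",
    "value_head.safetensors",
)


def missing_files(checkpoint_dir: str, required: Sequence[str] = REQUIRED_FILES) -> List[str]:
    """Return the required file names that are absent from checkpoint_dir."""
    root = Path(checkpoint_dir)
    return [name for name in required if not (root / name).is_file()]


if __name__ == "__main__":
    absent = missing_files(".")
    if absent:
        print("Incomplete checkpoint, missing:", ", ".join(absent))
    else:
        print("All expected files are present.")
```

If the repository stores the adapter under different names, adjust `REQUIRED_FILES` accordingly.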
Experiment summary
The verifier checkpoint corresponds to run P10MULTI_20260604_055439, checkpoint 600. In the final four-way evaluation, P15 TTC beam reranking achieved the highest calibrated quality score of the systems compared: raw 32B, SFT 32B, P15 TTC, and P19 penalty TTC.
See:
- training_config.yaml
- verifier_qwen7b.yaml
- evaluation_standard.md
- p20_four_way_eval_decision.md
- model_manifest.json
Limitations
This is not a standalone model. It requires the Qwen2.5-7B-Instruct base model and the project loading code, which attaches both the LoRA adapter and the value head.