---
base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
library_name: peft
---
# Model Card for Reviewer-7B

## Model Details

### Model Description
Reviewer-7B is fine-tuned from DeepSeek-R1-Distill-Qwen-7B and optimized for selecting the best patch among multiple candidate patches generated by our DARS agent while solving software engineering problems.
### Model Sources
- Repository: DARS-7B Repository
- Paper: "DARS: Dynamic Action Re-Sampling to Enhance Coding Agent Performance by Adaptive Tree Traversal"
## How to Get Started with the Model
We use vLLM to deploy and run inference with the model. Please follow vLLM's LoRA tutorial to use our LoRA weights with vLLM.
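As a minimal sketch, offline inference with vLLM's Python API might look like the following; the adapter path and the prompt contents are placeholders, not values from this repository.

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Load the base model with LoRA support enabled.
llm = LLM(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-7B",
    enable_lora=True,
    max_model_len=14336,  # the adapter was trained with 14K-token sequences
)

sampling_params = SamplingParams(temperature=0.0, max_tokens=2048)

# Placeholder: a review prompt containing the candidate patches to critique.
prompt = "..."

# Attach the Reviewer-7B LoRA adapter; "path/to/reviewer-7b-lora" is a
# placeholder for the locally downloaded PEFT weights.
outputs = llm.generate(
    [prompt],
    sampling_params,
    lora_request=LoRARequest("reviewer-7b", 1, "path/to/reviewer-7b-lora"),
)
print(outputs[0].outputs[0].text)
```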
## Training Details

### Dataset
We use our code review dataset, where each instance contains several git patches along with a critique for each patch. The model learns to generate critiques for multiple patches and then select the best patch.
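Purely as an illustration of this structure, a single training instance might be shaped like the following; all field names here are hypothetical assumptions, not the actual dataset schema.

```python
# Hypothetical shape of one training instance; every field name below is an
# illustrative assumption, not the dataset's real schema.
instance = {
    "problem_statement": "...",           # the software engineering issue
    "patches": ["diff --git a/... ..."],  # candidate git patches from DARS
    "critiques": ["..."],                 # one critique per candidate patch
    "best_patch_index": 0,                # index of the patch to select
}
```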
### Training Procedure
| Hyperparameter | Value |
|---|---|
| Training regime | BF16 mixed precision |
| Optimizer | AdamW with cosine learning rate scheduler |
| LoRA configuration | rank=8, alpha=32, dropout=0.1 |
| Batch size | 48 |
| Learning rate | 1e-5 |
| Sequence length | 14K tokens |
| Fine-tuning epochs | 1 |
| Compute environment | DeepSpeed for memory-efficient distributed training |
| Compute infrastructure | 8x H100 |
We use the training script provided in the Qwen-2.5 codebase.
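For reference, the LoRA settings in the table above map onto a `peft` `LoraConfig` roughly as follows; the `target_modules` choice is our assumption, since the card does not specify which layers were adapted.

```python
from peft import LoraConfig

# LoRA settings matching the hyperparameter table above. target_modules is
# an assumption: the card does not list which projection layers were adapted.
lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```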
## Results
Using this model as the reviewer over DARS trajectories generated with Claude 3.5 Sonnet V2 achieves a 38.7% resolve rate on SWE-Bench Lite.