# Nexus-Coder-Alpha

A practical training guide and recipe for building state-of-the-art agentic coding assistants with open-source 8B-parameter models.

## What This Is
This repository consolidates research from Nemotron-Terminal, Klear-AgentForge, GLM-5, and Qwen3-Coder-Next into a single reproducible training pipeline:
- Supervised Fine-Tuning (SFT) on high-quality multi-turn agent trajectories
- Reinforcement Learning (RL) with execution-verified rewards
- Deployment in Pi agent, Cline, OpenCode, or any OpenAI-compatible coding tool
## Target Model

**Base:** nvidia/Nemotron-Terminal-8B

- 8.2B parameters, Qwen3 architecture, native `tool_calls` support
- Already pre-trained for terminal/code-agent interaction
- Fits on a single A100 or A10G-large with LoRA (a setup sketch follows below)
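Since the single-GPU claim above rests on LoRA, here is a minimal setup sketch. It assumes `peft` and `transformers` are installed; the rank, alpha, dropout, and target-module choices are illustrative defaults for a Qwen3-style attention stack, not the tuned hyperparameters from TRAINING_GUIDE.md.

```python
# Minimal LoRA setup sketch (illustrative hyperparameters, not the
# recipe's tuned values).
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "nvidia/Nemotron-Terminal-8B",  # Qwen3-architecture base from this repo
    torch_dtype="bfloat16",
    device_map="auto",
)

lora_config = LoraConfig(
    r=16,                    # illustrative rank; tune for your memory budget
    lora_alpha=32,
    lora_dropout=0.05,
    # Assumed Qwen3 attention projection names; verify against the checkpoint.
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of 8.2B params
```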
## Key Results (from cited papers)

| Benchmark | 8B Target | SOTA Reference |
|---|---|---|
| SWE-bench Verified | 20-40% | Klear-AgentForge: 39.4% |
| BFCL v3 | 65-75% | Klear-AgentForge: 71.5% |
| Terminal-Bench 2.0 | 15-25% | Nemotron-T-14B: 20.2% |
| Aider-Polyglot | 25-40% | Klear-AgentForge: 33.8% |
## Documents

- `TRAINING_GUIDE.md` – Full SFT → RL → Deployment recipe with code snippets, dataset links, hyperparameters, and SOTA tricks
- `train_sft.py` – Reference training script for Stage 1 (SFT)
- `train_grpo.py` – Reference training script for Stage 2 (GRPO RL)
## Quick Start

```bash
# Stage 1: SFT on curated agent trajectories
python train_sft.py \
  --model nvidia/Nemotron-Terminal-8B \
  --dataset mixed_agentic_dataset \
  --output_dir ./nexus-coder-sft

# Stage 2: GRPO with execution-verified rewards
python train_grpo.py \
  --model ./nexus-coder-sft \
  --dataset nvidia/Nemotron-RL-Agentic-SWE-Pivot-v1 \
  --output_dir ./nexus-coder-rl
```
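After Stage 2, the checkpoint can be served behind any OpenAI-compatible endpoint (for example `vllm serve ./nexus-coder-rl`) and driven from Pi agent, Cline, OpenCode, or similar tools. Below is a minimal smoke-test sketch assuming the `openai` Python client and a local vLLM server on port 8000; the `run_terminal_command` tool schema is hypothetical, not part of this repo.

```python
# Smoke test against an OpenAI-compatible endpoint, e.g. one started with
# `vllm serve ./nexus-coder-rl`. The base_url, port, and tool schema are
# illustrative assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="./nexus-coder-rl",  # vLLM exposes the model under its load path
    messages=[{"role": "user", "content": "List the files in the current directory."}],
    tools=[{
        "type": "function",
        "function": {
            "name": "run_terminal_command",  # hypothetical tool name
            "description": "Execute a shell command and return its output.",
            "parameters": {
                "type": "object",
                "properties": {"command": {"type": "string"}},
                "required": ["command"],
            },
        },
    }],
)

# A terminal-trained model should respond with a tool call rather than prose.
print(response.choices[0].message.tool_calls)
```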
## Core Datasets

| Dataset | Split | Purpose | Link |
|---|---|---|---|
| SWE-bench/SWE-smith-trajectories | `tool` (resolved=True) | SFT: real-repo bug fixing | HF |
| nvidia/Nemotron-Agentic-v1 | `interactive_agent` + `tool_calling` | SFT: multi-turn tool use | HF |
| xingyaoww/code-act | `codeact` + `general` | SFT: executable code actions | HF |
| nvidia/Nemotron-RL-Agentic-SWE-Pivot-v1 | `train` | RL: step-level pass-rate rewards | HF |
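A sketch of assembling the SFT mixture from the table, assuming the `datasets` library. The split and column names are taken from the table but should be verified against the actual dataset cards, and the 60/30/10 interleaving weights follow the data-mix trick listed below.

```python
# SFT mixture sketch; split/column names are assumptions from the table above.
from datasets import load_dataset, interleave_datasets

swe = load_dataset("SWE-bench/SWE-smith-trajectories", split="tool")
swe = swe.filter(lambda ex: ex.get("resolved") is True)  # keep resolved fixes only

agentic = load_dataset("nvidia/Nemotron-Agentic-v1", split="interactive_agent")
codeact = load_dataset("xingyaoww/code-act", split="codeact")

# Note: these probabilities are per-example; the guide mixes by token volume,
# so in practice you would weight by trajectory length instead.
mixed = interleave_datasets(
    [swe, agentic, codeact],
    probabilities=[0.6, 0.3, 0.1],
    seed=42,
)
```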
## Top SOTA Tricks

- Multi-format tool templates – Train on 4-5 schemas (OpenAI JSON, XML, Python-style, TypeScript, Qwen3-native) so the model generalizes to any agent framework.
- Token-in-Token-Out (TITO) – Use the raw token IDs from vLLM rollouts; never re-tokenize for the RL loss computation.
- Async RL – Decouple the vLLM inference engine from the training loop for 2-3x throughput.
- Format-aware regularization – Penalize malformed tool calls even when the underlying action is logically correct (see the reward sketch below).
- 60/30/10 data mix – SWE trajectories / general tool use / code-as-action, measured by token volume (the loading sketch above interleaves with these weights).
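To make the format-aware regularization concrete, here is a sketch of a reward term that checks whether a rollout parses as a well-formed OpenAI-style tool call, added on top of the execution-verified pass/fail reward. The function names and penalty value are illustrative assumptions, not the exact implementation in `train_grpo.py`.

```python
# Format-aware reward shaping sketch; names and penalty are illustrative.
import json

def format_reward(completion: str, penalty: float = -0.2) -> float:
    """Return 0.0 for a well-formed OpenAI-style tool call, else a penalty."""
    try:
        call = json.loads(completion)
        assert call["type"] == "function"
        assert isinstance(call["function"]["name"], str)
        json.loads(call["function"]["arguments"])  # arguments must be valid JSON
        return 0.0
    except (json.JSONDecodeError, AssertionError, KeyError, TypeError):
        # Malformed calls are penalized even if the intended action was right.
        return penalty

def total_reward(completion: str, execution_passed: bool) -> float:
    # Execution-verified reward (pass/fail) plus the format term.
    return (1.0 if execution_passed else 0.0) + format_reward(completion)
```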
## Benchmarks

- SWE-bench Verified – Primary real-world software-engineering benchmark
- Terminal-Bench 2.0 – Terminal/agent task completion
- BFCL v3 – Multi-turn function calling
- Aider-Polyglot – Multi-language code editing
- tau-bench – Long-horizon, multi-turn tool use
## Citation

If you use this recipe, please cite the underlying research:

```bibtex
@article{nemotron-terminal-2026,
  title   = {Nemotron-Terminal: Scalable Training for Terminal-Capable Language Models},
  author  = {NVIDIA},
  journal = {arXiv:2602.21193},
  year    = {2026}
}

@article{klear-agentforge-2025,
  title   = {Klear-AgentForge: Forging Agentic Intelligence through Posttraining Scaling},
  author  = {Klear-AI},
  journal = {arXiv:2511.05951},
  year    = {2025}
}

@article{glm5-2026,
  title   = {GLM-5: from Vibe Coding to Agentic Engineering},
  author  = {Zhipu AI},
  journal = {arXiv:2602.15763},
  year    = {2026}
}
```
## License

The training guide and scripts are provided as-is for research and educational purposes. The datasets and base model remain subject to their own licenses from their respective owners.