# Nexus-Coder-Alpha

A practical training guide and recipe for building state-of-the-art agentic coding assistants with open-source 8B-parameter models.

## What This Is
This repository consolidates research from Nemotron-Terminal, Klear-AgentForge, GLM-5, and Qwen3-Coder-Next into a single reproducible training pipeline:
- Supervised Fine-Tuning (SFT) on high-quality multi-turn agent trajectories
- Reinforcement Learning (RL) with execution-verified rewards
- Deployment in Pi agent, Cline, OpenCode, or any OpenAI-compatible coding tool
## Target Model

**Base:** nvidia/Nemotron-Terminal-8B

- 8.2B parameters, Qwen3 architecture, native `tool_calls` support
- Already pre-trained for terminal/code-agent interaction
- Fits on a single A100 or A10G-large with LoRA (a setup sketch follows below)
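Since the single-GPU claim above rests on LoRA, here is a minimal setup sketch. It assumes `peft` and `transformers` are installed; the rank, alpha, dropout, and target-module choices are illustrative defaults for a Qwen3-style attention stack, not the tuned hyperparameters from TRAINING_GUIDE.md.

```python
# Minimal LoRA setup sketch (illustrative hyperparameters, not the
# recipe's tuned values).
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "nvidia/Nemotron-Terminal-8B",  # Qwen3-architecture base from this repo
    torch_dtype="bfloat16",
    device_map="auto",
)

lora_config = LoraConfig(
    r=16,                    # illustrative rank; tune for your memory budget
    lora_alpha=32,
    lora_dropout=0.05,
    # Assumed Qwen3 attention projection names; verify against the checkpoint.
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of 8.2B params
```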
## Key Results (from cited papers)

| Benchmark | 8B Target | SOTA Reference |
|---|---|---|
| SWE-bench Verified | 20-40% | Klear-AgentForge: 39.4% |
| BFCL v3 | 65-75% | Klear-AgentForge: 71.5% |
| Terminal-Bench 2.0 | 15-25% | Nemotron-T-14B: 20.2% |
| Aider-Polyglot | 25-40% | Klear-AgentForge: 33.8% |
## Documents

- `TRAINING_GUIDE.md` – Full SFT → RL → Deployment recipe with code snippets, dataset links, hyperparameters, and SOTA tricks
- `train_sft.py` – Reference training script for Stage 1 (SFT)
- `train_grpo.py` – Reference training script for Stage 2 (GRPO RL)
## Quick Start

```bash
# Stage 1: SFT on curated agent trajectories
python train_sft.py \
  --model nvidia/Nemotron-Terminal-8B \
  --dataset mixed_agentic_dataset \
  --output_dir ./nexus-coder-sft

# Stage 2: GRPO with execution-verified rewards
python train_grpo.py \
  --model ./nexus-coder-sft \
  --dataset nvidia/Nemotron-RL-Agentic-SWE-Pivot-v1 \
  --output_dir ./nexus-coder-rl
```
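After Stage 2, the checkpoint can be served behind any OpenAI-compatible endpoint (for example `vllm serve ./nexus-coder-rl`) and driven from Pi agent, Cline, OpenCode, or similar tools. Below is a minimal smoke-test sketch assuming the `openai` Python client and a local vLLM server on port 8000; the `run_terminal_command` tool schema is hypothetical, not part of this repo.

```python
# Smoke test against an OpenAI-compatible endpoint, e.g. one started with
# `vllm serve ./nexus-coder-rl`. The base_url, port, and tool schema are
# illustrative assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="./nexus-coder-rl",  # vLLM exposes the model under its load path
    messages=[{"role": "user", "content": "List the files in the current directory."}],
    tools=[{
        "type": "function",
        "function": {
            "name": "run_terminal_command",  # hypothetical tool name
            "description": "Execute a shell command and return its output.",
            "parameters": {
                "type": "object",
                "properties": {"command": {"type": "string"}},
                "required": ["command"],
            },
        },
    }],
)

# A terminal-trained model should respond with a tool call rather than prose.
print(response.choices[0].message.tool_calls)
```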
## Core Datasets

| Dataset | Split | Purpose | Link |
|---|---|---|---|
| SWE-bench/SWE-smith-trajectories | `tool` (resolved=True) | SFT: real-repo bug fixing | HF |
| nvidia/Nemotron-Agentic-v1 | `interactive_agent` + `tool_calling` | SFT: multi-turn tool use | HF |
| xingyaoww/code-act | `codeact` + `general` | SFT: executable code actions | HF |
| nvidia/Nemotron-RL-Agentic-SWE-Pivot-v1 | `train` | RL: step-level pass-rate rewards | HF |
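A sketch of assembling the SFT mixture from the table, assuming the `datasets` library. The split and column names are taken from the table but should be verified against the actual dataset cards, and the 60/30/10 interleaving weights follow the data-mix trick listed below.

```python
# SFT mixture sketch; split/column names are assumptions from the table above.
from datasets import load_dataset, interleave_datasets

swe = load_dataset("SWE-bench/SWE-smith-trajectories", split="tool")
swe = swe.filter(lambda ex: ex.get("resolved") is True)  # keep resolved fixes only

agentic = load_dataset("nvidia/Nemotron-Agentic-v1", split="interactive_agent")
codeact = load_dataset("xingyaoww/code-act", split="codeact")

# Note: these probabilities are per-example; the guide mixes by token volume,
# so in practice you would weight by trajectory length instead.
mixed = interleave_datasets(
    [swe, agentic, codeact],
    probabilities=[0.6, 0.3, 0.1],
    seed=42,
)
```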
## Top SOTA Tricks

- Multi-format tool templates – Train on 4-5 schemas (OpenAI JSON, XML, Python-style, TypeScript, Qwen3-native) so the model generalizes to any agent framework.
- Token-in-Token-Out (TITO) – Use the raw token IDs from vLLM rollouts; never re-tokenize for the RL loss computation.
- Async RL – Decouple the vLLM inference engine from the training loop for 2-3x throughput.
- Format-aware regularization – Penalize malformed tool calls even when the underlying action is logically correct (see the reward sketch below).
- 60/30/10 data mix – SWE trajectories / general tool use / code-as-action, measured by token volume (the loading sketch above interleaves with these weights).
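To make the format-aware regularization concrete, here is a sketch of a reward term that checks whether a rollout parses as a well-formed OpenAI-style tool call, added on top of the execution-verified pass/fail reward. The function names and penalty value are illustrative assumptions, not the exact implementation in `train_grpo.py`.

```python
# Format-aware reward shaping sketch; names and penalty are illustrative.
import json

def format_reward(completion: str, penalty: float = -0.2) -> float:
    """Return 0.0 for a well-formed OpenAI-style tool call, else a penalty."""
    try:
        call = json.loads(completion)
        assert call["type"] == "function"
        assert isinstance(call["function"]["name"], str)
        json.loads(call["function"]["arguments"])  # arguments must be valid JSON
        return 0.0
    except (json.JSONDecodeError, AssertionError, KeyError, TypeError):
        # Malformed calls are penalized even if the intended action was right.
        return penalty

def total_reward(completion: str, execution_passed: bool) -> float:
    # Execution-verified reward (pass/fail) plus the format term.
    return (1.0 if execution_passed else 0.0) + format_reward(completion)
```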
## Benchmarks

- SWE-bench Verified – Primary real-world software-engineering benchmark
- Terminal-Bench 2.0 – Terminal/agent task completion
- BFCL v3 – Multi-turn function calling
- Aider-Polyglot – Multi-language code editing
- tau-bench – Long-horizon, multi-turn tool use
## Citation

If you use this recipe, please cite the underlying research:

```bibtex
@article{nemotron-terminal-2026,
  title   = {Nemotron-Terminal: Scalable Training for Terminal-Capable Language Models},
  author  = {NVIDIA},
  journal = {arXiv:2602.21193},
  year    = {2026}
}

@article{klear-agentforge-2025,
  title   = {Klear-AgentForge: Forging Agentic Intelligence through Posttraining Scaling},
  author  = {Klear-AI},
  journal = {arXiv:2511.05951},
  year    = {2025}
}

@article{glm5-2026,
  title   = {GLM-5: from Vibe Coding to Agentic Engineering},
  author  = {Zhipu AI},
  journal = {arXiv:2602.15763},
  year    = {2026}
}
```
## License

The training guide and scripts are provided as-is for research and educational purposes. The datasets and base model remain subject to their own licenses from their respective owners.