# Nexus-Coder-Alpha

A practical training guide and recipe for building state-of-the-art **agentic coding assistants** with open-source 8B-parameter models.

## What This Is

This repository consolidates research from **Nemotron-Terminal**, **Klear-AgentForge**, **GLM-5**, and **Qwen3-Coder-Next** into a single reproducible training pipeline:

1. **Supervised Fine-Tuning (SFT)** on high-quality multi-turn agent trajectories
2. **Reinforcement Learning (RL)** with execution-verified rewards
3. **Deployment** in Pi agent, Cline, OpenCode, or any OpenAI-compatible coding tool (see the client sketch below)
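
For step 3, a checkpoint served behind any OpenAI-compatible endpoint (e.g. `vllm serve ./nexus-coder-rl`) works with the standard client. A minimal sketch, assuming a local server on port 8000 and a hypothetical `run_shell` tool definition:

```python
# Minimal sketch: driving the served model through the OpenAI-compatible API.
# The base_url/port and the `run_shell` tool schema are illustrative assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="./nexus-coder-rl",
    messages=[{"role": "user", "content": "Fix the failing test in utils.py"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "run_shell",
            "description": "Execute a shell command in the repository",
            "parameters": {
                "type": "object",
                "properties": {"cmd": {"type": "string"}},
                "required": ["cmd"],
            },
        },
    }],
)
print(response.choices[0].message.tool_calls)
```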

## Target Model

**Base:** [`nvidia/Nemotron-Terminal-8B`](https://hf.co/nvidia/Nemotron-Terminal-8B)

- 8.2B parameters, Qwen3 architecture, native `tool_calls` support
- Already pre-trained for terminal/code-agent interaction
- Fits on a single A100, or an A10G (large) with LoRA (see the loading sketch below)
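
A minimal loading sketch; the LoRA rank, alpha, and target modules below are illustrative assumptions rather than values from the cited papers:

```python
# Minimal sketch: base model + LoRA adapter sized for a single A100.
# The LoRA hyperparameters here are illustrative assumptions.
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/Nemotron-Terminal-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # bf16 keeps the 8.2B base well under 80 GB
    device_map="auto",
)

model = get_peft_model(model, LoraConfig(
    r=16,                        # assumed rank; tune for your memory budget
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
))
model.print_trainable_parameters()  # only the adapter weights are trainable
```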

## Key Results (from cited papers)

| Benchmark | 8B target (this recipe) | SOTA reference |
|---|---|---|
| SWE-bench Verified | 20-40% | Klear-AgentForge: **39.4%** |
| BFCL v3 | 65-75% | Klear-AgentForge: **71.5%** |
| Terminal-Bench 2.0 | 15-25% | Nemotron-T-14B: **20.2%** |
| Aider-Polyglot | 25-40% | Klear-AgentForge: **33.8%** |

## Documents

- **[TRAINING_GUIDE.md](TRAINING_GUIDE.md)**: full SFT → RL → deployment recipe with code snippets, dataset links, hyperparameters, and SOTA tricks
- **[train_sft.py](train_sft.py)**: reference training script for Stage 1 (SFT)
- **[train_grpo.py](train_grpo.py)**: reference training script for Stage 2 (GRPO RL)

## Quick Start

```bash
# Stage 1: SFT on curated agent trajectories
python train_sft.py \
  --model nvidia/Nemotron-Terminal-8B \
  --dataset mixed_agentic_dataset \
  --output_dir ./nexus-coder-sft

# Stage 2: GRPO with execution-verified rewards
python train_grpo.py \
  --model ./nexus-coder-sft \
  --dataset nvidia/Nemotron-RL-Agentic-SWE-Pivot-v1 \
  --output_dir ./nexus-coder-rl
```

## Core Datasets

| Dataset | Split | Purpose | Link |
|---|---|---|---|
| SWE-bench/SWE-smith-trajectories | `tool` (resolved=True) | SFT: real-repo bug fixing | [HF](https://hf.co/datasets/SWE-bench/SWE-smith-trajectories) |
| nvidia/Nemotron-Agentic-v1 | `interactive_agent` + `tool_calling` | SFT: multi-turn tool use | [HF](https://hf.co/datasets/nvidia/Nemotron-Agentic-v1) |
| xingyaoww/code-act | `codeact` + `general` | SFT: executable code actions | [HF](https://hf.co/datasets/xingyaoww/code-act) |
| nvidia/Nemotron-RL-Agentic-SWE-Pivot-v1 | `train` | RL: step-level pass-rate rewards | [HF](https://hf.co/datasets/nvidia/Nemotron-RL-Agentic-SWE-Pivot-v1) |
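
A minimal loading sketch for the SFT sources above. Split names come from the table; the `resolved` field is assumed from the "(resolved=True)" note and may differ in the actual schema:

```python
# Minimal sketch: pulling the SFT sources from the table above.
# Split names are taken from the table; the `resolved` field is an assumption
# based on the "(resolved=True)" note and may be named differently in practice.
from datasets import load_dataset

swe_smith = load_dataset("SWE-bench/SWE-smith-trajectories", split="tool")
swe_smith = swe_smith.filter(lambda ex: ex.get("resolved") is True)

agentic = load_dataset("nvidia/Nemotron-Agentic-v1", split="interactive_agent")
code_act = load_dataset("xingyaoww/code-act", split="codeact")

print(f"{len(swe_smith)=} {len(agentic)=} {len(code_act)=}")
```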

## Top SOTA Tricks

1. **Multi-format tool templates**: train on 4-5 schemas (OpenAI JSON, XML, Python-style, TypeScript, Qwen3-native) so the model generalizes to any agent framework.
2. **Token-in-Token-Out (TITO)**: use the raw token IDs from vLLM rollouts; never re-tokenize for RL loss computation (see the first sketch below).
3. **Async RL**: decouple the vLLM inference engine from the training loop for 2-3x throughput.
4. **Format-aware regularization**: penalize malformed tool calls even when the action is logically correct (see the second sketch below).
5. **60/30/10 data mix**: SWE trajectories / general tool-use / code-as-action, measured by token volume (see the third sketch below).
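
The first sketch illustrates trick 2 (TITO) against the vLLM offline API; the checkpoint path is the SFT output from the Quick Start:

```python
# Minimal sketch of trick 2 (TITO): keep the rollout's raw token IDs for the
# RL loss instead of decoding to text and re-tokenizing, which may not
# round-trip to the same sequence.
from vllm import LLM, SamplingParams

llm = LLM(model="./nexus-coder-sft")
outputs = llm.generate(
    ["Fix the failing test in utils.py"],
    SamplingParams(temperature=0.8, max_tokens=512),
)

rollout = outputs[0].outputs[0]
token_ids = list(rollout.token_ids)  # feed these directly into the loss
# Anti-pattern: tokenizer(rollout.text).input_ids -- may differ from token_ids
```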
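
The second sketch illustrates trick 4 as a reward shim; the JSON tool-call schema and the penalty value are assumptions for illustration:

```python
# Minimal sketch of trick 4 (format-aware regularization): dock reward for
# malformed tool calls before any execution-based reward is added.
import json

def format_penalty(tool_call_text: str, penalty: float = -0.5) -> float:
    """Negative reward for tool calls that fail to parse or miss required
    keys, even if the intended action would have been correct."""
    try:
        call = json.loads(tool_call_text)
    except json.JSONDecodeError:
        return penalty
    if not isinstance(call, dict) or not {"name", "arguments"} <= call.keys():
        return penalty
    return 0.0

# Combined reward: r_total = r_execution + format_penalty(tool_call_text)
```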
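
The third sketch illustrates trick 5; the `num_tokens` field is hypothetical and would be precomputed when tokenizing each trajectory:

```python
# Minimal sketch of trick 5: fill each source's share of a token budget,
# so the 60/30/10 split is by token volume rather than example count.
# The `num_tokens` field is hypothetical (precomputed per trajectory).
import random

def mix_by_tokens(sources, fractions, total_tokens, seed=0):
    rng = random.Random(seed)
    mixed = []
    for name, examples in sources.items():
        budget = fractions[name] * total_tokens
        rng.shuffle(examples)
        spent = 0
        for ex in examples:
            if spent >= budget:
                break
            mixed.append(ex)
            spent += ex["num_tokens"]
    rng.shuffle(mixed)  # interleave sources in the final stream
    return mixed

# Usage with the 60/30/10 split from above:
# mix = mix_by_tokens(
#     {"swe": swe_trajs, "tool_use": tool_use, "code_act": code_act_trajs},
#     {"swe": 0.60, "tool_use": 0.30, "code_act": 0.10},
#     total_tokens=2_000_000,
# )
```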

## Benchmarks

- **SWE-bench Verified**: primary real-world software-engineering benchmark
- **Terminal-Bench 2.0**: terminal/agent task completion
- **BFCL v3**: multi-turn function calling
- **Aider-Polyglot**: multi-language code editing
- **tau-bench**: long-horizon multi-turn tool use

## Citation

If you use this recipe, please cite the underlying research:

```bibtex
@article{nemotron-terminal-2026,
  title={Nemotron-Terminal: Scalable Training for Terminal-Capable Language Models},
  author={NVIDIA},
  journal={arXiv:2602.21193},
  year={2026}
}
@article{klear-agentforge-2025,
  title={Klear-AgentForge: Forging Agentic Intelligence through Posttraining Scaling},
  author={Klear-AI},
  journal={arXiv:2511.05951},
  year={2025}
}
@article{glm5-2026,
  title={GLM-5: from Vibe Coding to Agentic Engineering},
  author={Zhipu AI},
  journal={arXiv:2602.15763},
  year={2026}
}
```

## License

The training guide and scripts are provided as-is for research and educational purposes. The datasets and base model remain subject to their respective owners' licenses.