# Nexus-Coder-Alpha

A practical training guide and recipe for building state-of-the-art **agentic coding assistants** with open-source 8B-parameter models.

## What This Is

This repository consolidates research from **Nemotron-Terminal**, **Klear-AgentForge**, **GLM-5**, and **Qwen3-Coder-Next** into a single reproducible training pipeline:

1. **Supervised Fine-Tuning (SFT)** on high-quality multi-turn agent trajectories
2. **Reinforcement Learning (RL)** with execution-verified rewards
3. **Deployment** in Pi agent, Cline, OpenCode, or any OpenAI-compatible coding tool (see the client sketch below)
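
For step 3, a checkpoint served behind any OpenAI-compatible endpoint (e.g. `vllm serve ./nexus-coder-rl`) works with the standard client. A minimal sketch, assuming a local server on port 8000 and a hypothetical `run_shell` tool definition:

```python
# Minimal sketch: driving the served model through the OpenAI-compatible API.
# The base_url/port and the `run_shell` tool schema are illustrative assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="./nexus-coder-rl",
    messages=[{"role": "user", "content": "Fix the failing test in utils.py"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "run_shell",
            "description": "Execute a shell command in the repository",
            "parameters": {
                "type": "object",
                "properties": {"cmd": {"type": "string"}},
                "required": ["cmd"],
            },
        },
    }],
)
print(response.choices[0].message.tool_calls)
```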

## Target Model

**Base:** [`nvidia/Nemotron-Terminal-8B`](https://hf.co/nvidia/Nemotron-Terminal-8B)

- 8.2B parameters, Qwen3 architecture, native `tool_calls` support
- Already pre-trained for terminal/code-agent interaction
- Fits on a single A100, or an A10G (large) with LoRA (see the loading sketch below)
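
A minimal loading sketch; the LoRA rank, alpha, and target modules below are illustrative assumptions rather than values from the cited papers:

```python
# Minimal sketch: base model + LoRA adapter sized for a single A100.
# The LoRA hyperparameters here are illustrative assumptions.
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/Nemotron-Terminal-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # bf16 keeps the 8.2B base well under 80 GB
    device_map="auto",
)

model = get_peft_model(model, LoraConfig(
    r=16,                        # assumed rank; tune for your memory budget
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
))
model.print_trainable_parameters()  # only the adapter weights are trainable
```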

## Key Results (from cited papers)

| Benchmark | 8B target (this recipe) | SOTA reference |
|---|---|---|
| SWE-bench Verified | 20-40% | Klear-AgentForge: **39.4%** |
| BFCL v3 | 65-75% | Klear-AgentForge: **71.5%** |
| Terminal-Bench 2.0 | 15-25% | Nemotron-T-14B: **20.2%** |
| Aider-Polyglot | 25-40% | Klear-AgentForge: **33.8%** |

## Documents

- **[TRAINING_GUIDE.md](TRAINING_GUIDE.md)**: full SFT → RL → deployment recipe with code snippets, dataset links, hyperparameters, and SOTA tricks
- **[train_sft.py](train_sft.py)**: reference training script for Stage 1 (SFT)
- **[train_grpo.py](train_grpo.py)**: reference training script for Stage 2 (GRPO RL)

## Quick Start

```bash
# Stage 1: SFT on curated agent trajectories
python train_sft.py \
  --model nvidia/Nemotron-Terminal-8B \
  --dataset mixed_agentic_dataset \
  --output_dir ./nexus-coder-sft

# Stage 2: GRPO with execution-verified rewards
python train_grpo.py \
  --model ./nexus-coder-sft \
  --dataset nvidia/Nemotron-RL-Agentic-SWE-Pivot-v1 \
  --output_dir ./nexus-coder-rl
```

## Core Datasets

| Dataset | Split | Purpose | Link |
|---|---|---|---|
| SWE-bench/SWE-smith-trajectories | `tool` (resolved=True) | SFT: real-repo bug fixing | [HF](https://hf.co/datasets/SWE-bench/SWE-smith-trajectories) |
| nvidia/Nemotron-Agentic-v1 | `interactive_agent` + `tool_calling` | SFT: multi-turn tool use | [HF](https://hf.co/datasets/nvidia/Nemotron-Agentic-v1) |
| xingyaoww/code-act | `codeact` + `general` | SFT: executable code actions | [HF](https://hf.co/datasets/xingyaoww/code-act) |
| nvidia/Nemotron-RL-Agentic-SWE-Pivot-v1 | `train` | RL: step-level pass-rate rewards | [HF](https://hf.co/datasets/nvidia/Nemotron-RL-Agentic-SWE-Pivot-v1) |
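
A minimal loading sketch for the SFT sources above. Split names come from the table; the `resolved` field is assumed from the "(resolved=True)" note and may differ in the actual schema:

```python
# Minimal sketch: pulling the SFT sources from the table above.
# Split names are taken from the table; the `resolved` field is an assumption
# based on the "(resolved=True)" note and may be named differently in practice.
from datasets import load_dataset

swe_smith = load_dataset("SWE-bench/SWE-smith-trajectories", split="tool")
swe_smith = swe_smith.filter(lambda ex: ex.get("resolved") is True)

agentic = load_dataset("nvidia/Nemotron-Agentic-v1", split="interactive_agent")
code_act = load_dataset("xingyaoww/code-act", split="codeact")

print(f"{len(swe_smith)=} {len(agentic)=} {len(code_act)=}")
```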

## Top SOTA Tricks

1. **Multi-format tool templates**: train on 4-5 schemas (OpenAI JSON, XML, Python-style, TypeScript, Qwen3-native) so the model generalizes to any agent framework.
2. **Token-in-Token-Out (TITO)**: use the raw token IDs from vLLM rollouts; never re-tokenize for RL loss computation (see the first sketch below).
3. **Async RL**: decouple the vLLM inference engine from the training loop for 2-3x throughput.
4. **Format-aware regularization**: penalize malformed tool calls even when the action is logically correct (see the second sketch below).
5. **60/30/10 data mix**: SWE trajectories / general tool-use / code-as-action, measured by token volume (see the third sketch below).
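
The first sketch illustrates trick 2 (TITO) against the vLLM offline API; the checkpoint path is the SFT output from the Quick Start:

```python
# Minimal sketch of trick 2 (TITO): keep the rollout's raw token IDs for the
# RL loss instead of decoding to text and re-tokenizing, which may not
# round-trip to the same sequence.
from vllm import LLM, SamplingParams

llm = LLM(model="./nexus-coder-sft")
outputs = llm.generate(
    ["Fix the failing test in utils.py"],
    SamplingParams(temperature=0.8, max_tokens=512),
)

rollout = outputs[0].outputs[0]
token_ids = list(rollout.token_ids)  # feed these directly into the loss
# Anti-pattern: tokenizer(rollout.text).input_ids -- may differ from token_ids
```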
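
The second sketch illustrates trick 4 as a reward shim; the JSON tool-call schema and the penalty value are assumptions for illustration:

```python
# Minimal sketch of trick 4 (format-aware regularization): dock reward for
# malformed tool calls before any execution-based reward is added.
import json

def format_penalty(tool_call_text: str, penalty: float = -0.5) -> float:
    """Negative reward for tool calls that fail to parse or miss required
    keys, even if the intended action would have been correct."""
    try:
        call = json.loads(tool_call_text)
    except json.JSONDecodeError:
        return penalty
    if not isinstance(call, dict) or not {"name", "arguments"} <= call.keys():
        return penalty
    return 0.0

# Combined reward: r_total = r_execution + format_penalty(tool_call_text)
```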
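
The third sketch illustrates trick 5; the `num_tokens` field is hypothetical and would be precomputed when tokenizing each trajectory:

```python
# Minimal sketch of trick 5: fill each source's share of a token budget,
# so the 60/30/10 split is by token volume rather than example count.
# The `num_tokens` field is hypothetical (precomputed per trajectory).
import random

def mix_by_tokens(sources, fractions, total_tokens, seed=0):
    rng = random.Random(seed)
    mixed = []
    for name, examples in sources.items():
        budget = fractions[name] * total_tokens
        rng.shuffle(examples)
        spent = 0
        for ex in examples:
            if spent >= budget:
                break
            mixed.append(ex)
            spent += ex["num_tokens"]
    rng.shuffle(mixed)  # interleave sources in the final stream
    return mixed

# Usage with the 60/30/10 split from above:
# mix = mix_by_tokens(
#     {"swe": swe_trajs, "tool_use": tool_use, "code_act": code_act_trajs},
#     {"swe": 0.60, "tool_use": 0.30, "code_act": 0.10},
#     total_tokens=2_000_000,
# )
```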

## Benchmarks

- **SWE-bench Verified**: primary real-world software-engineering benchmark
- **Terminal-Bench 2.0**: terminal/agent task completion
- **BFCL v3**: multi-turn function calling
- **Aider-Polyglot**: multi-language code editing
- **tau-bench**: long-horizon multi-turn tool use

## Citation

If you use this recipe, please cite the underlying research:

```bibtex
@article{nemotron-terminal-2026,
  title={Nemotron-Terminal: Scalable Training for Terminal-Capable Language Models},
  author={NVIDIA},
  journal={arXiv:2602.21193},
  year={2026}
}
@article{klear-agentforge-2025,
  title={Klear-AgentForge: Forging Agentic Intelligence through Posttraining Scaling},
  author={Klear-AI},
  journal={arXiv:2511.05951},
  year={2025}
}
@article{glm5-2026,
  title={GLM-5: from Vibe Coding to Agentic Engineering},
  author={Zhipu AI},
  journal={arXiv:2602.15763},
  year={2026}
}
```

## License

The training guide and scripts are provided as-is for research and educational purposes. The datasets and base model remain subject to their respective owners' licenses.