gpt-oss-180b-goomba

gpt-oss-180b-goomba is an agentic coding model derived from GPT-OSS 120B.

Goomba expands the GPT-OSS 120B base with additional specialist MoE capacity and is intended for agentic coding, repository work, SWE-style tasks, and tool-using automation.

Goomba is the first release in this line to feature a new post-training data formulation. It is completely different from the previous releases and is much stronger at tool calling, raw SWE-style coding, and math-assisted reasoning.

This model was trained on just two GPUs.

Overview

  • Base model: openai/gpt-oss-120b
  • Approx total parameters: 181B
  • Approx active parameters: 16.5B per token at top-k=16
  • Total expert rows: 200
  • Added specialist experts: 72
  • Format: MXFP4
  • Out-of-box active experts: top-k=16
  • Intended use: agentic coding, SWE-style workflows, repository exploration, tool-using automation, raw SWE coding, math-assisted coding
  • Status: research preview

Recommended vLLM

This model was primarily tested with vLLM using the GPT-OSS reasoning parser and OpenAI tool-call parser.

vllm serve /path/to/model \
  --served-model-name vllm/doobee \
  --tensor-parallel-size 2 \
  --max-model-len 60000 \
  --gpu-memory-utilization 0.88 \
  --enforce-eager \
  --trust-remote-code \
  --reasoning-parser openai_gptoss \
  --tool-call-parser openai \
  --enable-auto-tool-choice

Recommended parameters:

  • num_experts_per_tok=16 is already set in config.json
  • tensor-parallel-size=2
  • max-model-len=60000
  • gpu-memory-utilization=0.88
  • reasoning-parser=openai_gptoss
  • tool-call-parser=openai
  • enable-auto-tool-choice

The config ships with both num_experts_per_tok=16 and experts_per_token=16, so runtimes that respect the model config should use top-k 16 automatically. If your runtime overrides or ignores those fields, pass this explicitly:

--hf-overrides '{"num_experts_per_tok": 16}'

Tool Calling

Goomba was primarily tested as an agentic coding model. Basic OpenAI-compatible tool calling is expected to work best with the vLLM GPT-OSS reasoning parser and OpenAI tool-call parser enabled.

Suggested temperatures:

  • 0.3 for steady coding-agent work
  • 0.5 for broader agentic exploration

Recommended range: 0.3-0.5.

For repository exploration tasks, use an agent prompt that asks the model to inspect subdirectories, identify entry points, and summarize the project structure rather than stopping after a single directory listing.

License

Replace the placeholder license: other metadata with the actual license you want to publish under after confirming compatibility with the base model and your added weights.

Downloads last month
8
Safetensors
Model size
187B params
Tensor type
BF16
·
U8
·
F32
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for LLMWildling/gpt-oss-180b-goomba

Quantized
(107)
this model