gpt-oss-160b-kiwi

gpt-oss-160b-kiwi is an agentic coder version of GPT-OSS 120B.

After a bunch of iterations in a recursive coding-agent harness, this is the end result of the current 160B branch. It expands the 120B base with 48 added specialist experts and is intended to run at 12 active experts per token.

This model was trained on a 2-GPU setup.

This is by far one of the best checkpoints from this project so far. The next 180B line should be better, but Kiwi is the current strong agentic-coder release.

Overview

  • Base model: openai/gpt-oss-120b
  • Total expert rows: 176
  • Added specialist experts: 48
  • Format: MXFP4
  • Recommended active experts: top-k=12
  • Intended use: coding, agentic coding, SWE-style workflows, tool-using automation
  • Status: research preview

Recommended vLLM

This model was tested with vLLM using the GPT-OSS reasoning and OpenAI tool-call parsers.

vllm serve /path/to/model \
  --served-model-name vllm/doobee \
  --tensor-parallel-size 2 \
  --max-model-len 60000 \
  --gpu-memory-utilization 0.88 \
  --enforce-eager \
  --trust-remote-code \
  --reasoning-parser openai_gptoss \
  --tool-call-parser openai \
  --enable-auto-tool-choice \
  --hf-overrides '{"num_experts_per_tok": 12}'

Recommended parameters:

  • num_experts_per_tok=12
  • tensor-parallel-size=2
  • max-model-len=60000
  • gpu-memory-utilization=0.88
  • reasoning-parser=openai_gptoss
  • tool-call-parser=openai
  • enable-auto-tool-choice

The staged config is already set to num_experts_per_tok=12 and experts_per_token=12. If your runtime ignores those fields, pass the --hf-overrides value explicitly.

Tool Calling

Kiwi was primarily built and tested as an agentic coding model.

Recommended temperatures:

  • 0.0 for deterministic tool use
  • 0.3 for steady coding-agent work
  • 0.6 for more flexible agentic exploration

The recommended serving path is OpenAI-compatible Chat Completions with vLLM's GPT-OSS reasoning parser and OpenAI tool-call parser enabled.

Next

Kiwi is the current strong 160B agentic-coder checkpoint. The upcoming 180B line is expected to push this further.

License

Replace the placeholder license: other metadata with the actual license you want to publish under after confirming compatibility with the base model and your added weights.

Downloads last month
35
Safetensors
Model size
165B params
Tensor type
BF16
·
U8
·
F32
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for LLMWildling/gpt-oss-160b-kiwi

Quantized
(107)
this model