Instructions for using LLMWildling/gpt-oss-160b-kiwi with libraries, notebooks, and local apps.
- Libraries
- Transformers
How to use LLMWildling/gpt-oss-160b-kiwi with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="LLMWildling/gpt-oss-160b-kiwi", device_map="auto")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("LLMWildling/gpt-oss-160b-kiwi")
# device_map="auto" spreads the weights across the available GPUs (requires accelerate);
# torch_dtype="auto" keeps the dtype recorded in the checkpoint config.
model = AutoModelForCausalLM.from_pretrained(
    "LLMWildling/gpt-oss-160b-kiwi",
    device_map="auto",
    torch_dtype="auto",
)
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use LLMWildling/gpt-oss-160b-kiwi with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "LLMWildling/gpt-oss-160b-kiwi"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "LLMWildling/gpt-oss-160b-kiwi",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
Use Docker
docker model run hf.co/LLMWildling/gpt-oss-160b-kiwi
- SGLang
How to use LLMWildling/gpt-oss-160b-kiwi with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "LLMWildling/gpt-oss-160b-kiwi" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "LLMWildling/gpt-oss-160b-kiwi",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
Use Docker images
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "LLMWildling/gpt-oss-160b-kiwi" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "LLMWildling/gpt-oss-160b-kiwi",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
- Docker Model Runner
How to use LLMWildling/gpt-oss-160b-kiwi with Docker Model Runner:
docker model run hf.co/LLMWildling/gpt-oss-160b-kiwi
gpt-oss-160b-kiwi
gpt-oss-160b-kiwi is an agentic-coding variant of GPT-OSS 120B.
After many iterations in a recursive coding-agent harness, this checkpoint is the end result of the current 160B branch. It extends the base model's 128 experts per MoE layer with 48 added specialist experts (176 expert rows in total) and is intended to run with 12 active experts per token, up from the base model's 4.
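To make the routing change concrete, here is a minimal sketch of generic top-k expert selection for a single token over 176 expert rows with k=12. The router logits are made-up numbers and the function name `route_token` is hypothetical; normalizing with a softmax over only the selected logits is an assumption about the gating, not a description of this checkpoint's actual kernel.

```python
import math
import random

NUM_EXPERTS = 128 + 48  # base experts plus added specialists = 176 expert rows
TOP_K = 12              # recommended active experts per token


def route_token(logits: list, k: int = TOP_K) -> list:
    """Pick the k highest-scoring experts and softmax-normalize their logits.

    Generic top-k MoE gating sketch (assumption), not this model's kernel.
    Returns (expert_index, weight) pairs, highest logit first.
    """
    if len(logits) < k:
        raise ValueError("need at least k router logits")
    # Indices of the k largest logits, in descending order.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits[i] for i in top)
    exps = [math.exp(logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Made-up router logits for one token.
    logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
    chosen = route_token(logits)
    print(f"{len(chosen)} of {NUM_EXPERTS} experts active "
          f"({len(chosen) / NUM_EXPERTS:.1%} of expert rows)")
    for idx, weight in chosen:
        print(f"expert {idx:3d}: weight {weight:.3f}")
```

For scale: the base model's top-4 over 128 experts activates about 3.1% of expert rows per token, while top-12 over 176 activates about 6.8%. Assuming the added experts are the same size as the originals, per-token expert compute is roughly three times that of the base configuration.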
This model was trained on a 2-GPU setup.
This is one of the best checkpoints from this project so far, by a wide margin. The upcoming 180B line should improve on it, but for now Kiwi is the recommended agentic-coder release.
Overview
- Base model: openai/gpt-oss-120b
- Total expert rows: 176
- Added specialist experts: 48
- Format: MXFP4
- Recommended active experts: top-k=12
- Intended use: coding, agentic coding, SWE-style workflows, tool-using automation
- Status: research preview
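The MXFP4 format listed above comes from the OCP Microscaling (MX) specification: values are stored in blocks of 32 four-bit E2M1 elements that share one 8-bit power-of-two (E8M0) scale. The sketch below decodes one such block in pure Python. The nibble order (low nibble first) and the function name `dequantize_mxfp4_block` are assumptions for illustration; they do not describe how this checkpoint's tensors are actually laid out on disk.

```python
# FP4 E2M1 magnitudes indexed by the 3 low bits of a code; bit 3 is the sign.
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK_SIZE = 32  # elements sharing one scale in MXFP4


def decode_e2m1(code: int) -> float:
    """Decode one 4-bit E2M1 code to a float."""
    magnitude = E2M1_VALUES[code & 0b0111]
    return -magnitude if code & 0b1000 else magnitude


def dequantize_mxfp4_block(packed: bytes, scale_e8m0: int) -> list:
    """Dequantize one MXFP4 block: 16 bytes of packed FP4 codes plus one E8M0 scale.

    Assumes the low nibble of each byte holds the earlier element (hypothetical layout).
    """
    if len(packed) != BLOCK_SIZE // 2:
        raise ValueError("an MXFP4 block packs 32 codes into 16 bytes")
    if scale_e8m0 == 0xFF:
        raise ValueError("E8M0 value 0xFF encodes NaN")
    scale = 2.0 ** (scale_e8m0 - 127)  # E8M0: unsigned exponent with bias 127
    values = []
    for byte in packed:
        for code in (byte & 0x0F, byte >> 4):
            values.append(decode_e2m1(code) * scale)
    return values
```

For example, with a scale byte of 128 (a factor of 2), the packed byte 0x21 decodes to 1.0 followed by 2.0.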
Recommended vLLM Setup
This model was tested with vLLM using the GPT-OSS reasoning and OpenAI tool-call parsers.
vllm serve /path/to/model \
--served-model-name vllm/doobee \
--tensor-parallel-size 2 \
--max-model-len 60000 \
--gpu-memory-utilization 0.88 \
--enforce-eager \
--trust-remote-code \
--reasoning-parser openai_gptoss \
--tool-call-parser openai \
--enable-auto-tool-choice \
--hf-overrides '{"num_experts_per_tok": 12}'
Recommended parameters:
- num_experts_per_tok=12
- tensor-parallel-size=2
- max-model-len=60000
- gpu-memory-utilization=0.88
- reasoning-parser=openai_gptoss
- tool-call-parser=openai
- enable-auto-tool-choice
The staged config is already set to num_experts_per_tok=12 and experts_per_token=12. If your runtime ignores those fields, pass the --hf-overrides value explicitly.
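If you launch the server from a script, the recommended command can be assembled programmatically. This is a minimal sketch: the helper names `build_serve_args` and `mismatched_expert_fields` are hypothetical, while every flag and value is copied from the recommended command and parameter list in this card. It always passes `--hf-overrides`, which is harmless when the config already sets 12 experts.

```python
import json
import shlex

# Values from the recommended vLLM command in this card.
RECOMMENDED = {
    "tensor-parallel-size": "2",
    "max-model-len": "60000",
    "gpu-memory-utilization": "0.88",
    "reasoning-parser": "openai_gptoss",
    "tool-call-parser": "openai",
}
TOP_K = 12
# Both field names appear in the staged config for this checkpoint.
EXPERT_FIELDS = ("num_experts_per_tok", "experts_per_token")


def mismatched_expert_fields(config: dict, top_k: int = TOP_K) -> list:
    """List the expert-count fields in a loaded config.json that are not set to top_k."""
    return [field for field in EXPERT_FIELDS if config.get(field) != top_k]


def build_serve_args(model_path: str, served_name: str = "vllm/doobee",
                     top_k: int = TOP_K) -> list:
    """Assemble the recommended `vllm serve` argv, always passing --hf-overrides."""
    args = ["vllm", "serve", model_path, "--served-model-name", served_name]
    for flag, value in RECOMMENDED.items():
        args += [f"--{flag}", value]
    # Boolean flags from the recommended command.
    args += ["--enforce-eager", "--trust-remote-code", "--enable-auto-tool-choice"]
    # Force top-k explicitly in case the runtime ignores the config fields.
    args += ["--hf-overrides", json.dumps({"num_experts_per_tok": top_k})]
    return args


if __name__ == "__main__":
    # Print a shell-safe version of the command for inspection.
    print(shlex.join(build_serve_args("/path/to/model")))
```

To actually start the server, pass the list to `subprocess.run(build_serve_args("/path/to/model"), check=True)`.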
Tool Calling
Kiwi was primarily built and tested as an agentic coding model.
Recommended temperatures:
- 0.0 for deterministic tool use
- 0.3 for steady coding-agent work
- 0.6 for more flexible agentic exploration
The recommended serving path is OpenAI-compatible Chat Completions with vLLM's GPT-OSS reasoning parser and OpenAI tool-call parser enabled.
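Here is a minimal sketch of a tool-calling Chat Completions request to a vLLM server started with the recommended command, using the temperature presets from this section. The `run_tests` tool schema, the preset names, the helper names, and the `http://localhost:8000/v1` URL (vLLM's default port) are assumptions for illustration; `vllm/doobee` is the `--served-model-name` from the recommended command. The payload is built with the standard library and is only sent when `send_chat` is called.

```python
import json
import urllib.request

# Temperature presets recommended in this card (preset names are hypothetical).
TEMPERATURES = {
    "deterministic": 0.0,  # deterministic tool use
    "steady": 0.3,         # steady coding-agent work
    "explore": 0.6,        # more flexible agentic exploration
}

# Hypothetical tool definition in the OpenAI function-calling format.
RUN_TESTS_TOOL = {
    "type": "function",
    "function": {
        "name": "run_tests",
        "description": "Run the project's test suite and return the output.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string", "description": "Test file or directory"}},
            "required": ["path"],
        },
    },
}


def build_chat_request(messages, mode="steady", model="vllm/doobee", tools=None):
    """Build an OpenAI-compatible Chat Completions payload."""
    payload = {
        "model": model,
        "messages": messages,
        "temperature": TEMPERATURES[mode],
    }
    if tools:
        payload["tools"] = tools
        payload["tool_choice"] = "auto"  # relies on --enable-auto-tool-choice
    return payload


def send_chat(payload, base_url="http://localhost:8000/v1"):
    """POST the payload to /chat/completions and return the decoded JSON response."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

For example, `send_chat(build_chat_request([{"role": "user", "content": "Fix the failing test in tests/test_api.py"}], mode="deterministic", tools=[RUN_TESTS_TOOL]))` returns a standard Chat Completions response; any tool calls the model makes appear under `choices[0]["message"]["tool_calls"]`.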
Next
Kiwi is the current recommended 160B agentic-coder checkpoint. The upcoming 180B line is expected to push performance further.
License
The license metadata is still the placeholder "license: other". It should be replaced with the actual license once compatibility with the base model and the added expert weights has been confirmed.
Model tree for LLMWildling/gpt-oss-160b-kiwi
Base model
openai/gpt-oss-120b