How to use xCloudinfo/gpt-oss-20b-Code-xCloud with libraries and local apps.

Libraries

How to use xCloudinfo/gpt-oss-20b-Code-xCloud with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="xCloudinfo/gpt-oss-20b-Code-xCloud")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("xCloudinfo/gpt-oss-20b-Code-xCloud")
model = AutoModelForCausalLM.from_pretrained("xCloudinfo/gpt-oss-20b-Code-xCloud")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# Reasoning models emit their reasoning before the answer, so leave a generous token budget
outputs = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

vLLM

How to use xCloudinfo/gpt-oss-20b-Code-xCloud with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "xCloudinfo/gpt-oss-20b-Code-xCloud"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "xCloudinfo/gpt-oss-20b-Code-xCloud",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
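
The same request can be issued from Python. The following is a minimal sketch using only the standard library; it assumes the vLLM server started above is reachable at `localhost:8000`, and the helper names `build_chat_request` and `chat` as well as the `max_tokens` value are illustrative choices, not part of the model card.

```python
import json
import urllib.request

# Assumption: the vLLM server from the commands above listens on localhost:8000.
BASE_URL = "http://localhost:8000/v1"
MODEL_ID = "xCloudinfo/gpt-oss-20b-Code-xCloud"


def build_chat_request(prompt, base_url=BASE_URL, model=MODEL_ID, max_tokens=1024):
    """Build a POST request for the OpenAI-compatible chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        # Illustrative budget; a reasoning model needs room for its reasoning.
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, timeout=300):
    """Send the prompt to the server and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(chat("What is the capital of France?"))` should mirror the curl call. For the SGLang server, the same sketch should work after changing `BASE_URL` to port 30000.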

SGLang

How to use xCloudinfo/gpt-oss-20b-Code-xCloud with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "xCloudinfo/gpt-oss-20b-Code-xCloud" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "xCloudinfo/gpt-oss-20b-Code-xCloud",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "xCloudinfo/gpt-oss-20b-Code-xCloud" \
        --host 0.0.0.0 \
        --port 30000

Docker Model Runner
How to use xCloudinfo/gpt-oss-20b-Code-xCloud with Docker Model Runner:
```
docker model run hf.co/xCloudinfo/gpt-oss-20b-Code-xCloud
```

gpt-oss-20b-Code-xCloud

云碩科技 (xCloudinfo) · Series: Code

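A reasoning model with strengthened coding ability, built on openai/gpt-oss-20b (21B total parameters / 3.6B active / MoE / MXFP4 / harmony reasoning format). It is LoRA fine-tuned on distilled, execution-verified code instruction data (LoRA is applied to the attention layers; the MoE experts stay in native MXFP4), which preserves the native reasoning ability of gpt-oss.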
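
To give a feel for how small an attention-only LoRA is next to the 21B base, the sketch below counts the trainable parameters such an adapter would add. Everything numeric here is an assumption rather than a statement from the card: the layer shapes are the commonly published gpt-oss-20b attention dimensions, the projection names follow the usual Transformers convention, and the rank of 16 is purely hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Linear:
    """Shape of one linear projection (input features -> output features)."""
    name: str
    d_in: int
    d_out: int


def lora_params(layer: Linear, rank: int) -> int:
    """LoRA adds A (rank x d_in) and B (d_out x rank) beside a frozen weight."""
    return rank * (layer.d_in + layer.d_out)


# Assumed shapes (not from the model card): hidden size 2880, 64 query heads,
# 8 key/value heads, head dimension 64, 24 transformer blocks.
ATTENTION = [
    Linear("q_proj", 2880, 64 * 64),
    Linear("k_proj", 2880, 8 * 64),
    Linear("v_proj", 2880, 8 * 64),
    Linear("o_proj", 64 * 64, 2880),
]
NUM_BLOCKS = 24
RANK = 16  # hypothetical; the card does not state the LoRA rank


def total_lora_params(rank: int = RANK) -> int:
    """Sum the adapter parameters over all attention projections and blocks."""
    per_block = sum(lora_params(layer, rank) for layer in ATTENTION)
    return per_block * NUM_BLOCKS


print(f"trainable LoRA parameters at rank {RANK}: {total_lora_params():,}")
```

Under these assumptions the adapter stays in the single-digit millions of parameters, which is consistent with leaving the MXFP4 expert weights untouched during fine-tuning.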

Function: writing code. The model understands requirements, produces executable Python and other code solutions, and keeps its step-by-step reasoning.

Strengths (stated plainly, without inflation)

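Every training sample is execution-verified: each solution was run against hidden tests in a sandbox and kept only if it passed (rejection sampling). This is not code scraped from the web; every sample has been shown to run. Most distilled models cannot offer this kind of data-quality guarantee.
General coding strength is retained: HumanEval pass@1 **87.2%** (third-party problem set, greedy decoding, 164 problems), which places it in the top tier for an open model with 21B total / 3.6B active parameters.
It is the coding foundation of the whole family: both the TAIDE-zhTW (Traditional Chinese) and Uncensored models are built on top of this one.
Lightweight and easy to deploy: MXFP4, about 14GB. A single mid-range GPU (or even a CPU) is enough for local, controllable, offline deployment, and the Apache-2.0 license is friendly to commercial use.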
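
As a reading aid for the HumanEval figure: with greedy decoding there is one sample per problem, so pass@1 is simply the fraction of problems solved. The sketch below, which assumes every problem counts equally, works backwards from the reported percentage to the number of solved problems. That count is an inference from the published number, not something the card states.

```python
def pass_at_1(passed: int, total: int) -> float:
    """Greedy pass@1: one sample per problem, so the score is the solved fraction."""
    if total <= 0:
        raise ValueError("total must be positive")
    return passed / total


def solved_counts_matching(reported_percent: float, total: int) -> list[int]:
    """Return every solved-count whose pass@1 rounds to the reported percentage."""
    return [
        k for k in range(total + 1)
        if round(100 * pass_at_1(k, total), 1) == reported_percent
    ]


print(solved_counts_matching(87.2, 164))  # prints [143]
```

Only one count is compatible with 87.2% on 164 problems, so the reported score corresponds to 143 solved problems under the equal-weight assumption.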

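The model is positioned as a "controllable, verifiable, self-hosted practical coder", not as a contender to beat frontier closed models on leaderboards. Its value lies in training data where every sample is executable, in controllable behavior, and in fitting your own technology stack.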

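Approach

Data: every solution in the code instruction data was first run against hidden tests in a sandbox and kept only if it passed.
Method: teacher distillation + execution-verification gate (rejection sampling) → LoRA SFT.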
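
The execution-verification gate can be sketched as follows. This is an illustration of the idea only, not the authors' pipeline: it runs each candidate together with its tests in a fresh Python process and keeps the candidate only when the process exits cleanly. A plain subprocess is not a real sandbox, and the function names, the timeout, and the toy task are all hypothetical.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def passes_hidden_tests(solution: str, tests: str, timeout: float = 10.0) -> bool:
    """Run solution + tests in a fresh interpreter; exit code 0 means all asserts held."""
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "candidate.py"
        script.write_text(solution + "\n\n" + tests + "\n", encoding="utf-8")
        try:
            result = subprocess.run(
                [sys.executable, str(script)],
                cwd=workdir,
                capture_output=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return False  # hanging or very slow solutions are rejected
    return result.returncode == 0


def rejection_sample(candidates, tests):
    """Keep only the candidate solutions that pass the hidden tests."""
    return [code for code in candidates if passes_hidden_tests(code, tests)]


# Hypothetical task: two teacher samples for "add two numbers", one of them wrong.
candidates = [
    "def add(a, b):\n    return a + b",
    "def add(a, b):\n    return a - b",
]
hidden_tests = "assert add(2, 3) == 5\nassert add(-1, 1) == 0"
kept = rejection_sample(candidates, hidden_tests)
print(f"kept {len(kept)} of {len(candidates)} candidates")
```

In a real pipeline the subprocess would be replaced by an isolated sandbox with resource limits, and the surviving samples would feed the LoRA SFT stage.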

Usage (transformers)

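from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "xCloudinfo/gpt-oss-20b-Code-xCloud"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, dtype="auto", device_map="auto")
msgs = [{"role": "user", "content": "Write an LRU cache in Python with a short explanation."}]
# return_dict=True yields input_ids plus attention_mask, both of which are passed to generate()
inputs = tok.apply_chat_template(msgs, add_generation_prompt=True, return_dict=True, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=512)
# Decode only the new tokens; special tokens are kept so the harmony channel markers stay visible
print(tok.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=False))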

This is a reasoning model, so set max_new_tokens generously. For the GGUF version, see gpt-oss-20b-Code-xCloud-GGUF.
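
Because the output is decoded with special tokens kept, the reasoning and the answer arrive in one string. The sketch below separates them, assuming the harmony markers `<|channel|>NAME<|message|>BODY` terminated by `<|end|>` or `<|return|>`. The helper names are hypothetical, and tool-call headers are not handled. It also shows why the token budget matters: if generation stops inside the analysis channel, no final answer exists yet.

```python
import re
from typing import Dict, Optional

# Assumption: decoded text contains harmony segments of the form
# <|channel|>NAME<|message|>BODY, closed by <|end|>, <|return|>, or the end of text.
_SEGMENT = re.compile(
    r"<\|channel\|>(?P<channel>\w+)<\|message\|>(?P<body>.*?)"
    r"(?=<\|end\|>|<\|return\|>|<\|start\|>|\Z)",
    re.DOTALL,
)


def split_channels(decoded: str) -> Dict[str, str]:
    """Map each harmony channel name to its (last) message body."""
    return {m["channel"]: m["body"].strip() for m in _SEGMENT.finditer(decoded)}


def final_answer(decoded: str) -> Optional[str]:
    """Return the final-channel text, or None if generation stopped before it."""
    return split_channels(decoded).get("final")


complete = (
    "<|channel|>analysis<|message|>User wants an LRU cache.<|end|>"
    "<|start|>assistant<|channel|>final<|message|>Use collections.OrderedDict.<|return|>"
)
truncated = "<|channel|>analysis<|message|>Thinking about eviction order"

print(final_answer(complete))
print(final_answer(truncated))
```

A `None` result is a practical signal to retry with a larger `max_new_tokens`.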

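License and provenance statement

Base model: openai/gpt-oss-20b, Apache-2.0.

Produced by 云碩科技 xCloudinfo on its own AI compute resource pool; the data stays local and the pipeline is reproducible.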
