
Libraries

How to use cyzero-kim/gemma-4-31B-Opus-4.6-Reasoning with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="cyzero-kim/gemma-4-31B-Opus-4.6-Reasoning")

# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("cyzero-kim/gemma-4-31B-Opus-4.6-Reasoning")
model = AutoModelForMultimodalLM.from_pretrained("cyzero-kim/gemma-4-31B-Opus-4.6-Reasoning")


vLLM

How to use cyzero-kim/gemma-4-31B-Opus-4.6-Reasoning with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "cyzero-kim/gemma-4-31B-Opus-4.6-Reasoning"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "cyzero-kim/gemma-4-31B-Opus-4.6-Reasoning",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
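
The same request can be issued from Python. The following is a minimal sketch using only the standard library; it assumes the server started above is listening on localhost:8000 and returns the usual OpenAI-style completions response, with the text under `choices[0].text`. The helper names `build_completion_request` and `complete` are illustrative and not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "cyzero-kim/gemma-4-31B-Opus-4.6-Reasoning"


def build_completion_request(prompt, base_url="http://localhost:8000",
                             max_tokens=512, temperature=0.5):
    """Build (without sending) a POST request for the /v1/completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the text of the first completion.

    Assumes an OpenAI-compatible response body; requires a running server.
    """
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With the server running, `complete("Once upon a time,")` should return the generated continuation. Passing `base_url="http://localhost:30000"` targets a server on a different port.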

SGLang

How to use cyzero-kim/gemma-4-31B-Opus-4.6-Reasoning with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "cyzero-kim/gemma-4-31B-Opus-4.6-Reasoning" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "cyzero-kim/gemma-4-31B-Opus-4.6-Reasoning",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "cyzero-kim/gemma-4-31B-Opus-4.6-Reasoning" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "cyzero-kim/gemma-4-31B-Opus-4.6-Reasoning",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use cyzero-kim/gemma-4-31B-Opus-4.6-Reasoning with Docker Model Runner:
```
docker model run hf.co/cyzero-kim/gemma-4-31B-Opus-4.6-Reasoning
```

Gemma-4-31B_Opus-Reasoning_BF16

Multimodal inference with vLLM has been verified.

Source: cloudbjorn/gemma-4-31B-Opus-4.6-Reasoning

This is a fine-tuned and merged version of the Gemma-4 31B model, trained on the high-quality reasoning dataset Crownelius/Opus-4.6-Reasoning-3300x.

The primary goal of this project is to use Gemma-4's native <|channel> format to enforce strict, step-by-step reasoning before the model outputs a final answer. Combining the Gemma-4 foundation with the Opus reasoning dataset is intended to produce an analytical model suited to tasks such as planning complex cloud deployments and working through logical deductions.

Reasoning Format: Gemma 4 Architecture

This model adheres strictly to the Gemma 4 multimodal and reasoning formats. It writes its internal reasoning between a <|channel>thought opening marker and a <channel|> closing marker, then delivers the final response.

Below is a sample from this merge, abridged where noted in parentheses:

User: I am building an automated AI training platform on AWS. I need to generate scalable Infrastructure-as-Code (IaC) templates to dynamically spin up GPU instances. Walk me through how I should decide between using standard Terraform versus the AWS Cloud Development Kit (CDK). Give me the pros and cons, and make a final recommendation assuming my platform logic is written in Python.

Model: <|channel>thought Let me work through this step by step.

Step 1: Understand the Requirements

The goal is to create Infrastructure-as-Code for an automated AI training platform... (Context truncation for brevity)

The platform logic is written in Python, so the IaC solution should integrate smoothly with that codebase. (Evaluates Terraform vs CDK)

Final Recommendation:

Since the platform is built on Python, and there is no immediate need for multi-cloud support, AWS CDK is the best choice...<channel|> Final Answer: For your automated AI training platform on AWS, I recommend using AWS CDK instead of Terraform. Here's why...
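
Client code often needs the final answer separately from the reasoning. The following is a minimal sketch that splits a completion on the two markers visible in the sample above. It assumes the markers appear in the returned text exactly as shown, which may depend on how the serving stack handles special tokens. The function name `split_reasoning` and the shortened sample string are illustrative.

```python
import re

# Markers copied from the sample output; treating them as fixed strings is an assumption.
THOUGHT_OPEN = "<|channel>thought"
THOUGHT_CLOSE = "<channel|>"

_PATTERN = re.compile(
    re.escape(THOUGHT_OPEN) + r"(.*?)" + re.escape(THOUGHT_CLOSE) + r"(.*)",
    re.DOTALL,
)


def split_reasoning(text):
    """Return (reasoning, answer).

    reasoning is None when no complete thought block is found, in which
    case the whole text is returned as the answer.
    """
    match = _PATTERN.search(text)
    if match is None:
        return None, text.strip()
    return match.group(1).strip(), match.group(2).strip()


# Shortened, hypothetical completion in the same shape as the sample.
sample = (
    "<|channel>thought Let me work through this step by step."
    "<channel|> Final Answer: Use AWS CDK."
)
reasoning, answer = split_reasoning(sample)
print(reasoning)  # Let me work through this step by step.
print(answer)     # Final Answer: Use AWS CDK.
```

Returning `None` for the reasoning when a marker is missing lets callers distinguish a truncated generation from one with an empty thought block.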

Benchmarks: ARC Challenge

Standard knowledge benchmarks can regress slightly during strict reasoning SFT. On ARC Challenge, the merged model's acc_norm is 0.34 percentage points below the base model's, a small cost for the structured reasoning output it gains.

Base (Gemma-4-31B): acc_norm: 69.88%
Merged (Opus Reasoning): acc_norm: 69.54%
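
For reference, the size of the regression follows directly from the two scores above. This short calculation uses only the reported numbers; the variable names are illustrative.

```python
# acc_norm values as reported above (percent).
base_acc_norm = 69.88
merged_acc_norm = 69.54

# Absolute difference in percentage points, and the same drop relative to the base score.
absolute_drop = base_acc_norm - merged_acc_norm
relative_drop = absolute_drop / base_acc_norm * 100

print(f"absolute drop: {absolute_drop:.2f} percentage points")  # 0.34
print(f"relative drop: {relative_drop:.2f}% of the base score")  # 0.49
```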

Training Details

Base Model: google/gemma-4-31B
Dataset: Crownelius/Opus-4.6-Reasoning-3300x
Training Framework: Eschaton Engine (Cloudbjorn)
Format: Merged (Base + LoRA)

Training Precision:

Compute Dtype: bfloat16

LoRA Parameters (Auto-Scaled for 31B via Eschaton):

r: 16
lora_alpha: 32
target_modules: all-linear

Hyperparameters:

Optimizer: 8-bit Paged AdamW
Effective Batch Size: 32 (Gradient Accumulation)
Learning Rate: 2e-5
LR Scheduler: Linear
Epochs: 1
Training Sequence Length: 2048
Warmup Steps: 50
Weight Decay: 0.01
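
Two quantities implied by these settings can be worked out by hand. The sketch below computes the standard LoRA scaling factor (lora_alpha / r) and a linear learning-rate schedule with warmup. It assumes the conventional shape of a linear ramp from zero followed by a linear decay to zero; the Eschaton Engine's actual schedule is not documented here, and `total_steps=200` is a hypothetical value chosen only for illustration.

```python
# Values taken from the training details above.
R = 16
LORA_ALPHA = 32
PEAK_LR = 2e-5
WARMUP_STEPS = 50


def lora_scaling(alpha, r):
    """Standard LoRA scaling applied to the low-rank update: alpha / r."""
    return alpha / r


def linear_lr(step, total_steps, peak_lr=PEAK_LR, warmup_steps=WARMUP_STEPS):
    """Linear warmup to peak_lr, then linear decay to zero at total_steps.

    This schedule shape is an assumption, not a confirmed detail of the run.
    """
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


print(lora_scaling(LORA_ALPHA, R))  # 2.0

# total_steps=200 is hypothetical; the real step count depends on dataset size.
for step in (0, 25, 50, 125, 200):
    print(step, linear_lr(step, total_steps=200))
```

With r = 16 and lora_alpha = 32, the adapter update is scaled by 2.0 before it is merged into the base weights.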


Safetensors

Model size: 31B params
Tensor type: BF16
