Instructions to use cyzero-kim/gemma-4-31B-Opus-4.6-Reasoning with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use cyzero-kim/gemma-4-31B-Opus-4.6-Reasoning with Transformers:
    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("text-generation", model="cyzero-kim/gemma-4-31B-Opus-4.6-Reasoning")

    # Load model directly
    from transformers import AutoProcessor, AutoModelForMultimodalLM

    processor = AutoProcessor.from_pretrained("cyzero-kim/gemma-4-31B-Opus-4.6-Reasoning")
    model = AutoModelForMultimodalLM.from_pretrained("cyzero-kim/gemma-4-31B-Opus-4.6-Reasoning")
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use cyzero-kim/gemma-4-31B-Opus-4.6-Reasoning with vLLM:
Install from pip and serve model
    # Install vLLM from pip:
    pip install vllm

    # Start the vLLM server:
    vllm serve "cyzero-kim/gemma-4-31B-Opus-4.6-Reasoning"

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:8000/v1/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "cyzero-kim/gemma-4-31B-Opus-4.6-Reasoning",
            "prompt": "Once upon a time,",
            "max_tokens": 512,
            "temperature": 0.5
        }'
Use Docker
docker model run hf.co/cyzero-kim/gemma-4-31B-Opus-4.6-Reasoning
- SGLang
How to use cyzero-kim/gemma-4-31B-Opus-4.6-Reasoning with SGLang:
Install from pip and serve model
    # Install SGLang from pip:
    pip install sglang

    # Start the SGLang server:
    python3 -m sglang.launch_server \
        --model-path "cyzero-kim/gemma-4-31B-Opus-4.6-Reasoning" \
        --host 0.0.0.0 \
        --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "cyzero-kim/gemma-4-31B-Opus-4.6-Reasoning",
            "prompt": "Once upon a time,",
            "max_tokens": 512,
            "temperature": 0.5
        }'
Use Docker images
    docker run --gpus all \
        --shm-size 32g \
        -p 30000:30000 \
        -v ~/.cache/huggingface:/root/.cache/huggingface \
        --env "HF_TOKEN=<secret>" \
        --ipc=host \
        lmsysorg/sglang:latest \
        python3 -m sglang.launch_server \
            --model-path "cyzero-kim/gemma-4-31B-Opus-4.6-Reasoning" \
            --host 0.0.0.0 \
            --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "cyzero-kim/gemma-4-31B-Opus-4.6-Reasoning",
            "prompt": "Once upon a time,",
            "max_tokens": 512,
            "temperature": 0.5
        }'
- Docker Model Runner
How to use cyzero-kim/gemma-4-31B-Opus-4.6-Reasoning with Docker Model Runner:
docker model run hf.co/cyzero-kim/gemma-4-31B-Opus-4.6-Reasoning
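The vLLM and SGLang servers above both expose the same OpenAI-compatible `/v1/completions` route, so the curl calls can also be issued from Python. The following is a minimal sketch using only the standard library. The helper names `build_completion_request` and `complete` are illustrative, not part of any library, and the default base URL assumes the vLLM port shown above (pass `base_url="http://localhost:30000"` for SGLang).

```python
import json
import urllib.request

MODEL_ID = "cyzero-kim/gemma-4-31B-Opus-4.6-Reasoning"


def build_completion_request(prompt, base_url="http://localhost:8000",
                             max_tokens=512, temperature=0.5):
    """Build the same POST request as the curl examples (hypothetical helper)."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request to a running server and return the generated text."""
    with urllib.request.urlopen(build_completion_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    # OpenAI-compatible completion responses carry the text in choices[0].text.
    return body["choices"][0]["text"]
```

Calling `complete("Once upon a time,")` only works while one of the servers above is running. Building the request separately from sending it makes the payload easy to inspect without a GPU.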
Gemma-4-31B_Opus-Reasoning_BF16
Multimodal inference with vLLM has been verified.
From: cloudbjorn/gemma-4-31B-Opus-4.6-Reasoning
This is a fine-tuned and merged version of the Gemma-4 31B model, trained on the high-quality reasoning dataset Crownelius/Opus-4.6-Reasoning-3300x.
The primary goal of this project was to leverage Gemma-4's native <|channel> architecture to enforce strict, step-by-step logical reasoning before the model outputs a final answer. By fine-tuning the Gemma-4 foundation on the Opus reasoning dataset, the model acts as an analytical agent capable of planning complex cloud deployments and carrying out logical deductions.
Reasoning Format: Gemma 4 Architecture
This model adheres strictly to the Gemma 4 multimodal and reasoning formats. It outputs its internal reasoning between a <|channel>thought opening marker and a <channel|> closing marker before delivering the final response.
Below are raw, unedited samples from this merge:
User: I am building an automated AI training platform on AWS. I need to generate scalable Infrastructure-as-Code (IaC) templates to dynamically spin up GPU instances. Walk me through how I should decide between using standard Terraform versus the AWS Cloud Development Kit (CDK). Give me the pros and cons, and make a final recommendation assuming my platform logic is written in Python.
Model:
<|channel>thought
Let me work through this step by step.
Step 1: Understand the Requirements
- The goal is to create Infrastructure-as-Code for an automated AI training platform... (Context truncation for brevity)
- The platform logic is written in Python, so the IaC solution should integrate smoothly with that codebase. (Evaluates Terraform vs CDK)
Final Recommendation:
Since the platform is built on Python, and there is no immediate need for multi-cloud support, AWS CDK is the best choice...
<channel|>Final Answer: For your automated AI training platform on AWS, I recommend using AWS CDK instead of Terraform. Here's why...
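Because the reasoning and the final response are separated by fixed markers, they can be split apart with plain string handling. The sketch below assumes the markers appear in the decoded text exactly as in the sample above; whether a given serving stack keeps or strips these special tokens depends on its decoding settings. The function name `split_reasoning` is illustrative.

```python
THOUGHT_OPEN = "<|channel>thought"
THOUGHT_CLOSE = "<channel|>"


def split_reasoning(text):
    """Return (thought, answer) from raw model output.

    If either marker is missing, the thought is empty and the whole
    text is treated as the answer.
    """
    start = text.find(THOUGHT_OPEN)
    end = text.find(THOUGHT_CLOSE, start + len(THOUGHT_OPEN)) if start != -1 else -1
    if start == -1 or end == -1:
        return "", text.strip()
    thought = text[start + len(THOUGHT_OPEN):end].strip()
    answer = text[end + len(THOUGHT_CLOSE):].strip()
    return thought, answer


raw = "<|channel>thoughtStep 1: compare the options.<channel|>Final Answer: use AWS CDK."
thought, answer = split_reasoning(raw)
print(thought)  # Step 1: compare the options.
print(answer)   # Final Answer: use AWS CDK.
```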
Benchmarks: ARC Challenge
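Standard knowledge benchmarks occasionally show minor regression during strict reasoning SFT. Here the merged model's ARC Challenge acc_norm is 0.34 percentage points below the base model's, in exchange for the structured reasoning output described above.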
- Base (Gemma-4-31B): acc_norm: 69.88%
- Merged (Opus Reasoning): acc_norm: 69.54%
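To put the two scores in proportion, the gap can be expressed both in percentage points and relative to the base score. This is plain arithmetic on the numbers reported above, with no additional measurements.

```python
base_acc_norm = 69.88    # Base (Gemma-4-31B), percent
merged_acc_norm = 69.54  # Merged (Opus Reasoning), percent

# Absolute change in percentage points.
delta_pp = merged_acc_norm - base_acc_norm
# Change relative to the base score, in percent.
relative_pct = 100.0 * delta_pp / base_acc_norm

print(f"absolute change: {delta_pp:+.2f} pp")      # absolute change: -0.34 pp
print(f"relative change: {relative_pct:+.2f} %")   # relative change: -0.49 %
```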
Training Details
- Base Model: google/gemma-4-31B
- Dataset: Crownelius/Opus-4.6-Reasoning-3300x
- Training Framework: Eschaton Engine (Cloudbjorn)
- Format: Merged (Base + LoRA)
Training Precision:
- Compute Dtype: bfloat16
LoRA Parameters (Auto-Scaled for 31B via Eschaton):
- r: 16
- lora_alpha: 32
- target_modules: all-linear
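For readers unfamiliar with these settings: in standard LoRA, each adapted weight matrix receives a low-rank update `B @ A` scaled by `lora_alpha / r`. The sketch below computes that scale and the number of parameters an adapter adds to one linear layer. The layer dimensions are hypothetical, chosen only for illustration, and are not Gemma-4's actual shapes. Whether Eschaton Engine applies exactly this standard scaling is also an assumption.

```python
r = 16
lora_alpha = 32

# Standard LoRA scales the low-rank update B @ A by lora_alpha / r.
scaling = lora_alpha / r


def lora_param_count(d_in, d_out, rank):
    """Parameters added to one linear layer: A is rank x d_in, B is d_out x rank."""
    return rank * (d_in + d_out)


# Hypothetical layer shape, for illustration only.
d_in, d_out = 4096, 4096
added = lora_param_count(d_in, d_out, r)
full = d_in * d_out

print(scaling)  # 2.0
print(added)    # 131072
print(f"{100 * added / full:.2f}% of the full matrix")  # 0.78% of the full matrix
```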
Hyperparameters:
- Optimizer: 8-bit Paged AdamW
- Effective Batch Size: 32 (Gradient Accumulation)
- Learning Rate: 2e-5
- LR Scheduler: Linear
- Epochs: 1
- Training Sequence Length: 2048
- Warmup Steps: 50
- Weight Decay: 0.01
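The learning rate, scheduler, and warmup settings together define a learning-rate curve. The sketch below shows the common linear-warmup, linear-decay shape. It is an assumption that Eschaton Engine follows this exact form, and the total step count is hypothetical because the card does not state it.

```python
LEARNING_RATE = 2e-5
WARMUP_STEPS = 50


def linear_lr(step, total_steps, base_lr=LEARNING_RATE, warmup_steps=WARMUP_STEPS):
    """Linear warmup from 0 to base_lr, then linear decay back to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Hypothetical run length; the card does not state the total number of steps.
TOTAL_STEPS = 200
for step in (0, 25, 50, 125, 200):
    print(step, linear_lr(step, TOTAL_STEPS))
```

With these values the rate starts at zero, peaks at 2e-5 when warmup ends at step 50, and returns to zero at the final step.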