Instructions for using virtuous7373/Gemma-4-Harmonia-31B with libraries, inference providers, notebooks, and local apps.

Libraries

How to use virtuous7373/Gemma-4-Harmonia-31B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="virtuous7373/Gemma-4-Harmonia-31B")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("virtuous7373/Gemma-4-Harmonia-31B")
model = AutoModelForImageTextToText.from_pretrained("virtuous7373/Gemma-4-Harmonia-31B")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use virtuous7373/Gemma-4-Harmonia-31B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "virtuous7373/Gemma-4-Harmonia-31B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "virtuous7373/Gemma-4-Harmonia-31B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
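
The same OpenAI-compatible request can also be sent from Python using only the standard library. This is a minimal sketch. The helper names `build_chat_payload` and `chat` are our own, and the code assumes the server returns the standard OpenAI `choices[0].message.content` field. Point `base_url` at port 8000 for vLLM or port 30000 for SGLang.

```python
import json
import urllib.request

MODEL_ID = "virtuous7373/Gemma-4-Harmonia-31B"


def build_chat_payload(model: str, prompt: str, image_url: str) -> dict:
    """Build the same text + image chat request used in the curl examples."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def chat(payload: dict, base_url: str = "http://localhost:8000/v1") -> str:
    """POST the payload to an OpenAI-compatible server and return the reply text."""
    request = urllib.request.Request(
        base_url + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # Assumes the standard OpenAI chat-completions response shape.
    return body["choices"][0]["message"]["content"]
```

With a server running, call `chat(build_chat_payload(MODEL_ID, "Describe this image in one sentence.", image_url))`, where `image_url` is any reachable image URL.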


SGLang

How to use virtuous7373/Gemma-4-Harmonia-31B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "virtuous7373/Gemma-4-Harmonia-31B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "virtuous7373/Gemma-4-Harmonia-31B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "virtuous7373/Gemma-4-Harmonia-31B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "virtuous7373/Gemma-4-Harmonia-31B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Docker Model Runner
How to use virtuous7373/Gemma-4-Harmonia-31B with Docker Model Runner:
```
docker model run hf.co/virtuous7373/Gemma-4-Harmonia-31B
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

HARMONIA

The Greek goddess of harmony and concord.

Gemini Word Salad Initialization

Harmonious Synthesis

Harmonia is a 31-billion-parameter merge built on Gemma 4. It combines seven base and specialized fine-tuned models in three sequential merge phases. The goal is to consolidate their capabilities while keeping regressions to a minimum.

Simple linear blending often degrades logical coherence and dilutes nuanced behavior. Harmonia avoids it and instead relies on three techniques: null-space projection, structured N:M sparsification, and activation covariance matching. The model appears pretty solid so far.

Multi-Stage Fusion Protocol

Harmonia is built in three separate merge stages. Each stage produces an intermediate model that feeds the next, so capabilities are layered in one at a time.

Phase I

Nullspace Coherence Mapping

To anchor the base capabilities, Gemma-4-31B-Base is merged with the analytically rigorous GarnetV2-31B. A low-rank singular value decomposition (SVD) of the base weights identifies their dominant singular directions. GarnetV2's delta (its difference from the base) is then projected onto the orthogonal complement of those directions, which approximates the null space of the base weights. This keeps the donor's changes from distorting the base model's core capabilities. The result is the intermediate model clever-basename.

> Method: Null-Space Filtering

> Base Protection (protect_base): true

> Protected Rank (nr): 256
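
The projection can be written as shown below. This is our reading of the method, not a confirmed description of the `nullspace` merge implementation. We assume three things: weights act on inputs as $y = Wx$, protection is applied on the input (right-singular) side, and `nr` is the number of protected directions $r$.

$$W_{\text{base}} = U \Sigma V^\top, \qquad V_r = [v_1, \dots, v_r], \quad r = 256$$

$$\Delta = W_{\text{Garnet}} - W_{\text{base}}, \qquad \Delta_\perp = \Delta \,(I - V_r V_r^\top)$$

$$W_{\text{clever-basename}} = W_{\text{base}} + 1.0 \cdot \Delta_\perp$$

Because $V_r^\top V_r = I_r$, we have $\Delta_\perp V_r = \Delta V_r - \Delta V_r = 0$. So for any input $x$ in the span of $V_r$, the merged layer returns exactly $W_{\text{base}}\,x$.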

Phase II

Surgical Synaptic Gating

Next, clever-basename is merged with MeroMero-31B and Gembrain-31B using CABS (Conflict-Aware and Balanced Sparsification). Each donor's task vector is sparsified with a structured N:M magnitude mask:

- MeroMero keeps 16 of every 32 weights.
- Gembrain keeps 11 of every 33 weights.

The donors are pruned in a fixed order (`pruning_order` in the recipe) so that their retained weights do not overlap. This is meant to add their reasoning and creative behavior without the two donors interfering with each other. The result is the intermediate model clever-intname.

> Method: Conflict-Aware and Balanced Sparsification (CABS)

> N:M Ratio (MeroMero): 16 : 32 (Weight: 0.6)

> N:M Ratio (Gembrain): 11 : 33 (Weight: 0.4)

> Default N:M Ratio: 8 : 32
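
Written out, the Phase II ratios and weights give the expressions below. We assume each mask $M_{n:m}$ keeps the largest-magnitude entries of a task vector $\tau$, where $\tau$ is taken relative to clever-basename.

$$\tilde\tau_{\text{Mero}} = M_{16:32} \odot \tau_{\text{Mero}}, \qquad \tilde\tau_{\text{Gem}} = M_{11:33} \odot \tau_{\text{Gem}}$$

$$W_{\text{clever-intname}} = W_{\text{clever-basename}} + 0.6\,\tilde\tau_{\text{Mero}} + 0.4\,\tilde\tau_{\text{Gem}}$$

The kept fractions work out as follows:

- MeroMero: $16/32 = 0.5$
- Gembrain: $11/33 \approx 0.333$
- Default: $8/32 = 0.25$

Under our reading of the conflict-aware ordering, $M_{11:33}$ may not select any position already selected by $M_{16:32}$.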

Phase III

Covariance Activation Matching

In the final phase, clever-intname is merged with three further models:

- Equinox-31B, for narrative strength.
- Fabled-Gemma4, for creative depth.
- Ortenzya-The-Creative-Wordsmith, the primary conversational core, which serves as the base model.

ACTMat estimates each layer's input-activation covariance directly from the task vectors, without calibration data. It then solves for the merged weights that best reproduce every donor's layer outputs in activation space. This step is intended to reconcile representational differences between the donors, and it produces the final Harmonia model.

> Method: ACTMat Activation Matching

> Task Vector Blending Covariance Limit: 16,384

> Epsilon Solver Regularizer: 1e-06

> Output Precision Profile: bfloat16
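
One way to write the activation-matching step is shown below. It assumes ACTMat follows a RegMean-style closed form. Each donor's input covariance is approximated by the Gram matrix of its task vector, $G_i = \tau_i^\top \tau_i$, and $\epsilon$ acts as a ridge term. The document only states that Gram matrices come from task vectors, so this exact form is our assumption.

$$\tau^\star = \arg\min_{\tau} \sum_i w_i \operatorname{tr}\!\big[(\tau - \tau_i)\, G_i\, (\tau - \tau_i)^\top\big] + \epsilon \lVert \tau \rVert_F^2$$

$$\tau^\star = \Big(\sum_i w_i\, \tau_i G_i\Big)\Big(\sum_i w_i G_i + \epsilon I\Big)^{-1}, \qquad W_{\text{Harmonia}} = W_{\text{Ortenzya}} + \tau^\star$$

The recipe sets $w_i = 1$ for Equinox, clever-intname, and Fabled-Gemma4, and $\epsilon = 10^{-6}$. The base model's own task vector is zero, so it contributes nothing to either sum.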

Methodological Innovations

Nullspace Projection

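Linear interpolation can disrupt structure the base model relies on. This method avoids that in three steps:

1. Compute the base weights' dominant singular directions via SVD.
2. Project the donor's delta onto the orthogonal complement of those directions.
3. Add the projected delta to the base weights.

As a result, the donor's changes avoid the directions that carry the base model's core capabilities.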
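
Below is a minimal NumPy sketch of the projection for a single weight matrix. It makes three assumptions: PyTorch-style weights of shape (out_features, in_features), protection on the input side, and `nr` meaning the number of protected singular directions. `nullspace_merge` is a hypothetical helper, not MergeKit's API.

```python
import numpy as np


def nullspace_project(w_base: np.ndarray, delta: np.ndarray, rank: int = 256) -> np.ndarray:
    """Remove the parts of `delta` that act on the base weight's top-`rank`
    right-singular (input) directions."""
    # Rows of vt are right singular vectors, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(w_base, full_matrices=False)
    r = min(rank, vt.shape[0])
    v_r = vt[:r].T  # shape (in_features, r), orthonormal columns
    # delta @ (I - V_r V_r^T), computed without forming the full projector.
    return delta - (delta @ v_r) @ v_r.T


def nullspace_merge(w_base: np.ndarray, w_donor: np.ndarray,
                    rank: int = 256, weight: float = 1.0) -> np.ndarray:
    """Hypothetical helper: base + weight * projected donor delta."""
    return w_base + weight * nullspace_project(w_base, w_donor - w_base, rank)
```

The full SVD here is for clarity only. For 31B-scale matrices, a real implementation would plausibly use a truncated or randomized SVD instead.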

Conflict-Aware and Balanced Sparsification (CABS)

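CABS is a structured sparsification filter for donor task vectors. Magnitude masking at configurable N:M ratios keeps only the largest-magnitude entries in every block of M weights and removes the low-signal rest. Donors are pruned in a fixed order, so later donors do not reuse positions kept by earlier ones. Together, these steps layer domain specialization into the model while limiting interference between donors.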
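
Below is a simplified NumPy sketch of the two ideas as we understand CABS:

- **Balanced sparsification:** N:M magnitude masking within each block.
- **Conflict-aware pruning:** donors are pruned sequentially, and a later donor cannot keep positions already kept by an earlier one.

The function names are hypothetical, and masks are applied over the flattened tensor. The actual implementation may differ, for example in block layout or in whether kept weights are rescaled.

```python
import numpy as np


def nm_mask(tau, n, m, taken=None):
    """Keep the n largest-|value| entries in each consecutive block of m
    (over the flattened tensor), skipping positions already in `taken`."""
    scores = np.abs(tau).ravel().astype(np.float64)
    if taken is not None:
        # Positions claimed by an earlier donor can never be selected.
        scores = np.where(taken.ravel(), -np.inf, scores)
    mask = np.zeros(scores.size, dtype=bool)
    for start in range(0, scores.size, m):
        block = scores[start:start + m]
        top = np.argsort(block)[::-1][: min(n, block.size)]
        top = top[np.isfinite(block[top])]  # drop already-claimed positions
        mask[start + top] = True
    return mask.reshape(np.shape(tau))


def cabs_merge(base, donors, order):
    """donors maps name -> (weight, task_vector, n, m); prune in `order`."""
    merged = np.asarray(base, dtype=np.float64).copy()
    taken = np.zeros(merged.shape, dtype=bool)
    for name in order:
        weight, tau, n, m = donors[name]
        mask = nm_mask(tau, n, m, taken)
        merged += weight * np.where(mask, tau, 0.0)
        taken |= mask  # later donors cannot reuse these positions
    return merged
```

For Phase II, under these assumptions, the pruning order would be MeroMero then Gembrain, with donor entries `(0.6, tau_mero, 16, 32)` and `(0.4, tau_gem, 11, 33)`.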

Activation Covariance Matching

ACTMat uses Gram matrices computed directly from the task vectors to align representations in activation space rather than parameter space. If the direct solve runs into numerical problems, it falls back to an SVD-based pseudo-inverse.
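
Below is a NumPy sketch of the closed-form solve with a pseudo-inverse fallback, for one weight matrix. It implements a RegMean-style activation-matching objective under our own assumptions:

- Each Gram matrix is `G_i = tau_i^T tau_i`.
- Each weight `w_i` scales its `G_i`.
- `eps` acts as a ridge term.

`actmat_solve` is a hypothetical name, and this is not the ACTMat implementation used for Harmonia.

```python
import numpy as np


def actmat_solve(taus, weights, eps=1e-6, grams=None):
    """Merge task vectors (each shape (out, in)) by weighted activation matching.

    Solves tau* (sum_i w_i G_i + eps I) = sum_i w_i tau_i G_i.
    If `grams` is None, each G_i is approximated by tau_i^T tau_i.
    """
    if grams is None:
        grams = [t.T @ t for t in taus]
    dim = taus[0].shape[1]
    lhs = eps * np.eye(dim)
    rhs = np.zeros_like(taus[0], dtype=np.float64)
    for w, t, g in zip(weights, taus, grams):
        lhs = lhs + w * g
        rhs = rhs + w * (t @ g)
    try:
        # lhs is symmetric, so rhs @ inv(lhs) equals solve(lhs, rhs.T).T
        sol = np.linalg.solve(lhs, rhs.T).T
        if not np.all(np.isfinite(sol)):
            raise np.linalg.LinAlgError("non-finite solution")
    except np.linalg.LinAlgError:
        # Fallback: SVD-based pseudo-inverse of the regularized system.
        sol = rhs @ np.linalg.pinv(lhs)
    return sol
```

The merged layer is then the base weight plus the returned task vector.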

Model Lineage & Ingredients

We extend our gratitude to the creators of the ancestral paths that intersect within Harmonia:

Ortenzya-The-Creative-Wordsmith

llmfan46.

Equinox-31B

LatitudeGames.

Merge Blueprint

The entire pipeline is a multi-stage MergeKit configuration. The three YAML recipes below run one per stage, separated by `---`.


name: clever-basename

merge_method: nullspace
base_model: ./gemma-4-31B-base

models:
  - model: ./Gemma4-GarnetV2-31B
    parameters:
      weight: 1.0

parameters:
  protect_base: true
  nr: 256

tokenizer:
  source: base
chat_template: auto

dtype: float32
out_dtype: bfloat16
---
name: clever-intname
merge_method: cabs

base_model: ./clever-basename

models:
  - model: ./clever-basename

  - model: ./G4-MeroMero-31B-uncensored-heretic
    parameters:
      weight: 0.6
      n_val: 16
      m_val: 32
  - model: ./Gemma-4-Gembrain-31B-heretic
    parameters:
      weight: 0.4
      n_val: 11
      m_val: 33

default_n_val: 8
default_m_val: 32

pruning_order:
  - ./G4-MeroMero-31B-uncensored-heretic
  - ./Gemma-4-Gembrain-31B-heretic

dtype: float32
out_dtype: bfloat16

tokenizer:
  source: union

chat_template: auto
---
name: Harmonia

merge_method: actmat

base_model: ./gemma-4-Ortenzya-The-Creative-Wordsmith-31B-it-uncensored-heretic

models:
  - model: ./gemma-4-Ortenzya-The-Creative-Wordsmith-31B-it-uncensored-heretic
  - model: ./LatitudeGames-Equinox-31B
    parameters:
      weight: 1
  - model: ./clever-intname
    parameters:
      weight: 1
  - model: ./Fabled-Gemma4-31B
    parameters:
      weight: 1

parameters:
  epsilon: 1e-6

tokenizer:
  source: "union"

dtype: bfloat16
out_dtype: bfloat16

chat_template: auto

Symphony Contributors

I am grateful to the following individuals and teams for their models, inspiration, and other contributions:

Lambent ConicCat llmfan46 Arcee AI zerofata p-e-w Naphula Nimbz Latitude Games Blazed-Forge

And of course, every wonderful person on:

LocalLLaMA

A big thanks to Gemini-3.5-flash for creating this README alongside the word salads found within it. A special acknowledgment is extended to Google DeepMind for their contribution of the Gemma-4 foundation family to the open-weight ecosystem, representing the structural cornerstone of this merge and its constituents.