
Libraries

How to use jayshah5696/gemma4-e2b-humanize-unsloth-merged with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="jayshah5696/gemma4-e2b-humanize-unsloth-merged")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
# Print the result so the output is visible when run as a script
print(pipe(text=messages))

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("jayshah5696/gemma4-e2b-humanize-unsloth-merged")
model = AutoModelForImageTextToText.from_pretrained("jayshah5696/gemma4-e2b-humanize-unsloth-merged")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use jayshah5696/gemma4-e2b-humanize-unsloth-merged with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "jayshah5696/gemma4-e2b-humanize-unsloth-merged"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "jayshah5696/gemma4-e2b-humanize-unsloth-merged",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
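
The same request can be sent from Python. The following is a minimal sketch using only the standard library, assuming the server started above is listening on localhost:8000 and returns the usual OpenAI-style response shape; the helper names `build_chat_payload` and `chat` are illustrative and not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "jayshah5696/gemma4-e2b-humanize-unsloth-merged"


def build_chat_payload(text, image_url=None, model=MODEL_ID):
    """Build the JSON body that the curl example sends (hypothetical helper)."""
    content = [{"type": "text", "text": text}]
    if image_url is not None:
        content.append({"type": "image_url", "image_url": {"url": image_url}})
    return {"model": model, "messages": [{"role": "user", "content": content}]}


def chat(text, image_url=None, base_url="http://localhost:8000"):
    """POST the payload to the OpenAI-compatible endpoint and return the reply text."""
    body = json.dumps(build_chat_payload(text, image_url)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        reply = json.load(response)
    # Assumes the standard chat-completions response layout.
    return reply["choices"][0]["message"]["content"]
```

Passing a different `base_url` (for example `http://localhost:30000`) should let the same helper talk to any other server that exposes the same endpoint.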

Use Docker

docker model run hf.co/jayshah5696/gemma4-e2b-humanize-unsloth-merged

SGLang

How to use jayshah5696/gemma4-e2b-humanize-unsloth-merged with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "jayshah5696/gemma4-e2b-humanize-unsloth-merged" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "jayshah5696/gemma4-e2b-humanize-unsloth-merged",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "jayshah5696/gemma4-e2b-humanize-unsloth-merged" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "jayshah5696/gemma4-e2b-humanize-unsloth-merged",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Unsloth Studio

How to use jayshah5696/gemma4-e2b-humanize-unsloth-merged with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for jayshah5696/gemma4-e2b-humanize-unsloth-merged to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for jayshah5696/gemma4-e2b-humanize-unsloth-merged to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for jayshah5696/gemma4-e2b-humanize-unsloth-merged to start chatting

Load model with FastModel

# Install first (run in a shell, not in Python): pip install unsloth
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
    model_name="jayshah5696/gemma4-e2b-humanize-unsloth-merged",
    max_seq_length=2048,
)

Docker Model Runner
How to use jayshah5696/gemma4-e2b-humanize-unsloth-merged with Docker Model Runner:
```
docker model run hf.co/jayshah5696/gemma4-e2b-humanize-unsloth-merged
```

Gemma 4 E2B Humanize-RL — merged SFT policy

Merged weights from unsloth/gemma-4-E2B-it plus the Humanize-RL SFT LoRA adapter jayshah5696/gemma4-e2b-humanize-unsloth-lora. Intended use: starting policy for downstream GRPO / DAPO RL training on the humanize-rl rubric.

This artifact has been verified end-to-end against an explicit set of gates. See the Verification report section below.

Quickstart

from transformers import AutoModelForImageTextToText, AutoProcessor

model = AutoModelForImageTextToText.from_pretrained(
    "jayshah5696/gemma4-e2b-humanize-unsloth-merged", torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained("jayshah5696/gemma4-e2b-humanize-unsloth-merged")

For text-only use, the language model component is loaded transparently; the vision / audio encoders inherited from the base remain in the checkpoint and are skipped by the forward path when no image / audio is provided.
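
A text-only call might look like the sketch below. It reuses the chat-template pattern from the Transformers snippet for this model and simply leaves out the image entry. The helper names `build_text_messages` and `generate_reply` and the `max_new_tokens` default are illustrative assumptions, not part of this repo.

```python
MODEL_ID = "jayshah5696/gemma4-e2b-humanize-unsloth-merged"


def build_text_messages(prompt: str) -> list[dict]:
    """Wrap a plain prompt in the chat message format, with no image or audio entry."""
    return [{"role": "user", "content": [{"type": "text", "text": prompt}]}]


def generate_reply(prompt: str, max_new_tokens: int = 128) -> str:
    """Load the merged checkpoint and generate a reply (downloads weights on first use)."""
    # Imported lazily so build_text_messages works without transformers installed.
    from transformers import AutoModelForImageTextToText, AutoProcessor

    processor = AutoProcessor.from_pretrained(MODEL_ID)
    model = AutoModelForImageTextToText.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    inputs = processor.apply_chat_template(
        build_text_messages(prompt),
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Keep only the newly generated tokens, dropping the prompt prefix.
    new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
    return processor.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate_reply("your prompt")` loads the full checkpoint, so in a loop it would be better to load the model and processor once and reuse them.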

Provenance

base model: unsloth/gemma-4-E2B-it
source LoRA adapter: jayshah5696/gemma4-e2b-humanize-unsloth-lora
merge method: Unsloth FastModel.save_pretrained_merged(save_method="merged_16bit")
license: Apache-2.0 (matches base)

Architecture notes (read this before reporting bugs)

Gemma 4 E2B has num_hidden_layers: 35 and num_kv_shared_layers: 20. Layers 15-34 reuse the KV of earlier layers and, by design, have no k_proj, v_proj, k_norm, or v_norm weights of their own (transformers PR #45328, commit 9f8ddaa). Transformers registers those names in _keys_to_ignore_on_load_unexpected, so a correctly saved Gemma 4 checkpoint omits 80 entries on disk (20 shared layers × 4 weights each):

model.language_model.layers.{15..34}.self_attn.{k_proj,v_proj,k_norm,v_norm}.weight
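
To make the count concrete, the brace pattern above can be expanded in a few lines of Python. This is only an illustration of the arithmetic; the constant and function names are made up for the example.

```python
SHARED_KV_LAYERS = range(15, 35)  # layers 15..34 inclusive
SHARED_KV_PARAMS = ("k_proj", "v_proj", "k_norm", "v_norm")


def expected_omitted_keys() -> list[str]:
    """Expand the pattern into the full list of key names omitted on disk."""
    return [
        f"model.language_model.layers.{layer}.self_attn.{param}.weight"
        for layer in SHARED_KV_LAYERS
        for param in SHARED_KV_PARAMS
    ]


keys = expected_omitted_keys()
print(len(keys))  # 80
print(keys[0])    # model.language_model.layers.15.self_attn.k_proj.weight
print(keys[-1])   # model.language_model.layers.34.self_attn.v_norm.weight
```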

Some loaders (notably Unsloth's FastVisionModel) emit a noisy MISSING report for those names. Ignore it. The forward pass never reads those slots. A real broken checkpoint would also show non-shared layers (idx 0-14) as MISSING, which would fail downstream inference within one step.
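
One way to triage such a report is to split the reported names into the expected shared-KV omissions and everything else. The sketch below assumes the loader reports keys with the `model.language_model.layers.` prefix shown above; other loaders may use a different prefix, in which case the pattern would need adjusting. The function names are hypothetical.

```python
import re

NUM_HIDDEN_LAYERS = 35
NUM_KV_SHARED_LAYERS = 20
FIRST_SHARED_LAYER = NUM_HIDDEN_LAYERS - NUM_KV_SHARED_LAYERS  # 15

# Matches only the four KV-related weights on a numbered decoder layer.
_KV_KEY = re.compile(
    r"^model\.language_model\.layers\.(\d+)\.self_attn\."
    r"(k_proj|v_proj|k_norm|v_norm)\.weight$"
)


def is_benign_missing(key: str) -> bool:
    """True if the key is a KV weight on a shared-KV layer (15..34)."""
    match = _KV_KEY.match(key)
    return bool(match) and FIRST_SHARED_LAYER <= int(match.group(1)) < NUM_HIDDEN_LAYERS


def triage_missing(missing_keys):
    """Split a MISSING report into (benign, suspicious) lists, preserving order."""
    benign = [k for k in missing_keys if is_benign_missing(k)]
    suspicious = [k for k in missing_keys if not is_benign_missing(k)]
    return benign, suspicious
```

An empty `suspicious` list corresponds to the benign case described above; any entry from layers 0-14, or any non-KV weight, is worth investigating.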

Verification report

Gate	Result
base model loads	PASS
LoRA adapter loads	PASS
direct LoRA generation works (10 prompts)	PASS
merged model reloads from this HF repo	PASS
only shared-KV keys omitted from safetensors (80 expected, 80 omitted, 0 wrong)	PASS
`AutoModelForImageTextToText` `missing_keys`	0
`AutoModelForImageTextToText` `unexpected_keys`	0
`tokenizer_config.eos_token == "<turn|>"` (Unsloth #5386 guard)	PASS
greedy parity vs direct LoRA on 10 prompts	9/10 identical

Parity note: Single non-substantive word swap on one prompt; same length, same intent. Attributed to bf16 rounding in the fused merged matmul vs the LoRA add-on path.
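
A parity check of this kind can be sketched as a plain string comparison over the two sets of greedy outputs. This is not the project's verifier, only an illustration of how a "9/10 identical, one word swap" result could be computed. The function name and report fields are assumptions.

```python
def greedy_parity(reference: list[str], candidate: list[str]) -> dict:
    """Compare two runs prompt by prompt and describe any word-level differences."""
    if len(reference) != len(candidate):
        raise ValueError("both runs must cover the same prompts")
    mismatches = []
    for idx, (ref, cand) in enumerate(zip(reference, candidate)):
        if ref == cand:
            continue
        ref_words, cand_words = ref.split(), cand.split()
        # Position-wise differences; meaningful mainly when word counts match.
        swaps = [(r, c) for r, c in zip(ref_words, cand_words) if r != c]
        mismatches.append(
            {
                "prompt_index": idx,
                "same_word_count": len(ref_words) == len(cand_words),
                "word_swaps": swaps,
            }
        )
    return {
        "identical": len(reference) - len(mismatches),
        "total": len(reference),
        "mismatches": mismatches,
    }
```

A single mismatch with `same_word_count` true and exactly one entry in `word_swaps` matches the pattern described in the parity note.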

Run metadata:

verified on (UTC): 2026-05-25
verifier: src/humanize_rl/training/verify_gemma4_artifacts_modal.py
safetensors key count: 1951

Known limitations

Unsloth save_pretrained_merged is known to regress tokenizer_config.eos_token from <turn|> (id 106) to <eos> (id 1) on some Gemma 4 fine-tunes (unslothai/unsloth#5386). This repo has been audited and the chat eos is preserved. If a future re-merge regresses it, downstream vLLM tool-call paths will fail to stop. Re-run the verifier with --fix-tokenizer --push-fixed-tokenizer.
The MLX adapters in this project (adapters/gemma4_e2b_v04_mlx_*) were trained before mlx-lm#1158 and are not interchangeable with this merged checkpoint.
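
The eos_token regression in the first limitation can be guarded with a small check on the parsed tokenizer_config.json. The sketch below is an assumption about how such a guard might look, not the repo's verifier. It handles both a plain string and a dict-style serialized token with a `content` field.

```python
EXPECTED_CHAT_EOS = "<turn|>"  # chat end-of-turn token named in this card
REGRESSED_EOS = "<eos>"        # value the regression is reported to write


def read_eos_token(tokenizer_config: dict):
    """Return the eos token string, whether stored as a string or as a dict."""
    eos = tokenizer_config.get("eos_token")
    if isinstance(eos, dict):
        return eos.get("content")
    return eos


def check_chat_eos(tokenizer_config: dict) -> str:
    """Classify the config as 'ok', 'regressed', or 'unknown'."""
    eos = read_eos_token(tokenizer_config)
    if eos == EXPECTED_CHAT_EOS:
        return "ok"
    if eos == REGRESSED_EOS:
        return "regressed"
    return "unknown"
```

In practice the dict would come from `json.load` on a local copy of tokenizer_config.json; a result other than "ok" suggests re-running the verifier as described above.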