Gemma 4 E2B Humanize-RL — merged SFT policy
Merged weights from unsloth/gemma-4-E2B-it plus the Humanize-RL SFT LoRA adapter
jayshah5696/gemma4-e2b-humanize-unsloth-lora. Intended use: starting
policy for downstream GRPO / DAPO RL training on the humanize-rl rubric.
This artifact has been verified end-to-end against an explicit set of gates. See the Verification report section below.
Quickstart
```python
from transformers import AutoModelForImageTextToText, AutoProcessor

# torch_dtype="auto" keeps the checkpoint's stored 16-bit dtype; device_map="auto" places weights on available devices.
model = AutoModelForImageTextToText.from_pretrained(
    "jayshah5696/gemma4-e2b-humanize-unsloth-merged", torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained("jayshah5696/gemma4-e2b-humanize-unsloth-merged")
```
For text-only use, the language model component is loaded transparently; the vision / audio encoders inherited from the base remain in the checkpoint and are skipped by the forward path when no image / audio is provided.
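A minimal text-only generation sketch that builds on the Quickstart load. It assumes a transformers release with Gemma 4 support. The helper names (`build_text_messages`, `load_model`, `generate_text`), greedy decoding, and `max_new_tokens=128` are illustrative choices, not part of this repo. The chat-template call follows the usual `AutoProcessor.apply_chat_template(..., tokenize=True, return_dict=True, return_tensors="pt")` pattern, with only a text part in the message content.

```python
# Sketch: text-only chat generation with the merged checkpoint.
# Assumption: a transformers version with Gemma 4 support. Helper names are hypothetical.
from functools import lru_cache
from typing import Any

MODEL_ID = "jayshah5696/gemma4-e2b-humanize-unsloth-merged"


def build_text_messages(prompt: str) -> list[dict[str, Any]]:
    """Single-turn chat with one text part and no image or audio parts."""
    return [{"role": "user", "content": [{"type": "text", "text": prompt}]}]


@lru_cache(maxsize=1)
def load_model():
    # Imported lazily so build_text_messages works without transformers installed.
    from transformers import AutoModelForImageTextToText, AutoProcessor

    processor = AutoProcessor.from_pretrained(MODEL_ID)
    model = AutoModelForImageTextToText.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    return processor, model


def generate_text(prompt: str, max_new_tokens: int = 128) -> str:
    processor, model = load_model()
    inputs = processor.apply_chat_template(
        build_text_messages(prompt),
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)
    # Greedy decoding: no sampling, so repeated calls normally give the same output.
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Decode only the newly generated tokens, not the echoed prompt.
    new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
    return processor.decode(new_tokens, skip_special_tokens=True)
```

For example, `print(generate_text("Rewrite this so it sounds less formal: The meeting has been moved to Friday."))` loads the model once (cached by `lru_cache`) and prints the decoded reply.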
Provenance
- base model: unsloth/gemma-4-E2B-it
- source LoRA adapter: jayshah5696/gemma4-e2b-humanize-unsloth-lora
- merge method: Unsloth `FastModel.save_pretrained_merged(save_method="merged_16bit")`
- license: Apache-2.0 (matches base)
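As a rough guide to what a `merged_16bit` merge produces, here is the standard LoRA merge, written under two stated assumptions: the adapter uses the default LoRA scaling $\alpha/r$ (an rsLoRA adapter would use $\alpha/\sqrt{r}$ instead), and the merged weights are stored in bf16. The adapter's actual `r` and `lora_alpha` live in its `adapter_config.json` and are not restated here. For each adapted projection with base weight $W_0 \in \mathbb{R}^{d_{\text{out}} \times d_{\text{in}}}$ and adapter factors $B \in \mathbb{R}^{d_{\text{out}} \times r}$, $A \in \mathbb{R}^{r \times d_{\text{in}}}$:

$$
W_{\text{merged}} = \operatorname{bf16}\!\left(W_0 + \frac{\alpha}{r}\,BA\right),
\qquad
y_{\text{merged}} = W_{\text{merged}}\,x,
\qquad
y_{\text{LoRA}} = W_0\,x + \frac{\alpha}{r}\,B\,(A\,x).
$$

The two outputs agree in exact arithmetic; in 16-bit arithmetic they round at different points, so greedy decoding can occasionally pick a different token.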
Architecture notes (read this before reporting bugs)
Gemma 4 E2B has `num_hidden_layers: 35` and `num_kv_shared_layers: 20`.
Layers 15-34 (the last 20) share KV with earlier layers and, by design, do not have
their own `k_proj`, `v_proj`, `k_norm`, `v_norm` weights
(transformers PR #45328, commit 9f8ddaa). Transformers lists those names in
`_keys_to_ignore_on_load_unexpected`, so a correctly saved Gemma 4
checkpoint omits 80 entries (20 layers × 4 tensors) on disk:
`model.language_model.layers.{15..34}.self_attn.{k_proj,v_proj,k_norm,v_norm}.weight`
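The expected omission set follows directly from the two config values: the last `num_kv_shared_layers` of the `num_hidden_layers` layers each drop four tensors. A small sketch that enumerates it, with constants copied from the numbers above (`expected_omitted_keys` is a hypothetical helper, not part of transformers):

```python
# Sketch: enumerate the shared-KV weight names a correctly saved Gemma 4 E2B
# checkpoint is expected to omit (num_hidden_layers=35, num_kv_shared_layers=20).
NUM_HIDDEN_LAYERS = 35
NUM_KV_SHARED_LAYERS = 20
SHARED_KV_PARAMS = ("k_proj", "v_proj", "k_norm", "v_norm")


def expected_omitted_keys(
    num_hidden_layers: int = NUM_HIDDEN_LAYERS,
    num_kv_shared_layers: int = NUM_KV_SHARED_LAYERS,
) -> list[str]:
    """Return the safetensors keys for the KV-shared layers (the last N layers)."""
    first_shared = num_hidden_layers - num_kv_shared_layers  # 35 - 20 = 15
    return [
        f"model.language_model.layers.{i}.self_attn.{p}.weight"
        for i in range(first_shared, num_hidden_layers)
        for p in SHARED_KV_PARAMS
    ]


keys = expected_omitted_keys()
print(len(keys))  # 80 = 20 layers x 4 tensors
print(keys[0])    # model.language_model.layers.15.self_attn.k_proj.weight
```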
Some loaders (notably Unsloth's `FastVisionModel`) emit a noisy MISSING
report for those names. Ignore it: the forward pass never reads those
slots. A genuinely broken checkpoint would also list keys from the non-shared
layers (indices 0-14) as MISSING, and downstream inference would fail within one step.
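The rule of thumb above can be applied mechanically. A hedged sketch, assuming the MISSING key names have already been pulled out of the loader's log into a list of strings (the report's exact format is not specified here, and `triage_missing` is a hypothetical helper): keys matching the shared-KV pattern in layers 15-34 are expected; anything else deserves a closer look.

```python
import re

# Shared-KV tensors that layers 15-34 do not own (see the key pattern above).
_SHARED_KV_KEY = re.compile(
    r"^model\.language_model\.layers\.(\d+)\.self_attn\.(?:k_proj|v_proj|k_norm|v_norm)\.weight$"
)


def triage_missing(
    missing_keys: list[str], first_shared_layer: int = 15, num_layers: int = 35
) -> tuple[list[str], list[str]]:
    """Split a loader's MISSING list into expected shared-KV gaps and real problems."""
    expected: list[str] = []
    suspicious: list[str] = []
    for key in missing_keys:
        match = _SHARED_KV_KEY.match(key)
        if match and first_shared_layer <= int(match.group(1)) < num_layers:
            expected.append(key)  # by design: shared layer, never read by the forward pass
        else:
            suspicious.append(key)  # non-shared layer or unrelated tensor: investigate
    return expected, suspicious


report = [
    "model.language_model.layers.20.self_attn.k_proj.weight",
    "model.language_model.layers.3.self_attn.k_proj.weight",
]
ok, bad = triage_missing(report)
print(len(ok), bad)  # 1 ['model.language_model.layers.3.self_attn.k_proj.weight']
```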
Verification report
| Gate | Result |
|---|---|
| base model loads | PASS |
| LoRA adapter loads | PASS |
| direct LoRA generation works (10 prompts) | PASS |
| merged model reloads from this HF repo | PASS |
| only shared-KV keys omitted from safetensors (80 expected, 80 omitted, 0 wrong) | PASS |
| `AutoModelForImageTextToText` `missing_keys` | 0 |
| `AutoModelForImageTextToText` `unexpected_keys` | 0 |
| `tokenizer_config.eos_token == "<turn\|>"` (Unsloth #5386 guard) | PASS |
| greedy parity vs direct LoRA on 10 prompts | 9/10 identical |
Parity note: one of the 10 prompts differs by a single non-substantive word swap (same length, same intent). The difference is attributed to bf16 rounding in the fused merged matmul versus the separate LoRA add-on path.
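A sketch of how a parity count like the one in the table could be computed, assuming the merged and direct-LoRA greedy outputs are available as aligned lists of decoded strings. `greedy_parity` is a hypothetical helper, not the project's verifier; it compares at whitespace-split word level using the standard-library `difflib.SequenceMatcher`.

```python
import difflib


def greedy_parity(
    merged_outputs: list[str], lora_outputs: list[str]
) -> tuple[int, dict[int, list[tuple[str, list[str], list[str]]]]]:
    """Count identical generations and list word-level edits for the rest."""
    if len(merged_outputs) != len(lora_outputs):
        raise ValueError("output lists must be aligned prompt-by-prompt")
    identical = 0
    diffs: dict[int, list[tuple[str, list[str], list[str]]]] = {}
    for idx, (a, b) in enumerate(zip(merged_outputs, lora_outputs)):
        if a == b:
            identical += 1
            continue
        a_words, b_words = a.split(), b.split()
        matcher = difflib.SequenceMatcher(a=a_words, b=b_words, autojunk=False)
        # Keep only the non-equal opcodes: replace / delete / insert spans.
        diffs[idx] = [
            (tag, a_words[i1:i2], b_words[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"
        ]
    return identical, diffs


merged = ["The cat sat down.", "It was a quiet morning."]
lora = ["The cat sat down.", "It was a calm morning."]
n_same, diffs = greedy_parity(merged, lora)
print(n_same, diffs)  # 1 {1: [('replace', ['quiet'], ['calm'])]}
```

A single `replace` opcode covering one word on each side is what a "single non-substantive word swap" looks like in this representation.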
Run metadata:
- verified on (UTC): 2026-05-25
- verifier: `src/humanize_rl/training/verify_gemma4_artifacts_modal.py`
- safetensors key count: 1951
Known limitations
- Unsloth `save_pretrained_merged` is known to regress `tokenizer_config.eos_token` from `<turn|>` (id 106) to `<eos>` (id 1) on some Gemma 4 fine-tunes (unslothai/unsloth#5386). This repo has been audited and the chat EOS token is preserved. If a future re-merge regresses it, downstream vLLM tool-call paths will fail to stop. Re-run the verifier with `--fix-tokenizer --push-fixed-tokenizer`.
- The MLX adapters in this project (`adapters/gemma4_e2b_v04_mlx_*`) were trained before mlx-lm#1158 and are not interchangeable with this merged checkpoint.
Citation
If you use this checkpoint, please cite the Gemma 4 technical report and this project's repo.