Gemma 4 E2B Humanize-RL — merged SFT policy

Merged weights from unsloth/gemma-4-E2B-it plus the Humanize-RL SFT LoRA adapter jayshah5696/gemma4-e2b-humanize-unsloth-lora. Intended use: starting policy for downstream GRPO / DAPO RL training on the humanize-rl rubric.

This artifact has been verified end-to-end against an explicit set of gates. See the Verification report section below.

Quickstart

from transformers import AutoModelForImageTextToText, AutoProcessor

model = AutoModelForImageTextToText.from_pretrained(
    "jayshah5696/gemma4-e2b-humanize-unsloth-merged", torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained("jayshah5696/gemma4-e2b-humanize-unsloth-merged")

For text-only use, the language model component is loaded transparently; the vision / audio encoders inherited from the base remain in the checkpoint and are skipped by the forward path when no image / audio is provided.

Provenance

  • base model: unsloth/gemma-4-E2B-it
  • source LoRA adapter: jayshah5696/gemma4-e2b-humanize-unsloth-lora
  • merge method: Unsloth FastModel.save_pretrained_merged(save_method="merged_16bit")
  • license: Apache-2.0 (matches base)

Architecture notes (read this before reporting bugs)

Gemma 4 E2B has num_hidden_layers: 35 and num_kv_shared_layers: 20. Layers 15-34 share KV with earlier layers and by design do not have their own k_proj, v_proj, k_norm, v_norm weights (transformers PR #45328, commit 9f8ddaa). Transformers registers those names in _keys_to_ignore_on_load_unexpected so a correctly saved Gemma 4 checkpoint omits 80 entries on disk:

model.language_model.layers.{15..34}.self_attn.{k_proj,v_proj,k_norm,v_norm}.weight

Some loaders (notably Unsloth's FastVisionModel) emit a noisy MISSING report for those names. Ignore it. The forward pass never reads those slots. A real broken checkpoint would also show non-shared layers (idx 0-14) as MISSING, which would fail downstream inference within one step.

Verification report

Gate Result
base model loads PASS
LoRA adapter loads PASS
direct LoRA generation works (10 prompts) PASS
merged model reloads from this HF repo PASS
only shared-KV keys omitted from safetensors (80 expected, 80 omitted, 0 wrong) PASS
AutoModelForImageTextToText missing_keys 0
AutoModelForImageTextToText unexpected_keys 0
`tokenizer_config.eos_token == "<turn >"` (Unsloth #5386 guard)
greedy parity vs direct LoRA on 10 prompts 9/10 identical

Parity note: Single non-substantive word swap on one prompt; same length, same intent. Attributed to bf16 rounding in the fused merged matmul vs the LoRA add-on path.

Run metadata:

  • verified on (UTC): 2026-05-25
  • verifier: src/humanize_rl/training/verify_gemma4_artifacts_modal.py
  • safetensors key count: 1951

Known limitations

  • Unsloth save_pretrained_merged is known to regress tokenizer_config.eos_token from <turn|> (id 106) to <eos> (id 1) on some Gemma 4 fine-tunes (unslothai/unsloth#5386). This repo has been audited and the chat eos is preserved. If a future re-merge regresses it, downstream vLLM tool-call paths will fail to stop. Re-run the verifier with --fix-tokenizer --push-fixed-tokenizer.
  • The MLX adapters in this project (adapters/gemma4_e2b_v04_mlx_*) were trained before mlx-lm#1158 and are not interchangeable with this merged checkpoint.

Citation

If you use this checkpoint, please cite the Gemma 4 technical report and this project's repo.

Downloads last month
515
Safetensors
Model size
5B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for jayshah5696/gemma4-e2b-humanize-unsloth-merged

Finetuned
(141)
this model
Adapters
1 model