kasuboski/gemma-4-e4b-gleam-sft

A LoRA adapter for google/gemma-4-e4b-it, fine-tuned for Gleam code generation.

This is the adapter-only repo. For a ready-to-run GGUF (Ollama / llama.cpp), see kasuboski/gemma-4-e4b-gleam-sft-GGUF.

Dataset

Trained on a mixed dataset of 13,538 instruction-response pairs:

Source	Count	%	Purpose
gleam-instruct	9,355	69.1%	Gleam code generation (translate, refactor, debug, from-scratch)
ultrachat_200k	3,000	22.2%	General instruction — anti-forgetting
Code-290k-ShareGPT-MarkedLanguage	629	4.6%	Functional code (Scala, Elixir, Clojure, Haskell, etc.) — anti-forgetting
OpenCodeInstruct	554	4.1%	Python code — anti-forgetting
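
The percentages in the table follow directly from the counts. A minimal Python check, where the name `mix_counts` is illustrative and not taken from the training code:

```python
# Sample counts per source, copied from the table above.
mix_counts = {
    "gleam-instruct": 9355,
    "ultrachat_200k": 3000,
    "Code-290k-ShareGPT-MarkedLanguage": 629,
    "OpenCodeInstruct": 554,
}

total = sum(mix_counts.values())

# Share of each source, rounded to one decimal place as in the table.
shares = {name: round(100 * count / total, 1) for name, count in mix_counts.items()}

for name, share in shares.items():
    print(f"{name}: {share}%")
print(f"total: {total}")
```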

gleam-instruct (69.1%)

The core SFT dataset. 14,053 Gleam instruction-response pairs covering:

translate — port code from other languages to Gleam
refactor — improve existing Gleam code
debug — fix broken Gleam code
from-scratch — write new Gleam modules from requirements

The set was filtered to 9,355 pairs that fit within 4,096 tokens (93.6% coverage). The explanation task type (4,698 samples that only explained code) was excluded because it was a length bottleneck and offered little learning signal.
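
The filtering step described above could look roughly like the sketch below. This is not the actual preprocessing script: the field names (`task`, `instruction`, `response`) and the whitespace-based token counter are assumptions for illustration, and a real run would count tokens with the model's tokenizer after applying the chat template.

```python
from typing import Callable, Iterable

MAX_TOKENS = 4096  # token budget stated above
EXCLUDED_TASKS = {"explanation"}  # task type dropped from training


def filter_pairs(
    pairs: Iterable[dict],
    count_tokens: Callable[[str], int],
    max_tokens: int = MAX_TOKENS,
) -> list[dict]:
    """Keep pairs whose task type is allowed and whose text fits the budget."""
    kept = []
    for pair in pairs:
        if pair["task"] in EXCLUDED_TASKS:
            continue
        # Count instruction and response together, since both share the context.
        n_tokens = count_tokens(pair["instruction"]) + count_tokens(pair["response"])
        if n_tokens <= max_tokens:
            kept.append(pair)
    return kept


def whitespace_token_count(text: str) -> int:
    """Crude stand-in for a real tokenizer: counts whitespace-separated words."""
    return len(text.split())


# Hypothetical records, not taken from the dataset.
examples = [
    {"task": "debug", "instruction": "Fix this function", "response": "pub fn main() { Nil }"},
    {"task": "explanation", "instruction": "Explain this code", "response": "It prints a value."},
    {"task": "translate", "instruction": "Port to Gleam", "response": "x " * 5000},
]
kept = filter_pairs(examples, whitespace_token_count)
print([pair["task"] for pair in kept])
```

Passing the token counter in as a function keeps the filter independent of any particular tokenizer library.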

Anti-forgetting mix (30.9%)

To preserve general capabilities while specializing on Gleam:

General instruction (ultrachat_200k, MIT) — 3,000 random samples to maintain conversational ability
Functional code (Code-290k, Apache-2.0) — 629 samples across Scala (364), Haskell (156), Clojure (71), Elixir (34), F# (23), Lisp (28), Erlang (5), OCaml (2). Keeps the model's functional programming knowledge.
Python code (OpenCodeInstruct, CC-BY-4.0) — 554 samples to maintain general code ability
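
As a rough sketch of how such a mix can be assembled reproducibly, the snippet below draws a fixed number of samples per source and shuffles them together. The card only states that the ultrachat samples were random; uniform sampling for the other sources, the seed, and the toy placeholder pools are all assumptions.

```python
import random


def sample_mix(sources: dict[str, list], sizes: dict[str, int], seed: int = 0) -> list:
    """Draw a fixed number of samples from each source and shuffle them together."""
    rng = random.Random(seed)  # fixed seed so the mix is reproducible
    mixed = []
    for name, pool in sources.items():
        mixed.extend(rng.sample(pool, sizes[name]))
    rng.shuffle(mixed)
    return mixed


# Toy pools standing in for the real datasets (hypothetical placeholder records).
sources = {
    "gleam": [("gleam", i) for i in range(9355)],
    "chat": [("chat", i) for i in range(10000)],
    "functional": [("functional", i) for i in range(1000)],
    "python": [("python", i) for i in range(1000)],
}
# Per-source sample counts from the dataset table.
sizes = {"gleam": 9355, "chat": 3000, "functional": 629, "python": 554}

mix = sample_mix(sources, sizes)
print(len(mix))
```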

Usage

With Unsloth (recommended)

from unsloth import FastModel
from unsloth.chat_templates import get_chat_template

model, tokenizer = FastModel.from_pretrained(
    model_name="unsloth/gemma-4-e4b-it-unsloth-bnb-4bit",
    max_seq_length=4096,
    load_in_4bit=True,
)
model.load_adapter("kasuboski/gemma-4-e4b-gleam-sft")
tokenizer = get_chat_template(tokenizer, chat_template="gemma-4")

messages = [
    {"role": "system", "content": "You are a helpful Gleam programming assistant."},
    {"role": "user", "content": "Write a Gleam function that reverses a list."},
]
inputs = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to("cuda")
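# do_sample=True ensures the temperature setting takes effect even if the
# generation config defaults to greedy decoding.
outputs = model.generate(input_ids=inputs, max_new_tokens=512, do_sample=True, temperature=0.3)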
print(tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True))

With PEFT + Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

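# Load the base model, then attach the LoRA adapter weights on top of it.
base = AutoModelForCausalLM.from_pretrained("google/gemma-4-e4b-it", torch_dtype="auto")
model = PeftModel.from_pretrained(base, "kasuboski/gemma-4-e4b-gleam-sft")
# The tokenizer is loaded from the adapter repo.
tokenizer = AutoTokenizer.from_pretrained("kasuboski/gemma-4-e4b-gleam-sft")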

Recommended System Prompt

For best results, use this system prompt:

You are a Gleam programming expert. You ONLY write valid Gleam code. Gleam syntax rules: comments use //, function signatures use -> not ::, no where clauses, no do notation, imports use gleam/module style, use pub fn for public functions, pipe operator |> for chaining. Do NOT use Haskell syntax.
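
To avoid retyping the prompt, it can be kept in a constant and prepended to each request. The helper below is a small sketch; the names `GLEAM_SYSTEM_PROMPT` and `build_messages` are illustrative and not part of this repo.

```python
# System prompt copied verbatim from the recommendation above.
GLEAM_SYSTEM_PROMPT = (
    "You are a Gleam programming expert. You ONLY write valid Gleam code. "
    "Gleam syntax rules: comments use //, function signatures use -> not ::, "
    "no where clauses, no do notation, imports use gleam/module style, "
    "use pub fn for public functions, pipe operator |> for chaining. "
    "Do NOT use Haskell syntax."
)


def build_messages(user_prompt: str) -> list[dict]:
    """Return a chat-format message list with the recommended system prompt first."""
    return [
        {"role": "system", "content": GLEAM_SYSTEM_PROMPT},
        {"role": "user", "content": user_prompt},
    ]


messages = build_messages("Write a Gleam function that reverses a list.")
print(messages[0]["role"], "->", messages[1]["content"])
```

The resulting list has the same shape as the `messages` variable in the Unsloth example, so it can be passed to `tokenizer.apply_chat_template` in the same way.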

Known Issues

Haskell syntax bleed: The model may occasionally output Haskell syntax (:: type signatures, -- comments, Data.* imports). This is caused by 156 Haskell samples in the anti-forgetting mix. A strong system prompt (above) mitigates this. A future training run should exclude Haskell.
Unsloth GGUF export is broken for Gemma 4: Do NOT use model.save_pretrained_gguf(). Use llama.cpp's convert_hf_to_gguf.py instead (see the GGUF repo for details).
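
One way to catch the Haskell bleed described above is a quick post-generation check. The sketch below is a heuristic only: the regular expressions are assumptions about what the bleed looks like, they are not part of this repo, and they can yield false positives (for example `--` at the start of a line inside a multi-line string).

```python
import re

# Heuristic patterns for the Haskell constructs named above.
HASKELL_PATTERNS = {
    "type signature (::)": re.compile(r"^\s*\w+\s*::", re.MULTILINE),
    "line comment (--)": re.compile(r"^\s*--", re.MULTILINE),
    "Data.* import": re.compile(r"^\s*import\s+(qualified\s+)?Data\.", re.MULTILINE),
}


def find_haskell_bleed(code: str) -> list[str]:
    """Return the names of Haskell-style constructs found in generated code."""
    return [name for name, pattern in HASKELL_PATTERNS.items() if pattern.search(code)]


# Hypothetical model outputs for illustration.
good = "// Reverse a list\npub fn reverse(xs: List(a)) -> List(a) {\n  list.reverse(xs)\n}\n"
bad = "-- Reverse a list\nimport Data.List\nreverse :: [a] -> [a]\nreverse = foldl (flip (:)) []\n"

print(find_haskell_bleed(good))
print(find_haskell_bleed(bad))
```

A non-empty result could be used to trigger a retry with the recommended system prompt.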

License

This model is subject to the Gemma Terms of Use.
