Original Unsloth fine-tuning resources: https://unsloth.ai/docs/models/gemma-4/train

gemma-4-finetune-finetome10k

A merged, instruction-tuned fine-tune of Gemma 4 E4B, trained with Unsloth and TRL on a cleaned 10,000-conversation subset of FineTome-100k.

Model description

This model is a supervised fine-tune of unsloth/gemma-4-e4b-it-unsloth-bnb-4bit for general-purpose text instruction following and assistant-style responses.

Although the underlying architecture supports multimodal prompting, this fine-tuning run used only text conversations. For best results, build prompts with the tokenizer's chat template.

  • Base model: unsloth/gemma-4-e4b-it-unsloth-bnb-4bit
  • Fine-tuning framework: Unsloth + Hugging Face TRL
  • Training type: Supervised fine-tuning (SFT)
  • Model format: merged fine-tuned checkpoint
  • Primary language: English

Intended use

This model is intended for:

  • general instruction following
  • conversational Q&A
  • writing assistance
  • basic programming help
  • experimentation with Gemma 4 fine-tuning workflows

Training data

This model was fine-tuned on 10k rows of Maxime Labonne's FineTome-100k dataset (https://huggingface.co/datasets/mlabonne/FineTome-100k), a conversational instruction/response dataset in ShareGPT format.

Gemma 4's chat template renders a multi-turn conversation like this:

    <|turn>user
    Hello<turn|>
    <|turn>model
    Hey there!<turn|>
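To make the turn layout concrete, here is a hypothetical pure-Python helper, render_gemma_turns, that reproduces the markers shown above for a list of user/assistant messages. It is a sketch only: it assumes assistant turns are labelled "model" and that each turn ends with a newline, and the real template applied by tokenizer.apply_chat_template may add further tokens (for example a BOS token or system-prompt handling) that this helper omits.

```python
# Hypothetical helper, for illustration only: mimics the turn layout shown above.
# The real Gemma 4 chat template may add tokens (e.g. BOS) that this sketch omits.

ROLE_TO_GEMMA = {"user": "user", "assistant": "model"}


def render_gemma_turns(messages: list, add_generation_prompt: bool = False) -> str:
    """Render user/assistant messages in the <|turn>role ... <turn|> layout."""
    parts = []
    for message in messages:
        role = ROLE_TO_GEMMA[message["role"]]  # assistant turns are labelled "model"
        parts.append(f"<|turn>{role}\n{message['content']}<turn|>\n")
    if add_generation_prompt:
        # Open an empty model turn so generation continues as the assistant.
        parts.append("<|turn>model\n")
    return "".join(parts)


conversation = [
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hey there!"},
]
print(render_gemma_turns(conversation))
```

In practice, prefer the tokenizer's own template; the helper only shows why add_generation_prompt=True matters: it leaves an open model turn for the model to complete.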

The training data was normalized into alternating user / assistant turns before formatting with the Gemma chat template. Invalid or malformed conversations were filtered during preprocessing.
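The alternation rule can be expressed as a small filter. This is a hypothetical sketch, not the original preprocessing code: it assumes a valid conversation starts with a user turn, strictly alternates user/assistant, ends with an assistant turn, and contains no empty messages.

```python
# Hypothetical filter illustrating the "alternating turns" rule.
# Assumptions: no system turns, first turn is "user", last turn is "assistant",
# and every message has non-empty string content.


def is_valid_conversation(messages: list) -> bool:
    # Starting with user and ending with assistant implies an even, non-zero length.
    if len(messages) < 2 or len(messages) % 2 != 0:
        return False
    for index, message in enumerate(messages):
        expected_role = "user" if index % 2 == 0 else "assistant"
        if message.get("role") != expected_role:
            return False
        content = message.get("content")
        if not isinstance(content, str) or not content.strip():
            return False
    return True


def filter_conversations(conversations: list) -> list:
    """Keep only conversations that pass is_valid_conversation."""
    return [conv for conv in conversations if is_valid_conversation(conv)]
```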

Data characteristics

  • conversational instruction-response format
  • predominantly English
  • text-only fine-tuning data
  • general-purpose assistant-style examples

Training procedure

The model was fine-tuned using LoRA adapters and later exported as a merged model.
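Merging folds the low-rank update into the base weight, W' = W + (alpha / r) * B A, so inference needs a single matrix per layer instead of the base weight plus an adapter. The framework-free sketch below checks that both forward passes agree on a toy layer; the shapes, rank r and scaling alpha are hypothetical, since the card does not state the LoRA settings used for this run.

```python
# Minimal, framework-free sketch of what "merging" a LoRA adapter means.
# Shapes, rank r and alpha are hypothetical toy values, not this run's settings.


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]


def merge_lora(W, A, B, alpha, r):
    """Return W + (alpha / r) * B @ A, the merged weight."""
    scale = alpha / r
    delta = matmul(B, A)  # (out x r) @ (r x in) -> (out x in)
    return [[w + scale * d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, delta)]


# Toy layer: 2 outputs, 3 inputs, rank-1 adapter.
W = [[1.0, 0.0, 2.0], [0.5, -1.0, 0.0]]  # frozen base weight (out x in)
A = [[1.0, 2.0, 0.0]]                    # LoRA down-projection (r x in)
B = [[0.5], [-1.0]]                      # LoRA up-projection (out x r)
alpha, r = 2.0, 1

x = [1.0, 1.0, 1.0]
# Unmerged forward pass: base output plus scaled adapter output.
adapter_out = matvec(B, matvec(A, x))
unmerged = [base + (alpha / r) * a for base, a in zip(matvec(W, x), adapter_out)]
# Merged forward pass: one matrix multiply with the folded weight.
merged = matvec(merge_lora(W, A, B, alpha, r), x)
print(unmerged, merged)
```

With these values both passes print [6.0, -6.5], which is why a merged checkpoint can be loaded without any adapter library.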

Preprocessing

The source data used ShareGPT-style fields such as:

  • from: human
  • from: gpt

These were converted into standard chat roles:

  • user
  • assistant

Conversations were cleaned to ensure valid alternating turns before being serialized with the Gemma chat template.
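The role conversion above can be sketched as follows. This is an illustration under assumptions: each row stores its turns under "conversations" as {"from": ..., "value": ...} dicts (the FineTome-100k layout), only "human" and "gpt" speakers are expected, and a row with any other speaker is returned as None so it can be dropped.

```python
from typing import Optional

# Hypothetical mapping from ShareGPT speaker names to chat roles.
SHAREGPT_TO_ROLE = {"human": "user", "gpt": "assistant"}


def sharegpt_to_messages(row: dict) -> Optional[list]:
    """Convert one ShareGPT-style row to [{"role": ..., "content": ...}, ...].

    Returns None when a turn has an unrecognised "from" value, so the caller
    can filter the row out.
    """
    messages = []
    for turn in row.get("conversations", []):
        role = SHAREGPT_TO_ROLE.get(turn.get("from"))
        if role is None:
            return None
        messages.append({"role": role, "content": turn.get("value", "")})
    return messages


row = {
    "conversations": [
        {"from": "human", "value": "What is 7 % 3?"},
        {"from": "gpt", "value": "7 % 3 is 1."},
    ]
}
print(sharegpt_to_messages(row))
```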

Training configuration

  • Training objective: supervised fine-tuning
  • Epochs: 1
  • Training examples: 10,000
  • Per-device batch size: 4
  • Gradient accumulation steps: 4
  • Effective batch size: 16
  • Warmup steps: 250
  • Learning rate: 5e-5
  • Optimizer: AdamW 8-bit
  • Scheduler: cosine
  • Precision: bf16
  • Hardware: Google Colab G4 GPU
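The listed settings imply the schedule below, assuming a single GPU and no sequence packing: 10,000 examples at an effective batch size of 16 give 625 optimizer steps, so the 250 warmup steps cover 40% of the run. The lr_at function is a re-implementation of a linear-warmup, half-cosine decay (the shape of transformers' get_cosine_schedule_with_warmup with its default num_cycles=0.5); it is an assumption for illustration, not read from the training logs.

```python
import math

# Illustrative arithmetic for the listed configuration (single GPU, no packing assumed).
num_examples = 10_000
per_device_batch_size = 4
grad_accum_steps = 4
num_epochs = 1
warmup_steps = 250
peak_lr = 5e-5

effective_batch_size = per_device_batch_size * grad_accum_steps
batches_per_epoch = math.ceil(num_examples / per_device_batch_size)
total_steps = (batches_per_epoch // grad_accum_steps) * num_epochs  # optimizer steps


def lr_at(step: int) -> float:
    """Linear warmup to peak_lr, then cosine decay to 0 over the remaining steps."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


print(effective_batch_size, total_steps, warmup_steps / total_steps)
print(lr_at(0), lr_at(125), lr_at(250), lr_at(total_steps))
```

A long warmup relative to the run length means the learning rate spends most of the single epoch either ramping up or decaying; shortening warmup is one knob to try if the fine-tune underfits.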

Prompting / usage notes

This model performs best when prompted in chat format rather than with a raw completion prompt.

Recommended usage

Use tokenizer.apply_chat_template(..., add_generation_prompt=True) before generation.

from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "<username>/gemma-4-finetune-finetome10k"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",  # load in the checkpoint's stored dtype (BF16)
    device_map="auto",   # place weights on the available GPU(s)/CPU
)

messages = [
    {"role": "user", "content": "Explain what a modulus operator is in programming."}
]

# Apply the Gemma chat template and open a model turn for the reply.
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
).to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=200,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)

# generate() returns the prompt followed by the completion; decode only the new tokens.
prompt_length = inputs["input_ids"].shape[-1]
print(tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True))
Model size: 8B params (Safetensors, BF16)