Mini-Gemma Custom Model

This repository contains a domain-specialized fine-tune based on the Gemma architecture, adapted to specific text distributions and patterns. The model was trained with the Hugging Face Trainer on an NVIDIA GPU cluster.

📊 Training Performance & Metrics

The training run completed with a stable gradient norm and the following final metrics:

  • Total Training Steps: 20,000
  • Final Total Train Loss: 3.478
  • Final Step Loss: 2.988
  • Gradient Norm Stability: Stable at ~1.12
  • Training Status: Complete / Fully Converged
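
For intuition, the loss values above can be converted to perplexity. The sketch below assumes the reported losses are mean per-token cross-entropy in nats, which is what causal language model training with the Hugging Face Trainer normally reports; the model card does not state this explicitly, and the helper name `perplexity` is illustrative.

```python
import math


def perplexity(mean_cross_entropy_nats: float) -> float:
    """Convert a mean per-token cross-entropy (in nats) to perplexity."""
    return math.exp(mean_cross_entropy_nats)


# Loss values copied from the metrics list above.
# Assumption: both are mean per-token cross-entropy values in nats.
reported_losses = {
    "train loss (3.478)": 3.478,
    "final step loss (2.988)": 2.988,
}

for name, loss in reported_losses.items():
    print(f"{name}: perplexity ~ {perplexity(loss):.1f}")
```

Under that assumption, the final step loss corresponds to a perplexity of about 19.8 and the run-level train loss to about 32.4.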

🚀 Quick Start & Usage

Load and run the model locally with the Transformers library:

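import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

model_id = "agentbyumer/mini-gemma"

# Load the tokenizer and weights from the Hub. float16 halves weight memory
# relative to float32; device_map="auto" requires the accelerate package.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto"
)

# Wrap the model and tokenizer in a text-generation pipeline.
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)

prompt = "Your specialized prompt here"
outputs = generator(
    prompt,
    max_new_tokens=150,      # upper bound on generated tokens
    do_sample=True,          # sample instead of greedy decoding
    temperature=0.7,         # values below 1.0 sharpen the distribution
    return_full_text=False   # return only the continuation, not the prompt
)
print(outputs[0]['generated_text'])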
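
To illustrate what the `temperature=0.7` setting does during sampling, here is a minimal pure-Python sketch of temperature-scaled softmax. It is not the Transformers implementation; the function name `softmax_with_temperature` and the three example logits are hypothetical and chosen only for illustration.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Return sampling probabilities for logits divided by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical next-token logits for a three-token vocabulary.
logits = [2.0, 1.0, 0.0]

for t in (1.0, 0.7):
    probs = softmax_with_temperature(logits, t)
    print(f"temperature={t}: " + ", ".join(f"{p:.3f}" for p in probs))
```

A temperature below 1.0 moves probability mass toward the highest-scoring token, so sampled text becomes less varied; values above 1.0 flatten the distribution.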

📜 License

This project is licensed under the permissive MIT License. See the accompanying LICENSE file for full details.

Model Details

  • Format: Safetensors
  • Model size: 45.3M params
  • Tensor type: F32
  • Base model: google/gemma-2b (this model is listed as a fine-tune)
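
As a rough size check, the parameter count and tensor type listed above give an estimate of the weight memory, both for the stored F32 checkpoint and for the float16 load used in the Quick Start. This is a back-of-the-envelope sketch: it counts weights only and ignores activations, the KV cache, and framework overhead. `weight_memory_mb` is an illustrative helper, not part of any library.

```python
# Parameter count taken from the model card metadata ("45.3M params").
NUM_PARAMS = 45.3e6

# Bytes used to store one parameter in each dtype.
BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def weight_memory_mb(num_params: float, dtype: str) -> float:
    """Approximate weight storage in megabytes (1 MB = 10**6 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e6


for dtype in ("float32", "float16"):
    print(f"{dtype}: ~{weight_memory_mb(NUM_PARAMS, dtype):.1f} MB")
```

By this estimate the weights take about 181 MB in float32 and about 91 MB in float16.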