Gemma4GR — Greek Speech & Language Fine-tune (E4B v2)

⚠️ Experimental — Work in Progress
This model was produced for the Google Gemma 4 Good Hackathon (Kaggle, May 2026). Training is ongoing and results are preliminary. Do not use in production.

Code & training pipeline: https://github.com/Efs-O/Gemma4GR


What is this?

Gemma4GR is a fine-tune of the E4B variant of google/gemma-4-it (roughly 4B effective parameters; loaded as unsloth/gemma-4-E4B-it during training). It is trained to improve Greek language understanding in two areas:

  1. Greek Speech-to-Text (STT) — Audio LoRA trained on a mixed corpus of real human voice recordings (3,217 WAVs across 17 categories: talking to children, storytelling, everyday conversation, family, food, school, culture, nature, healthcare, sports, travel, numbers, news, and more) and synthetic Greek speech generated by JOY, a custom Piper TTS voice trained for this project. The model learns to understand spoken Modern Greek and respond naturally in Greek.

  2. Greek Text Q&A — Language LoRA trained on 2,476 curated Greek Q&A pairs covering 10 topic categories (children's education, culture, everyday life, food, geography, history, language, mythology, religion, science). Improves fluency and cultural accuracy of Greek prose generation.

The two adapters are merged sequentially into a single GGUF model, ready to drop into Ollama or llama.cpp.


Model Files

| File | Format | Size | Use |
|---|---|---|---|
| gemma-4-e4b-it-gr-v2-Q4_K_M.gguf | GGUF Q4_K_M | 5.0 GB | Primary inference |
| gemma-4-e4b-it-gr-v2-Q8_0.gguf | GGUF Q8_0 | 7.5 GB | Higher precision inference |
| gemma-4-e4b-it-gr-v2-mmproj.gguf | GGUF F16 | 945 MB | Vision/audio projection |
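
As a rough illustration of loading the primary file in Ollama, the sketch below builds a minimal Modelfile and a request body for Ollama's `/api/generate` endpoint. The local model name `gemma4gr` and the assumption that the GGUF sits in the current directory are hypothetical choices for this example. It covers text prompts only and does not show how the mmproj file or audio input would be wired in.

```python
import json

# Assumptions for illustration only: the GGUF file is in the current directory
# and "gemma4gr" is a hypothetical local model name, not one defined by this card.
GGUF_FILE = "gemma-4-e4b-it-gr-v2-Q4_K_M.gguf"
MODEL_NAME = "gemma4gr"


def build_modelfile(gguf_path: str) -> str:
    """Return the text of a minimal Ollama Modelfile pointing at a local GGUF."""
    return f"FROM ./{gguf_path}\n"


def build_generate_request(model: str, prompt: str) -> bytes:
    """Encode a non-streaming request body for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    # ensure_ascii=False keeps Greek text readable in the encoded body.
    return json.dumps(payload, ensure_ascii=False).encode("utf-8")


if __name__ == "__main__":
    print(build_modelfile(GGUF_FILE), end="")
    body = build_generate_request(MODEL_NAME, "Ποια είναι η πρωτεύουσα της Ελλάδας;")
    print(body.decode("utf-8"))
```

With the Modelfile saved to disk, `ollama create gemma4gr -f Modelfile` would register the model, and the request body could then be POSTed to `http://localhost:11434/api/generate` on a default Ollama install.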

Training Details

STT Adapter (Phase 1)

| Parameter | Value |
|---|---|
| Base model | unsloth/gemma-4-E4B-it |
| Loader | FastVisionModel (Unsloth) |
| Method | QLoRA 4-bit |
| LoRA r / alpha | 32 / 64 |
| Training pairs | 2,895 audio Q&A pairs (human voice WAVs + JOY/Piper synthetic) |
| Epochs | 2 |
| Train loss | 0.088 |
| Best eval loss | 4.876 |
| Hardware | NVIDIA RTX 5060 Ti (16 GB VRAM) |

QA Adapter (Phase 2)

| Parameter | Value |
|---|---|
| Base model | above + STT adapter merged |
| Loader | FastModel (Unsloth) |
| Method | QLoRA 4-bit |
| LoRA r / alpha | 32 / 64 |
| Training pairs | 2,476 Greek Q&A pairs (10 categories) |
| Epochs | 2 |
| Train loss | 0.194 |
| Teacher model | qwen3.5:397b-cloud via Ollama |
| Hardware | NVIDIA RTX 5060 Ti (16 GB VRAM) |

Merge

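The STT adapter is merged into the base model, the QA adapter is trained on top of that merged model and merged in turn, and the result is exported to GGUF with Unsloth's save_pretrained_gguf, which quantises to Q4_K_M and Q8_0 on the fly and uses llama.cpp internally.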
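
The export step might look roughly like the sketch below. It assumes `model` and `tokenizer` are objects already loaded through Unsloth with both adapters merged, and that `out_dir` is a hypothetical output directory. The helper `export_gguf` is not part of Unsloth or of this project's pipeline; it simply loops over the two quantisation methods named above.

```python
from typing import Any, Iterable, List

# Quantisation method names as Unsloth spells them for Q4_K_M and Q8_0.
QUANT_METHODS = ("q4_k_m", "q8_0")


def export_gguf(model: Any, tokenizer: Any, out_dir: str,
                methods: Iterable[str] = QUANT_METHODS) -> List[str]:
    """Call save_pretrained_gguf once per quantisation method.

    `model` is assumed to be an Unsloth-loaded model with the adapters
    already merged; this helper is hypothetical and only wraps the call.
    """
    exported = []
    for method in methods:
        model.save_pretrained_gguf(out_dir, tokenizer, quantization_method=method)
        exported.append(method)
    return exported
```

Looping per method keeps each export explicit, which makes it easier to see which quantisation failed if one of them does.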

Unsloth-specific features used

  • FastVisionModel for audio/multimodal loading (STT adapter)
  • FastModel for text-only loading (QA adapter)
  • use_gradient_checkpointing="unsloth" — Unsloth's custom gradient checkpointing
  • adamw_8bit optimizer (bitsandbytes)
  • UnslothVisionDataCollator for multimodal audio batching
  • unsloth.chat_templates.get_chat_template("gemma-4") for correct Gemma 4 chat format
  • model.save_pretrained_gguf() for direct GGUF export without intermediate fp16 materialisation

Evaluation Results

Evaluated on 54 curated Greek cases (20 text Q&A + 34 spoken_qa audio) with manually verified references. Each case was graded individually by Claude Sonnet 4.6.

| Metric | Base E4B (unmodified) | Gemma4GR E4B v2 | Relative improvement |
|---|---|---|---|
| Overall effective pass rate | 35% (19/54) | 45% (24.5/54) | +29% |
| Audio spoken_qa effective passes | 22% (7.5/34) | 37% (12.5/34) | +67% |
| Text Q&A effective passes | 57% (11.5/20) | 60% (12/20) | +4% |
| Avg token F1 score | 0.222 | 0.284 | +28% |
| TTS generation pass rate | 89% | 96% | +8% |
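
The improvement figures are relative to the base score, not percentage-point differences. The short sketch below recomputes them from the values in the table; the function name and the row labels are only illustrative.

```python
def relative_improvement(base: float, tuned: float) -> float:
    """Percentage change of the fine-tuned score relative to the base score."""
    return (tuned - base) / base * 100.0


# (base, fine-tuned) values taken from the results table above.
ROWS = {
    "overall passes": (19.0, 24.5),
    "audio passes": (7.5, 12.5),
    "text passes": (11.5, 12.0),
    "avg token F1": (0.222, 0.284),
    "TTS pass rate": (89.0, 96.0),
}

if __name__ == "__main__":
    for name, (base, tuned) in ROWS.items():
        print(f"{name}: {relative_improvement(base, tuned):+.0f}%")
```

For example, the audio row goes from 7.5 to 12.5 effective passes, a gain of 5 on a base of 7.5, which is about 67%.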

Audio is the primary use case. The +67% relative improvement in audio spoken_qa is driven by the STT adapter (audio understanding) combined with the QA adapter (Greek text generation quality). Both adapters are needed: the STT adapter alone, without the QA adapter, scored lower on audio response quality.
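
This card does not say how the average token F1 in the results table was computed. As an assumption, the sketch below shows one common definition (SQuAD-style overlap of whitespace-separated, lowercased tokens). The project's actual tokenisation and normalisation may differ.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1 between a prediction and a reference answer.

    Assumed definition: lowercase, split on whitespace, count shared tokens
    with multiplicity, then take the harmonic mean of precision and recall.
    """
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # Two empty strings match exactly; one empty string matches nothing.
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

Under this definition a verbose but correct answer is penalised on precision, which is one reason average F1 can stay low even when an answer would pass a case-by-case review.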

Known Limitations

  • Hard Greek words with affricates (τσ/τζ), such as τσούχτρα and τζαμπατζής, are still occasionally garbled in STT
  • Both base and fine-tuned share some semantic blind spots (αυγολέμονο, περίπτερο)
  • Model sometimes refuses harmless conversational statements — more conversational training data needed
  • Occasional system prompt leakage on audio inputs (chat template issue, being fixed)
  • Text Q&A improvement over base is marginal — more training pairs needed for next version

Intended Use

This model is built for Gemma4Kids — a local, offline Greek educational assistant for children. It runs via Ollama on consumer hardware (RTX 4060 Ti / 5060 Ti, 16 GB VRAM).

The goal is a single merged model that:

  • Understands spoken Greek questions from children
  • Responds with culturally accurate, fluent Modern Greek prose
  • Runs fully offline with no cloud dependency

License

This model is a fine-tune of google/gemma-4-it and is governed by the Gemma Terms of Use.

The Q&A pairs were generated by qwen3.5:397b-cloud via Ollama, then curated and corrected by human review.

JOY voice training data — The synthetic Greek speech used in STT training was generated with the JOY Piper voice, which is separately licensed under CC BY-NC 4.0. See the Gemma4GR repo for details.


Citation

@misc{gemma4gr2026,
  title={Gemma4GR: Fine-tuning Gemma 4 for Greek Speech and Language Understanding},
  author={Efstathios Outas},
  year={2026},
  note={Google Gemma 4 Good Hackathon submission. Experimental — work in progress.},
  url={https://huggingface.co/Efso/gemma-4-E4B-it-GR-v2}
}

Acknowledgements

  • Unsloth for QLoRA fine-tuning infrastructure
  • Google DeepMind for the Gemma 4 base model
  • Piper TTS for Greek voice synthesis
  • Chara Kaltsou — creator of the JOY Greek voice (Piper TTS), whose synthetic speech was used in STT training data
  • Google Gemma 4 Good Hackathon (Kaggle, May 2026)