Instructions to use Efso/gemma-4-E4B-it-GR-v2 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use Efso/gemma-4-E4B-it-GR-v2 with llama-cpp-python:
# !pip install llama-cpp-python from llama_cpp import Llama llm = Llama.from_pretrained( repo_id="Efso/gemma-4-E4B-it-GR-v2", filename="gemma-4-e4b-it-gr-v2-Q4_K_M.gguf", )
llm.create_chat_completion( messages = "No input example has been defined for this model task." )
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use Efso/gemma-4-E4B-it-GR-v2 with llama.cpp:
Install from brew
brew install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf Efso/gemma-4-E4B-it-GR-v2:Q4_K_M # Run inference directly in the terminal: llama-cli -hf Efso/gemma-4-E4B-it-GR-v2:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf Efso/gemma-4-E4B-it-GR-v2:Q4_K_M # Run inference directly in the terminal: llama-cli -hf Efso/gemma-4-E4B-it-GR-v2:Q4_K_M
Use pre-built binary
# Download pre-built binary from: # https://github.com/ggerganov/llama.cpp/releases # Start a local OpenAI-compatible server with a web UI: ./llama-server -hf Efso/gemma-4-E4B-it-GR-v2:Q4_K_M # Run inference directly in the terminal: ./llama-cli -hf Efso/gemma-4-E4B-it-GR-v2:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git cd llama.cpp cmake -B build cmake --build build -j --target llama-server llama-cli # Start a local OpenAI-compatible server with a web UI: ./build/bin/llama-server -hf Efso/gemma-4-E4B-it-GR-v2:Q4_K_M # Run inference directly in the terminal: ./build/bin/llama-cli -hf Efso/gemma-4-E4B-it-GR-v2:Q4_K_M
Use Docker
docker model run hf.co/Efso/gemma-4-E4B-it-GR-v2:Q4_K_M
- LM Studio
- Jan
- Ollama
How to use Efso/gemma-4-E4B-it-GR-v2 with Ollama:
ollama run hf.co/Efso/gemma-4-E4B-it-GR-v2:Q4_K_M
- Unsloth Studio new
How to use Efso/gemma-4-E4B-it-GR-v2 with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for Efso/gemma-4-E4B-it-GR-v2 to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for Efso/gemma-4-E4B-it-GR-v2 to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required # Open https://huggingface.co/spaces/unsloth/studio in your browser # Search for Efso/gemma-4-E4B-it-GR-v2 to start chatting
- Pi new
How to use Efso/gemma-4-E4B-it-GR-v2 with Pi:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama-server -hf Efso/gemma-4-E4B-it-GR-v2:Q4_K_M
Configure the model in Pi
# Install Pi: npm install -g @mariozechner/pi-coding-agent # Add to ~/.pi/agent/models.json: { "providers": { "llama-cpp": { "baseUrl": "http://localhost:8080/v1", "api": "openai-completions", "apiKey": "none", "models": [ { "id": "Efso/gemma-4-E4B-it-GR-v2:Q4_K_M" } ] } } }Run Pi
# Start Pi in your project directory: pi
- Hermes Agent new
How to use Efso/gemma-4-E4B-it-GR-v2 with Hermes Agent:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama-server -hf Efso/gemma-4-E4B-it-GR-v2:Q4_K_M
Configure Hermes
# Install Hermes: curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash hermes setup # Point Hermes at the local server: hermes config set model.provider custom hermes config set model.base_url http://127.0.0.1:8080/v1 hermes config set model.default Efso/gemma-4-E4B-it-GR-v2:Q4_K_M
Run Hermes
hermes
- Docker Model Runner
How to use Efso/gemma-4-E4B-it-GR-v2 with Docker Model Runner:
docker model run hf.co/Efso/gemma-4-E4B-it-GR-v2:Q4_K_M
- Lemonade
How to use Efso/gemma-4-E4B-it-GR-v2 with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/ lemonade pull Efso/gemma-4-E4B-it-GR-v2:Q4_K_M
Run and chat with the model
lemonade run user.gemma-4-E4B-it-GR-v2-Q4_K_M
List all available models
lemonade list
Gemma4GR — Greek Speech & Language Fine-tune (E4B v2)
⚠️ Experimental — Work in Progress
This model was produced for the Google Gemma 4 Good Hackathon (Kaggle, May 2026). Training is ongoing and results are preliminary. Do not use in production.
Code & training pipeline: https://github.com/Efs-O/Gemma4GR
What is this?
Gemma4GR is a fine-tuned version of google/gemma-4-it (4B parameters) trained to improve Greek language understanding in two areas:
Greek Speech-to-Text (STT) — Audio LoRA trained on a mixed corpus of real human voice recordings (3,217 WAVs across 17 categories: talking to children, storytelling, everyday conversation, family, food, school, culture, nature, healthcare, sports, travel, numbers, news, and more) and synthetic Greek speech generated by JOY, a custom Piper TTS voice trained for this project. The model learns to understand spoken Modern Greek and respond naturally in Greek.
Greek Text Q&A — Language LoRA trained on 2,476 curated Greek Q&A pairs covering 10 topic categories (children's education, culture, everyday life, food, geography, history, language, mythology, religion, science). Improves fluency and cultural accuracy of Greek prose generation.
The two adapters are merged sequentially into a single GGUF model, ready to drop into Ollama or llama.cpp.
Model Files
| File | Format | Size | Use |
|---|---|---|---|
gemma-4-e4b-it-gr-v2-Q4_K_M.gguf |
GGUF Q4_K_M | 5.0 GB | Primary inference |
gemma-4-e4b-it-gr-v2-Q8_0.gguf |
GGUF Q8_0 | 7.5 GB | Higher precision inference |
gemma-4-e4b-it-gr-v2-mmproj.gguf |
GGUF F16 | 945 MB | Vision/audio projection |
Training Details
STT Adapter (Phase 1)
| Parameter | Value |
|---|---|
| Base model | unsloth/gemma-4-E4B-it |
| Loader | FastVisionModel (Unsloth) |
| Method | QLoRA 4-bit |
| LoRA r / alpha | 32 / 64 |
| Training pairs | 2,895 audio Q&A pairs (human voice WAVs + JOY/Piper synthetic) |
| Epochs | 2 |
| Train loss | 0.088 |
| Best eval loss | 4.876 |
| Hardware | NVIDIA RTX 5060 Ti (16 GB VRAM) |
QA Adapter (Phase 2)
| Parameter | Value |
|---|---|
| Base model | above + STT adapter merged |
| Loader | FastModel (Unsloth) |
| Method | QLoRA 4-bit |
| LoRA r / alpha | 32 / 64 |
| Training pairs | 2,476 Greek Q&A pairs (10 categories) |
| Epochs | 2 |
| Train loss | 0.194 |
| Teacher model | qwen3.5:397b-cloud via Ollama |
| Hardware | NVIDIA RTX 5060 Ti (16 GB VRAM) |
Merge
STT adapter → merged into base → QA adapter → merged → GGUF export via Unsloth save_pretrained_gguf (quantises to Q4_K_M and Q8_0 on the fly; uses llama.cpp internally).
Unsloth-specific features used
FastVisionModelfor audio/multimodal loading (STT adapter)FastModelfor text-only loading (QA adapter)use_gradient_checkpointing="unsloth"— Unsloth's custom gradient checkpointingadamw_8bitoptimizer (bitsandbytes)UnslothVisionDataCollatorfor multimodal audio batchingunsloth.chat_templates.get_chat_template("gemma-4")for correct Gemma 4 chat formatmodel.save_pretrained_gguf()for direct GGUF export without intermediate fp16 materialisation
Evaluation Results
Evaluated on 54 curated Greek cases (20 text Q&A + 34 spoken_qa audio) with manually verified references. Human case-by-case evaluation by Claude Sonnet 4.6.
| Metric | Base E4B (unmodified) | Gemma4GR E4B v2 | Improvement |
|---|---|---|---|
| Overall effective pass rate | 35% (19/54) | 45% (24.5/54) | +29% |
| Audio spoken_qa effective passes | 22% (7.5/34) | 37% (12.5/34) | +67% |
| Text Q&A effective passes | 57% (11.5/20) | 60% (12/20) | +4% |
| Avg token F1 score | 0.222 | 0.284 | +28% |
| TTS generation pass rate | 89% | 96% | +8% |
Audio is the primary use case. The +67% improvement in audio spoken_qa is driven by the STT adapter (audio understanding) combined with the QA adapter (Greek text generation quality). Each adapter is necessary — STT-only without QA scored lower on audio response quality.
Known Limitations
- Hard Greek words with affricates (τσ/τζ) still occasionally garble in STT: τσούχτρα, τζαμπατζής
- Both base and fine-tuned share some semantic blind spots (αυγολέμονο, περίπτερο)
- Model sometimes refuses harmless conversational statements — more conversational training data needed
- Occasional system prompt leakage on audio inputs (chat template issue, being fixed)
- Text Q&A improvement over base is marginal — more training pairs needed for next version
Intended Use
This model is built for Gemma4Kids — a local, offline Greek educational assistant for children. It runs via Ollama on consumer hardware (RTX 4060 Ti / 5060 Ti, 16 GB VRAM).
The goal is a single merged model that:
- Understands spoken Greek questions from children
- Responds with culturally accurate, fluent Modern Greek prose
- Runs fully offline with no cloud dependency
License
This model is a fine-tune of google/gemma-4-it and is governed by the Gemma Terms of Use.
Q&A pairs generated by qwen3.5:397b-cloud via Ollama, curated and corrected by human review.
JOY voice training data — The synthetic Greek speech used in STT training was generated with the JOY Piper voice, which is separately licensed under CC BY-NC 4.0. See the Gemma4GR repo for details.
Citation
@misc{gemma4gr2026,
title={Gemma4GR: Fine-tuning Gemma 4 for Greek Speech and Language Understanding},
author={Efstathios Outas},
year={2026},
note={Google Gemma 4 Good Hackathon submission. Experimental — work in progress.},
url={https://huggingface.co/Efso/gemma-4-E4B-it-GR-v2}
}
Acknowledgements
- Unsloth for QLoRA fine-tuning infrastructure
- Google DeepMind for the Gemma 4 base model
- Piper TTS for Greek voice synthesis
- Chara Kaltsou — creator of the JOY Greek voice (Piper TTS), whose synthetic speech was used in STT training data
- Google Gemma 4 Good Hackathon (Kaggle, May 2026)
- Downloads last month
- 552
4-bit
8-bit