Instructions to use Efso/gemma-4-E4B-it-GR-v2 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use Efso/gemma-4-E4B-it-GR-v2 with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="Efso/gemma-4-E4B-it-GR-v2",
	filename="gemma-4-e4b-it-gr-v2-Q4_K_M.gguf",
)

llm.create_chat_completion(
	messages = "No input example has been defined for this model task."
)

Notebooks
Google Colab
Kaggle
Local Apps

llama.cpp

How to use Efso/gemma-4-E4B-it-GR-v2 with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Efso/gemma-4-E4B-it-GR-v2:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf Efso/gemma-4-E4B-it-GR-v2:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Efso/gemma-4-E4B-it-GR-v2:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf Efso/gemma-4-E4B-it-GR-v2:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf Efso/gemma-4-E4B-it-GR-v2:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf Efso/gemma-4-E4B-it-GR-v2:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf Efso/gemma-4-E4B-it-GR-v2:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf Efso/gemma-4-E4B-it-GR-v2:Q4_K_M

Use Docker

docker model run hf.co/Efso/gemma-4-E4B-it-GR-v2:Q4_K_M

LM Studio
Jan
Ollama
How to use Efso/gemma-4-E4B-it-GR-v2 with Ollama:
```
ollama run hf.co/Efso/gemma-4-E4B-it-GR-v2:Q4_K_M
```

Unsloth Studio new

How to use Efso/gemma-4-E4B-it-GR-v2 with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Efso/gemma-4-E4B-it-GR-v2 to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Efso/gemma-4-E4B-it-GR-v2 to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Efso/gemma-4-E4B-it-GR-v2 to start chatting

Pi new

How to use Efso/gemma-4-E4B-it-GR-v2 with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf Efso/gemma-4-E4B-it-GR-v2:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "Efso/gemma-4-E4B-it-GR-v2:Q4_K_M"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent new

How to use Efso/gemma-4-E4B-it-GR-v2 with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf Efso/gemma-4-E4B-it-GR-v2:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default Efso/gemma-4-E4B-it-GR-v2:Q4_K_M

Run Hermes

hermes

Docker Model Runner
How to use Efso/gemma-4-E4B-it-GR-v2 with Docker Model Runner:
```
docker model run hf.co/Efso/gemma-4-E4B-it-GR-v2:Q4_K_M
```

Lemonade

How to use Efso/gemma-4-E4B-it-GR-v2 with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull Efso/gemma-4-E4B-it-GR-v2:Q4_K_M

Run and chat with the model

lemonade run user.gemma-4-E4B-it-GR-v2-Q4_K_M

List all available models

lemonade list

You need to agree to share your contact information to access this model

This repository is publicly accessible, but you have to accept the conditions to access its files and content.

Gemma4GR — Greek Speech & Language Fine-tune (E4B v2)

⚠️ Experimental — Work in Progress
This model was produced for the Google Gemma 4 Good Hackathon (Kaggle, May 2026). Training is ongoing and results are preliminary. Do not use in production.

Code & training pipeline: https://github.com/Efs-O/Gemma4GR

What is this?

Gemma4GR is a fine-tuned version of google/gemma-4-it (4B parameters) trained to improve Greek language understanding in two areas:

Greek Speech-to-Text (STT) — Audio LoRA trained on a mixed corpus of real human voice recordings (3,217 WAVs across 17 categories: talking to children, storytelling, everyday conversation, family, food, school, culture, nature, healthcare, sports, travel, numbers, news, and more) and synthetic Greek speech generated by JOY, a custom Piper TTS voice trained for this project. The model learns to understand spoken Modern Greek and respond naturally in Greek.
Greek Text Q&A — Language LoRA trained on 2,476 curated Greek Q&A pairs covering 10 topic categories (children's education, culture, everyday life, food, geography, history, language, mythology, religion, science). Improves fluency and cultural accuracy of Greek prose generation.

The two adapters are merged sequentially into a single GGUF model, ready to drop into Ollama or llama.cpp.

Model Files

File	Format	Size	Use
`gemma-4-e4b-it-gr-v2-Q4_K_M.gguf`	GGUF Q4_K_M	5.0 GB	Primary inference
`gemma-4-e4b-it-gr-v2-Q8_0.gguf`	GGUF Q8_0	7.5 GB	Higher precision inference
`gemma-4-e4b-it-gr-v2-mmproj.gguf`	GGUF F16	945 MB	Vision/audio projection

Training Details

STT Adapter (Phase 1)

Parameter	Value
Base model	`unsloth/gemma-4-E4B-it`
Loader	`FastVisionModel` (Unsloth)
Method	QLoRA 4-bit
LoRA r / alpha	32 / 64
Training pairs	2,895 audio Q&A pairs (human voice WAVs + JOY/Piper synthetic)
Epochs	2
Train loss	0.088
Best eval loss	4.876
Hardware	NVIDIA RTX 5060 Ti (16 GB VRAM)

QA Adapter (Phase 2)

Parameter	Value
Base model	above + STT adapter merged
Loader	`FastModel` (Unsloth)
Method	QLoRA 4-bit
LoRA r / alpha	32 / 64
Training pairs	2,476 Greek Q&A pairs (10 categories)
Epochs	2
Train loss	0.194
Teacher model	`qwen3.5:397b-cloud` via Ollama
Hardware	NVIDIA RTX 5060 Ti (16 GB VRAM)

Merge

STT adapter → merged into base → QA adapter → merged → GGUF export via Unsloth save_pretrained_gguf (quantises to Q4_K_M and Q8_0 on the fly; uses llama.cpp internally).

Unsloth-specific features used

FastVisionModel for audio/multimodal loading (STT adapter)
FastModel for text-only loading (QA adapter)
use_gradient_checkpointing="unsloth" — Unsloth's custom gradient checkpointing
adamw_8bit optimizer (bitsandbytes)
UnslothVisionDataCollator for multimodal audio batching
unsloth.chat_templates.get_chat_template("gemma-4") for correct Gemma 4 chat format
model.save_pretrained_gguf() for direct GGUF export without intermediate fp16 materialisation

Evaluation Results

Evaluated on 54 curated Greek cases (20 text Q&A + 34 spoken_qa audio) with manually verified references. Human case-by-case evaluation by Claude Sonnet 4.6.

Metric	Base E4B (unmodified)	Gemma4GR E4B v2	Improvement
Overall effective pass rate	35% (19/54)	45% (24.5/54)	+29%
Audio spoken_qa effective passes	22% (7.5/34)	37% (12.5/34)	+67%
Text Q&A effective passes	57% (11.5/20)	60% (12/20)	+4%
Avg token F1 score	0.222	0.284	+28%
TTS generation pass rate	89%	96%	+8%

Audio is the primary use case. The +67% improvement in audio spoken_qa is driven by the STT adapter (audio understanding) combined with the QA adapter (Greek text generation quality). Each adapter is necessary — STT-only without QA scored lower on audio response quality.

Known Limitations

Hard Greek words with affricates (τσ/τζ) still occasionally garble in STT: τσούχτρα, τζαμπατζής
Both base and fine-tuned share some semantic blind spots (αυγολέμονο, περίπτερο)
Model sometimes refuses harmless conversational statements — more conversational training data needed
Occasional system prompt leakage on audio inputs (chat template issue, being fixed)
Text Q&A improvement over base is marginal — more training pairs needed for next version

Intended Use

This model is built for Gemma4Kids — a local, offline Greek educational assistant for children. It runs via Ollama on consumer hardware (RTX 4060 Ti / 5060 Ti, 16 GB VRAM).

The goal is a single merged model that:

Understands spoken Greek questions from children
Responds with culturally accurate, fluent Modern Greek prose
Runs fully offline with no cloud dependency

License

This model is a fine-tune of google/gemma-4-it and is governed by the Gemma Terms of Use.

Q&A pairs generated by qwen3.5:397b-cloud via Ollama, curated and corrected by human review.

JOY voice training data — The synthetic Greek speech used in STT training was generated with the JOY Piper voice, which is separately licensed under CC BY-NC 4.0. See the Gemma4GR repo for details.

Citation

@misc{gemma4gr2026,
  title={Gemma4GR: Fine-tuning Gemma 4 for Greek Speech and Language Understanding},
  author={Efstathios Outas},
  year={2026},
  note={Google Gemma 4 Good Hackathon submission. Experimental — work in progress.},
  url={https://huggingface.co/Efso/gemma-4-E4B-it-GR-v2}
}

Acknowledgements

Unsloth for QLoRA fine-tuning infrastructure
Google DeepMind for the Gemma 4 base model
Piper TTS for Greek voice synthesis
Chara Kaltsou — creator of the JOY Greek voice (Piper TTS), whose synthetic speech was used in STT training data
Google Gemma 4 Good Hackathon (Kaggle, May 2026)

Downloads last month: 552

GGUF

Model size

8B params

Architecture

gemma4

Hardware compatibility

4-bit

8-bit

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for Efso/gemma-4-E4B-it-GR-v2

Base model

google/gemma-4-E4B

Finetuned

google/gemma-4-E4B-it

Finetuned

unsloth/gemma-4-E4B-it

Adapter

(34)

this model