TenaOS — Gemma 4 E4B Instruct (BF16 GGUF)

llama.cpp-ready BF16 conversion of google/gemma-4-E4B-it, plus the audio mmproj projector. Used by TenaOS for on-device clinical inference (text + voice, multimodal in a single pass).

Contents

File Size Purpose
gemma-4-E4B-it-BF16.gguf ~15 GB Full-precision GGUF for generation
mmproj-gemma-4-E4B-it-bf16.gguf ~946 MB Multimodal projector for audio input

We standardize on BF16 full precision. No quantization in the production path.

Usage

hf download beza4588/TenaOS --local-dir ./models
# launch llama-server (CUDA build):
llama-server \
    -m ./models/gemma-4-E4B-it-BF16.gguf \
    --mmproj ./models/mmproj-gemma-4-E4B-it-bf16.gguf \
    --host 0.0.0.0 --port 8000 -ngl 99 --jinja --alias gemma-4

In TenaOS the docker image bind-mounts this directory at /models; see scripts/fetch-models.sh.

License

Inherits the Gemma Terms of Use. TenaOS packaging is Apache 2.0.

Downloads last month
270
GGUF
Model size
8B params
Architecture
gemma4
Hardware compatibility
Log In to add your hardware

16-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for beza4588/TenaOS

Quantized
(196)
this model