Araras-Gemma4-E4B — Clinical Decision Support for Brazilian Rare-Disease Care

The first Gemma 4 fine-tune purpose-built as offline clinical decision support for healthcare professionals working in Brazil's SUS (Sistema Único de Saúde).

"Eu levei vinte anos pra ter o meu diagnóstico de doença rara. Não dá pra mudar isso pra mim. Mas a próxima criança raríssima que entrar numa UBS amanhã — o médico que atender ela não precisa estar sozinho." — Dimas, fundador da Raras, paciente raro (distonia mioclônica, ORPHA:36899)

Submission to the Gemma 4 Good Hackathon.

⚕️ Intended use

Clinical decision support for licensed healthcare professionals (physicians, nurses, community health workers, geneticists) caring for patients in Brazil, especially in remote regions that rare-disease expertise doesn't reach.

- Not a patient-facing diagnostic tool
- Not a replacement for clinical judgment
- Augments the physician with structured differentials, PCDT lookups, and CEAF medication info
- Runs offline on smartphone, laptop, or any device that runs llama.cpp / Ollama / MLX

Compliance posture

Positioned as Software as a Medical Device (SaMD) for clinical decision support under ANVISA Res. 657/2022, 751/2022, and 830/2023. Aligned with CFM guidance: AI augments physicians and never replaces them. Every clinical claim is grounded in verifiable public sources (gov.br/conitec, bvsms, PubMed).

Model details

Base: unsloth/gemma-4-E4B-it (8B total / 4.5B effective parameters)
Method: QLoRA SFT via Unsloth — r=8, α=8, dropout=0, NEFTune α=5, train_on_responses_only
Training data: 120,740 train / 5,137 val examples (deduplicated), all PT-BR rare-disease content:
- 108k curated rare-disease Q&A (HPO/OMIM/ORPHA-grounded)
- 26k MedPT PT-BR clinical pairs
- 10k ultra-rare disease longitudinal cases
- 10k tool-calling examples (HPO normalize, PCDT lookup, CID-10 map)
- 5.7k knowledge-graph triples (RarasNet Neo4j — 10,468 rare diseases)
- 3.2k FindZebra hard cases + 1.3k ReDis-QA
Compute: A100 80GB on Vertex AI, ~2 hours, ~$25 total cost
Native Gemma 4 features used:
- <|channel>thought thinking blocks (toggleable per request)
- 128K context (fits full patient laudo + PCDT + retrieved cases)
- System role for safety constraints
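
To make the QLoRA settings listed above concrete, the sketch below records them in a small dataclass and derives two quantities that follow from LoRA's definition: the scaling factor alpha / r applied to the low-rank update, and the number of trainable parameters a rank-r adapter adds to one linear layer. The class name, field names, and the 2560 x 2560 layer shape are illustrative assumptions, not taken from the actual training code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraHyperparams:
    """Hyperparameters as listed in the model details (names are illustrative)."""
    r: int = 8
    alpha: int = 8
    dropout: float = 0.0
    neftune_alpha: float = 5.0

    @property
    def scaling(self) -> float:
        # LoRA multiplies the low-rank update B @ A by alpha / r.
        return self.alpha / self.r

    def adapter_params(self, d_in: int, d_out: int) -> int:
        # A has shape (r, d_in) and B has shape (d_out, r), so one adapted
        # linear layer adds r * (d_in + d_out) trainable parameters.
        return self.r * (d_in + d_out)


hp = LoraHyperparams()
print(hp.scaling)
print(hp.adapter_params(2560, 2560))  # hypothetical layer shape, for illustration
```

With r equal to alpha, the adapter update is applied at unit scale, and the per-layer parameter count grows linearly with the rank.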

Companion artifacts

Repo	Format	Size	Use case
Raras-AI/araras-gemma4-e4b-v4-sota	PEFT LoRA adapter	147 MB	Apply to base Gemma 4 E4B
Raras-AI/araras-gemma4-e4b-v4-gguf	GGUF Q4_K_M / Q5_K_M	5.3 / 5.7 GB	This repo: llama.cpp, Ollama, edge
Raras-AI/araras-hpo-brasil	sentence-transformers	340 MB	PT-BR clinical → HPO matcher (companion)
Raras-AI/araras-hpo-brasil-int8	ONNX INT8	85 MB	Edge HPO matching
🤗 RareBench-BR (public)	JSONL dataset	2.5 MB	835 SUS-grounded validated cases

End-to-end pipeline

PT-BR free text (medical report, patient chart, consultation transcript)
    ↓
[1] 🧬 araras-hpo-brasil (BioLORD-2023 fine-tune for PT-BR)
       Normalizes regional clinical idiom → HPO codes
       "amarelão" → HP:0000952 · "bebê molinho" → HP:0001252
    ↓
[2] 🧠 araras-gemma4-e4b Q4_K_M (this model, 5.3 GB, llama.cpp)
       Generates top-5 ranked differentials in PT-BR
    ↓
[3] ✅ Hybrid canonical ORPHA resolver (production technique, e.g. MedCAT/scispaCy)
       Tier 1: strict substring match on 10,468-keyword PT-BR dict (~30ms)
       Tier 2: BioLORD semantic fallback @ cosine ≥ 0.78 (~50ms, only if Tier 1 fails)
       Returns None (honest abstention) if neither tier matches confidently
    ↓
[4] 📋 PCDT overlay: 24 structured Ministry of Health PCDTs → CEAF + referral center
    ↓
Output: structured clinical decision support (differentials + PCDT + SUS conduta + referral center)
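
The two-tier resolver in step [3] can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline's actual code: the dictionary holds two toy entries instead of 10,468 keywords, and a character-trigram cosine stands in for the BioLORD embedding similarity so the example runs without any model download. The function names and the toy dictionary are hypothetical; only the tier order, the 0.78 threshold, and the `None` abstention come from the description above.

```python
import math
import unicodedata
from collections import Counter
from typing import Optional

# Toy keyword dictionary (assumption): normalized PT-BR name -> ORPHA code.
ORPHA_DICT = {
    "fenilcetonuria": "ORPHA:716",
    "distonia mioclonica": "ORPHA:36899",
}


def normalize(text: str) -> str:
    """Lowercase and strip accents so 'Fenilcetonúria' matches 'fenilcetonuria'."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def trigram_vector(text: str) -> Counter:
    """Bag of character trigrams; a stand-in for a sentence embedding."""
    padded = f"  {text} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def resolve_orpha(candidate: str, threshold: float = 0.78) -> Optional[str]:
    text = normalize(candidate)
    # Tier 1: strict substring match against the keyword dictionary.
    for keyword, code in ORPHA_DICT.items():
        if keyword in text:
            return code
    # Tier 2: similarity fallback, used only when Tier 1 finds nothing.
    query = trigram_vector(text)
    best_code, best_score = None, 0.0
    for keyword, code in ORPHA_DICT.items():
        score = cosine(query, trigram_vector(keyword))
        if score > best_score:
            best_code, best_score = code, score
    # Abstain (return None) when the best match is below the threshold.
    return best_code if best_score >= threshold else None


print(resolve_orpha("Fenilcetonúria (PKU) clássica"))  # resolved by Tier 1
print(resolve_orpha("fenilcetunuria"))                 # misspelling, resolved by Tier 2
print(resolve_orpha("dor de cabeça"))                  # no confident match
```

Returning `None` rather than the nearest neighbour keeps an unmatched disease name from being silently mapped to the wrong ORPHA code.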

Total stack footprint: 5.5 GB. Runs offline on iPhone, Android, or a laptop. No cloud calls: patient data never leaves the device, which keeps LGPD exposure to a minimum.

Evaluation — RareBench-BR

We built RareBench-BR, the first benchmark to combine diagnosis (Dx) with SUS conduta (clinical management) in PT-BR. It is anchored in 52,343 real, anonymized SUS patient trajectories (APAC, CNS-linked, 2017-2020+).

Results on full unified RareBench-BR_SUS (833 cases, 0 errors)

Metric	Araras-Gemma4 (Q4_K_M offline, 4.5B)	DeepSeek V4 (cloud, ~600B, 36-case subsample)
R@1 (clinical name)	41.2%	86.1%
R@3	47.1%	91.7%
🔥 Track B PCDT-correct	76.8% (331/431)	91.7% (33/36)
Errors	0	n/a
Latency p50	6.5s	4.1s
Cost per query	$0 (local)	~$0.001
Params	4.5B	~600B
Device	phone	data center
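
For readers unfamiliar with the metrics in the table, the sketch below shows one common way to compute recall at k (R@k) and reproduces the Track B percentages from the counts given. The matching rule (case-insensitive exact match on the disease name) and the two toy cases are assumptions for illustration; the benchmark's actual matching logic is not described here.

```python
from typing import Sequence


def recall_at_k(predictions: Sequence[Sequence[str]], gold: Sequence[str], k: int) -> float:
    """Fraction of cases whose gold diagnosis appears in the top-k ranked predictions."""
    hits = sum(
        1
        for ranked, truth in zip(predictions, gold)
        if truth.casefold() in (name.casefold() for name in ranked[:k])
    )
    return hits / len(gold)


# Two toy cases (hypothetical): ranked differentials and the gold diagnosis.
preds = [
    ["AME tipo 1", "Miopatia congênita"],
    ["Doença de Gaucher", "Fenilcetonúria"],
]
gold = ["AME tipo 1", "Fenilcetonúria"]

r_at_1 = recall_at_k(preds, gold, k=1)
r_at_3 = recall_at_k(preds, gold, k=3)

# Track B is a plain proportion over the evaluable cases reported in the table.
track_b_araras = round(100 * 331 / 431, 1)
track_b_deepseek = round(100 * 33 / 36, 1)
print(r_at_1, r_at_3, track_b_araras, track_b_deepseek)
```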

Per-layer breakdown (full 833 cases):

Layer	N	R@1	R@3
L3_v2 — PCDT atypical presentations	135	27.4%	32.6%
L4 — Hard BR (tropical phenocopies, founder, IEI)	79	17.7%	19.0%
L5_v2 — SUS-grounded synthesis	619	47.2%	53.8%
L5_realsus standalone (common SUS rare diseases)	240	70.4%	78.3%
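
As a consistency check, the headline 41.2% R@1 and 47.1% R@3 can be recovered as case-weighted averages of the first three layers, whose case counts sum to 833. The sketch assumes the standalone L5_realsus row (240 cases) is reported separately and is not part of the 833; that reading is inferred from the counts rather than stated in the table.

```python
# (cases, R@1 %, R@3 %) for the three layers that make up the unified benchmark.
layers = {
    "L3_v2": (135, 27.4, 32.6),
    "L4": (79, 17.7, 19.0),
    "L5_v2": (619, 47.2, 53.8),
}


def weighted_recall(rows: dict, index: int) -> float:
    """Case-weighted mean of the metric stored at position `index` of each row."""
    total = sum(row[0] for row in rows.values())
    return sum(row[0] * row[index] for row in rows.values()) / total


total_cases = sum(row[0] for row in layers.values())
overall_r1 = weighted_recall(layers, 1)
overall_r3 = weighted_recall(layers, 2)
print(total_cases, round(overall_r1, 1), round(overall_r3, 1))
```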

Why Track B = 76.8% is the headline: in roughly 3 out of every 4 cases where the ground truth specifies a CEAF-dispensed medication, Araras recommends the exact molecule SUS actually pays for. To our knowledge nobody else publishes this number, because nobody else has trained on this data. The cloud frontier model (DeepSeek, ~600B) reaches 91.7% on its 36-case subsample; Araras reaches 76.8% with roughly 130× fewer parameters (4.5B vs. ~600B), fully offline, at $0 marginal cost.

Versus the prior Qwen3.5-9B baseline on L1 (16.6% R@1 at 64s), Araras-Gemma4 is 2.5× more accurate, 9× faster, and smaller.

Example clinical conversations

These are real outputs from the deployed model. Multi-turn, with HPO normalization upstream.

Example 1 — AME tipo 1 (UBS in Petrolina-PE, no internet)

Médico: Lactente feminino, 4 meses, natural de Petrolina (PE), pais primos. Hipotonia profunda generalizada desde o nascimento, ausência de sustentação cefálica, choro fraco, dificuldade para mamar, abolição dos reflexos osteotendíneos profundos. SpO2 95%.

Araras (4.5s): "Olá! 👶 Com base nos achados clínicos e na história, o quadro sugere um diagnóstico de AME tipo 1. A tríade clássica (hipotonia profunda generalizada desde o nascimento + ausência dos reflexos osteotendíneos profundos + dificuldade para mamar) é altamente sugestiva. Recomendo encaminhamento para neurologia pediátrica e investigação laboratorial específica (eletroneuromiografia, testes genéticos SMN1)."

Médico turn 2: Como agilizar acesso ao tratamento pelo SUS?

Araras (5.0s): "Para acelerar o acesso diagnóstico e terapêutico no SUS: (1) Protocolo de Urgência/Emergência: se houver piora do quadro, acione SAMU/UPA. (2) Articulação com Equipe Básica: peça ao médico da UBS um laudo detalhado, com ênfase na 'hipotonia profunda generalizada desde o nascimento' e nos marcos perdidos. (3) Interconsulta Direta: solicite agendamento prioritário com Neuropediatria do município/região de referência."

Example 2 — Crise vaso-oclusiva em Doença Falciforme (UBS rural BA)

Médico: Adolescente masculino, 15 anos, zona rural de Conceição do Coité (BA). Internado com dor torácica intensa + dispneia. Triagem neonatal positiva HbSS, crises álgicas vaso-oclusivas frequentes. Hb 6.8, retic 12%, leuco 18k. Sat 89%.

Araras (1.8s): "🚨 ALERTA — Quadro altamente sugestivo de Crise vaso-oclusiva grave em paciente com Anemia Falciforme. Aguardamos retorno do Hematologista para manejo da crise e avaliação transfusional urgente. Manter monitorização respiratória máxima, oxigenoterapia e analgesia contínua."

Médico turn 2: É síndrome torácica aguda? Que conduta urgente?

Araras (3.5s): "⚠️ Conduta IMEDIATA: (1) Oxigênio 10L/min, manter saturação >92%. (2) Analgesia potente (opioides IV) e manejo do choque (fluidos com cautela). (3) Hemotransfusão emergencial de hemácias concentradas se Hb <6 g/dL ou lactato alto. (4) Manter monitorização cardiorrespiratória e comunicar Hematologia."

Example 3 — Fenilcetonúria (triagem neonatal Manaus-AM)

Médico: RN masculino, 12 dias de vida, Manaus (AM). Teste do pezinho com fenilalanina sérica 18 mg/dL (normal <2). Pais consanguíneos.

Araras (4.3s): "🚨 ALERTA ALTO RISCO. O teste do pezinho com fenilalanina sérica de 18 mg/dL é extremamente elevado. Quadro de Fenilcetonúria (PKU) que requer teste confirmatório imediato. Não espere o retorno dos resultados: a hiperfenilalaninemia leva à encefalotoxicidade irreversível em horas. AÇÃO IMEDIATA: (1) Notificar UTI Neonatal/Neuropediatria. (2) Iniciar controle metabólico (fórmula livre de Phe). (3) Coleta de aminoácidos plasmáticos e urinários."

How to use

Option A — llama.cpp (recommended for clinical edge deployment)

hf download Raras-AI/araras-gemma4-e4b-v4-gguf araras-gemma4-e4b-v4-Q4_K_M.gguf --local-dir ./

llama-server -m araras-gemma4-e4b-v4-Q4_K_M.gguf -ngl 99 -c 8192 --jinja

curl http://127.0.0.1:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
  "messages": [
    {"role": "system", "content": "Você é ARARAS, copiloto de apoio à decisão clínica em doenças raras..."},
    {"role": "user", "content": "<caso clínico em PT-BR>"}
  ],
  "temperature": 0.5, "top_p": 0.95, "top_k": 64, "repeat_penalty": 1.15
}'
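
The same request can be sent from Python with only the standard library. This is a sketch that assumes the server started by the `llama-server` command above is listening on 127.0.0.1:8080; the function names are illustrative, and the system prompt is passed in by the caller because the full prompt text is elided in the curl example.

```python
import json
import urllib.request


def build_payload(system_prompt: str, case_text: str) -> dict:
    """Build the chat-completions body with the sampling settings used in the curl example."""
    return {
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": case_text},
        ],
        "temperature": 0.5,
        "top_p": 0.95,
        "top_k": 64,
        "repeat_penalty": 1.15,
    }


def ask_araras(
    case_text: str,
    system_prompt: str,
    url: str = "http://127.0.0.1:8080/v1/chat/completions",
    timeout: float = 120.0,
) -> str:
    """POST one clinical case to the local server and return the assistant's reply text."""
    body = json.dumps(build_payload(system_prompt, case_text)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        reply = json.load(response)
    return reply["choices"][0]["message"]["content"]
```

With the server running, a call such as `ask_araras("<caso clínico em PT-BR>", system_prompt)` returns the model's answer as a string.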

Option B — transformers + PEFT

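from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

base = "unsloth/gemma-4-E4B-it"
adapter = "Raras-AI/araras-gemma4-e4b-v4-sota"

# Load the base model in bf16, then attach the LoRA adapter on top of it.
tok = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, dtype=torch.bfloat16, device_map="auto")
model = PeftModel.from_pretrained(model, adapter)
model.eval()

# Generate a reply for one clinical case (replace the placeholder with PT-BR text).
messages = [{"role": "user", "content": "<caso clínico em PT-BR>"}]
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_dict=True, return_tensors="pt"
).to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=512)
print(tok.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))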

Option C — Full pipeline (BioLORD + Gemma + ORPHA lookup + PCDT)

git clone https://github.com/rarasAI/araras-gemma4
cd araras-gemma4 && pip install -r requirements.txt
python -m araras_gemma4.pipeline "<caso clínico>"

Limitations

- Quantization Q4 ceiling: Q4_K_M loses ~3-5% accuracy vs full bf16. Use Q5_K_M or full adapter for research.
- Ultra-rare disease coverage: training data heavily skewed toward the 12 CEAF-covered diseases. Performance drops on ultra-rare conditions (<100 cases described in BR).
- SMA subtype confusion: distinguishing SMA type 1 vs 2 vs 3 from phenotype text alone is genuinely hard without genetic typing. We document this honestly.
- ORPHA code hallucination: like all rare-disease LLMs, may emit invented ORPHA numbers. Always use the canonical lookup post-processor in the pipeline.
- Not for emergency triage: this is for longitudinal-care decision support, not ER triage.
- PT-BR primary: model trained primarily for Brazilian Portuguese clinical idiom. English/Spanish performance not characterized.
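
To illustrate the ORPHA hallucination guard mentioned above, the sketch below extracts every `ORPHA:<number>` token from a model reply and separates codes found in a canonical set from codes that are not. It is a hypothetical stand-in for the pipeline's post-processor: the function name and the one-entry canonical set are assumptions, and a real deployment would load the full Orphanet code list.

```python
import re
from typing import Dict, List, Set

ORPHA_PATTERN = re.compile(r"ORPHA:(\d+)")


def split_orpha_codes(model_output: str, canonical: Set[str]) -> Dict[str, List[str]]:
    """Separate ORPHA codes in the text into canonical ones and unverified ones."""
    found = [f"ORPHA:{number}" for number in ORPHA_PATTERN.findall(model_output)]
    return {
        "valid": [code for code in found if code in canonical],
        "unverified": [code for code in found if code not in canonical],
    }


# Toy canonical set (assumption); the second code in the text is deliberately invented.
canonical_codes = {"ORPHA:36899"}
result = split_orpha_codes("Hipóteses: ORPHA:36899, ORPHA:9999999", canonical_codes)
print(result)
```

Codes in the `unverified` list should be dropped or flagged before the output reaches the clinician.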

Companion: RareBench-BR_Public

Together with this model we release RareBench-BR_Public — 835 validated cases spanning 24 official Ministry of Health PCDTs + hard Brazilian rare-disease cases. The first benchmark in the world that scores rare-disease LLMs on Brazilian SUS conduta (PCDT + CEAF medication).

Citation

@misc{raras_gemma4_2026,
  author = {Raras Team},
  title  = {Araras-Gemma4-E4B: Open Gemma 4 Clinical Decision Support for Brazilian Rare-Disease Care},
  year   = {2026},
  month  = {may},
  publisher = {Hugging Face},
  url    = {https://huggingface.co/Raras-AI/araras-gemma4-e4b-v4-sota},
  howpublished = {Submitted to Gemma 4 Good Hackathon, Google DeepMind / Kaggle},
}

Acknowledgments

Built on the work of Google DeepMind (Gemma 4), Unsloth (training recipe), FremyCompany (BioLORD-2023), Chen et al. (RareBench L1), Conitec / Ministry of Health Brazil (PCDTs), Casa dos Raros (model of local-clinician intervention), and the rare-disease open-science community.

Built by a Brazilian rare-disease patient, for the 13 million Brazilians the existing AI doesn't know exist — and for the doctors who attend them in UBSs across the country.
