Instructions to use jmdevita/medical-wayfinder-gemma-4-e2b with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use jmdevita/medical-wayfinder-gemma-4-e2b with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="jmdevita/medical-wayfinder-gemma-4-e2b")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("jmdevita/medical-wayfinder-gemma-4-e2b")
model = AutoModelForImageTextToText.from_pretrained("jmdevita/medical-wayfinder-gemma-4-e2b")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

llama-cpp-python

How to use jmdevita/medical-wayfinder-gemma-4-e2b with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="jmdevita/medical-wayfinder-gemma-4-e2b",
	filename="gemma-4-e2b-it.Q4_K_M.gguf",
)

llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)

Local Apps

llama.cpp

How to use jmdevita/medical-wayfinder-gemma-4-e2b with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf jmdevita/medical-wayfinder-gemma-4-e2b:BF16
# Run inference directly in the terminal:
llama-cli -hf jmdevita/medical-wayfinder-gemma-4-e2b:BF16

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf jmdevita/medical-wayfinder-gemma-4-e2b:BF16
# Run inference directly in the terminal:
llama-cli -hf jmdevita/medical-wayfinder-gemma-4-e2b:BF16

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf jmdevita/medical-wayfinder-gemma-4-e2b:BF16
# Run inference directly in the terminal:
./llama-cli -hf jmdevita/medical-wayfinder-gemma-4-e2b:BF16

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf jmdevita/medical-wayfinder-gemma-4-e2b:BF16
# Run inference directly in the terminal:
./build/bin/llama-cli -hf jmdevita/medical-wayfinder-gemma-4-e2b:BF16

Use Docker

docker model run hf.co/jmdevita/medical-wayfinder-gemma-4-e2b:BF16

vLLM

How to use jmdevita/medical-wayfinder-gemma-4-e2b with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "jmdevita/medical-wayfinder-gemma-4-e2b"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "jmdevita/medical-wayfinder-gemma-4-e2b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

SGLang

How to use jmdevita/medical-wayfinder-gemma-4-e2b with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "jmdevita/medical-wayfinder-gemma-4-e2b" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "jmdevita/medical-wayfinder-gemma-4-e2b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "jmdevita/medical-wayfinder-gemma-4-e2b" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "jmdevita/medical-wayfinder-gemma-4-e2b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Ollama
How to use jmdevita/medical-wayfinder-gemma-4-e2b with Ollama:
```
ollama run hf.co/jmdevita/medical-wayfinder-gemma-4-e2b:BF16
```

Unsloth Studio

How to use jmdevita/medical-wayfinder-gemma-4-e2b with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for jmdevita/medical-wayfinder-gemma-4-e2b to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for jmdevita/medical-wayfinder-gemma-4-e2b to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for jmdevita/medical-wayfinder-gemma-4-e2b to start chatting

Pi

How to use jmdevita/medical-wayfinder-gemma-4-e2b with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf jmdevita/medical-wayfinder-gemma-4-e2b:BF16

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "jmdevita/medical-wayfinder-gemma-4-e2b:BF16"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use jmdevita/medical-wayfinder-gemma-4-e2b with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf jmdevita/medical-wayfinder-gemma-4-e2b:BF16

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default jmdevita/medical-wayfinder-gemma-4-e2b:BF16

Run Hermes

hermes

Docker Model Runner
How to use jmdevita/medical-wayfinder-gemma-4-e2b with Docker Model Runner:
```
docker model run hf.co/jmdevita/medical-wayfinder-gemma-4-e2b:BF16
```

Lemonade

How to use jmdevita/medical-wayfinder-gemma-4-e2b with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull jmdevita/medical-wayfinder-gemma-4-e2b:BF16

Run and chat with the model

lemonade run user.medical-wayfinder-gemma-4-e2b-BF16

List all available models

lemonade list

Medical Wayfinder — Gemma 4 E2B

Navigation got you to the parking lot. Medical Wayfinder gets you to the doctor.

A fine-tuned Gemma 4 E2B for on-device healthcare facility wayfinding in English and Spanish. Patients describe a destination ("cardiology", "MRI", "where's parking for the children's ER?"); the model returns step-by-step directions with landmarks, accessibility info, and check-in instructions — all running locally on a phone via llama.cpp + Metal GPU. No PHI leaves the device.

Submission for the Gemma 4 Good Hackathon. Code repository: github.com/jmdevita/medical-wayfinder.

Quick start

This repo ships three artifacts. Pick the one that matches your use case:

Run inference with llama.cpp / Ollama / LM Studio (GGUF)

# llama.cpp
llama-cli -hf jmdevita/medical-wayfinder-gemma-4-e2b --jinja \
  --hf-file gemma-4-e2b-it.Q4_K_M.gguf

# Ollama
ollama create medical-wayfinder -f Modelfile
ollama run medical-wayfinder

Load the merged model with `transformers`

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "jmdevita/medical-wayfinder-gemma-4-e2b",
    torch_dtype="bfloat16",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("jmdevita/medical-wayfinder-gemma-4-e2b")

Apply the LoRA adapter to your own base copy

from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("google/gemma-4-e2b-it")
model = PeftModel.from_pretrained(base, "jmdevita/medical-wayfinder-gemma-4-e2b")

What it does

The model is a wayfinding assistant, not a medical advisor. Given:

A system prompt that defines a strict JSON response contract (destinations, steps, accessibility badges, disambiguation prompts, arrival markers)
A CONTEXT block describing a specific facility — its departments, entrances, parking lots, and topology graph
A user query in English or Spanish

…it emits a structured JSON response that the host app parses into a multi-step walking guide. Five facilities ship in the open-source app: Atrius Boston Kenmore, Kaiser Panorama City, Massachusetts General, Southern JP, and Tufts Medical Center.
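
To make the three inputs concrete, here is a minimal sketch of how a host could assemble a request and check that the reply is a JSON object. It is written in Python for illustration only. The helper names `build_messages` and `parse_reply`, the placement of the CONTEXT block in the user turn, and the placeholder strings are all assumptions; the real contract is defined by the system prompt at health_wayfinder/assets/system_prompt.txt, and no field names from that schema are assumed here.

```python
import json
from typing import Optional


def build_messages(system_prompt: str, context_block: str, query: str) -> list:
    """Assemble a chat request from the three inputs.

    Assumption: the CONTEXT block is prepended to the user turn. The real
    app may place it elsewhere (for example, inside the system prompt).
    """
    user_turn = f"CONTEXT:\n{context_block}\n\nQUERY: {query}"
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_turn},
    ]


def parse_reply(raw: str) -> Optional[dict]:
    """Return the reply as a dict, or None if it is not a JSON object."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None


# Placeholder inputs; real values come from the app's assets and facility file.
messages = build_messages(
    system_prompt="(contents of system_prompt.txt)",
    context_block="(facility departments, entrances, parking, topology)",
    query="¿Dónde está cardiología?",
)
```

Returning `None` instead of raising lets the caller decide how to recover from a malformed reply, for example by retrying or showing a fallback message.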

A deterministic Dart-side orchestrator handles alias lookup and Dijkstra path-finding over a hand-authored topology graph — the model handles intent classification, hedging, multilingual phrasing, and accessibility-aware step formatting.
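
The path-finding half of that split can be illustrated with a short Dijkstra sketch. This is Python rather than the app's Dart, and the graph, node names, and edge weights (walking distance in meters) are hypothetical; the shipped facilities use their own hand-authored topology.

```python
import heapq


def shortest_path(graph, start, goal):
    """Dijkstra over a weighted adjacency dict; returns (cost, path) or None."""
    queue = [(0, start, [start])]
    settled = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in settled:
            continue  # A cheaper route to this node was already processed.
        settled.add(node)
        for neighbor, weight in graph.get(node, {}).items():
            if neighbor not in settled:
                heapq.heappush(queue, (cost + weight, neighbor, path + [neighbor]))
    return None


# Hypothetical facility graph; weights are walking distances in meters.
graph = {
    "parking_a": {"main_entrance": 80, "east_entrance": 50},
    "main_entrance": {"lobby": 20},
    "east_entrance": {"lobby": 70},
    "lobby": {"elevator_b": 30},
    "elevator_b": {"cardiology": 15},
}

result = shortest_path(graph, "parking_a", "cardiology")
```

Here the east entrance is closer to the parking lot, but the route through the main entrance is shorter overall, so the search returns that one. Keeping this step deterministic means the model only has to phrase a route, not discover it.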

Training details


Base model	`google/gemma-4-e2b-it` (5.1B params, 2.3B effective). Trained from Unsloth's 4-bit quantized variant `unsloth/gemma-4-e2b-it-unsloth-bnb-4bit` for memory efficiency on a consumer GPU.
Adapter	LoRA, rank 8
Training steps	78
Dataset size	310 examples
Dataset source	100% synthetic, generated by a larger teacher LLM (`qwen3.5-122B`) from a published generation prompt at `training/data/prompts/generation.txt`. The synthetic data is anchored by 1,000 curated directional phrases from public call-center datasets and contains no real patient queries.
Training framework	Unsloth
Quantization	GGUF Q4_K_M (3.4 GB) for on-device inference
Verification	Merged-then-quantized GGUF SHA differs from base (`14638e2b…` vs `e781b34b…`), confirming the adapter is in the weights
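
The verification row can be reproduced with a chunked file hash. The sketch below assumes the digests quoted above are SHA-256 (the table says only "SHA"), and the function names are illustrative. A differing digest shows that the two files are not byte-identical; it does not by itself measure how much the weights changed.

```python
import hashlib


def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so a multi-gigabyte GGUF need not fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def weights_differ(base_path: str, tuned_path: str) -> bool:
    """True when the two files have different digests."""
    return file_sha256(base_path) != file_sha256(tuned_path)
```

Calling `weights_differ` on the base GGUF and the merged GGUF should return `True` if the adapter was merged before quantization.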

Evaluation

Held-out 100-case eval suite. Same production system prompt runs against base and fine-tune; only weights change. Judge: gpt-oss-120b (cross-family, JSON-schema-constrained, default reasoning effort). Suite is published verbatim at training/data/eval/eval_suite.jsonl.

Headline

Metric	Base	Fine-tune	Δ
Mean rubric score (1-5)	3.62	3.98	+0.36
Strict pass (correctness ≥ 4 AND mean ≥ 3.5)	28%	38%	+10 pp
Soft pass (correctness ≥ 3 AND mean ≥ 3.5)	47%	56%	+9 pp
English mean	3.57	3.94	+0.37
Spanish mean	3.92	4.17	+0.25

Per-criterion

Criterion	Base	Fine-tune	Δ
Scope Handling	3.37	4.20	+0.83
Correctness	3.07	3.46	+0.39
Accessibility	3.59	3.88	+0.29
Landmarks	3.17	3.38	+0.21
Format	4.90	4.96	+0.06

Every criterion improves. Scope Handling moved most: the targeted round-3 distillation pass added 20 batches of scope_enforcement examples and explicitly forbade "I'm not able to give medical advice" hedging.

Spanish now outscores English under the production configuration (4.17 vs 3.94, a 0.23-point lead). The training set is ~30% Spanish examples after the bilingual category pass.
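
The pass rules in the headline table can be sketched as a small grading function. It assumes each case's mean is the unweighted average of the five rubric criteria, which is consistent with the tables (the five per-criterion means average to the headline mean), but the actual logic lives in training/scripts/eval_runner.py and may differ. The dictionary keys are illustrative names.

```python
from statistics import mean

# Illustrative keys for the five rubric criteria.
CRITERIA = ("scope_handling", "correctness", "accessibility", "landmarks", "format")


def grade(scores: dict) -> dict:
    """Apply the strict and soft pass rules to one case's 1-5 rubric scores."""
    case_mean = mean(scores[name] for name in CRITERIA)
    correctness = scores["correctness"]
    return {
        "mean": case_mean,
        "strict_pass": correctness >= 4 and case_mean >= 3.5,
        "soft_pass": correctness >= 3 and case_mean >= 3.5,
    }


# A case with a good mean but only middling correctness.
example = grade(
    {"scope_handling": 4, "correctness": 3, "accessibility": 4, "landmarks": 3, "format": 5}
)
```

The example case has a mean of 3.8 and correctness of 3, so it passes the soft rule but fails the strict one. That gap is what separates the two pass rates in the headline table.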

One trade-off worth flagging

Verbatim route-copy rate (the model's ability to reproduce landmark prose character-for-character) regresses with the May-15 prompt revision (67% → 50% on the same fine-tune). The longer, more directive new prompt nudges the model to paraphrase. Other metrics improve, so the net is positive on mean, strict pass, soft pass, and Spanish — but the verbatim cost is the largest single regression in the eval matrix.
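
One plausible way to compute a verbatim rate is an exact substring check per case, sketched below. This is an assumption about the metric, not the eval runner's actual implementation, and the sample strings are invented.

```python
def verbatim_rate(cases):
    """Fraction of cases whose output contains the reference prose unchanged."""
    if not cases:
        return 0.0
    hits = sum(1 for reference, output in cases if reference in output)
    return hits / len(cases)


# Invented (reference, model output) pairs: one verbatim copy, one paraphrase.
cases = [
    ("Turn left at the gift shop.", "Step 1: Turn left at the gift shop."),
    ("Take Elevator B to floor 3.", "Step 2: Ride Elevator B up to the third floor."),
]
rate = verbatim_rate(cases)
```

Under this definition a paraphrase counts as a miss even when it is factually correct, which is why the rate can fall while the rubric scores rise.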

Full eval methodology — including a 2×2 controlled comparison (model × prompt), per-criterion failure-mode breakdown, and rubric design rationale — is reproducible from the committed eval suite. See the four canonical JSONs:

training/output/eval_results/eval_summary_gemma4_e2b_2026-05-15T22-29-19.json (base + new prompt)
training/output/eval_results/eval_summary_gemma4-e2b-wf-cp78_2026-05-15T22-55-36.json (cp78 + new prompt)
Plus the two old-prompt runs for the 2×2 controls

Run env/bin/python training/scripts/eval_runner.py with the corresponding model and the system prompt at health_wayfinder/assets/system_prompt.txt to reproduce.

Intended use

In-scope: Hospital/clinic wayfinding queries in English or Spanish, against a CONTEXT block derived from a structured facility JSON file. The model expects the system prompt at health_wayfinder/assets/system_prompt.txt and emits a JSON response per the schema in that prompt.
Out of scope: Medical advice, diagnosis, triage, appointment scheduling, EHR integration, billing inquiries, or any clinical decision. The system prompt explicitly classifies these as out-of-scope and the model is trained to deflect them.
Deployment target: On-device on iOS via llama.cpp + Metal GPU. The Q4_K_M quantization is a ~3.4 GB file that ships in the app bundle and is copied to the Documents directory on first launch.

Limitations and known issues

Eval is directional, not statistically significant — 100 cases at a single seed.
The eval suite was authored alongside the data contract, which biases the results in the way training-adjacent evals always do. The suite is published verbatim for reproducibility.
Training data is 100% synthetic — anchored with curated real-world directional phrases but no real patient queries. Anchoring with 50–100 real queries is the next dataset improvement.
Hedging on edge cases: queries such as "I can't walk far" or "I'm on the orange line" still trigger the medical-question template response about 25% of the time. Further prompt sharpening has diminishing returns; the fix is more diverse retraining data.
Per-prompt verbatim trade-off documented above.
Multimodal (photo re-orientation) has the camera path live and the model path stubbed; that's a V2 item. The BF16-mmproj.gguf in this repo is published for future multimodal work but unused by the current app.
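
The first limitation can be made concrete with a rough interval on the strict-pass lift (28% to 38% on 100 cases). The sketch uses a normal approximation and treats the two runs as independent samples; because both models were scored on the same cases, a paired test such as McNemar's would be more appropriate, so read this as an approximation only.

```python
import math


def diff_confidence_interval(p_base, p_tuned, n, z=1.96):
    """Approximate 95% CI for p_tuned - p_base, treating the runs as independent."""
    diff = p_tuned - p_base
    std_err = math.sqrt(p_base * (1 - p_base) / n + p_tuned * (1 - p_tuned) / n)
    return diff - z * std_err, diff + z * std_err


low, high = diff_confidence_interval(p_base=0.28, p_tuned=0.38, n=100)
print(f"strict-pass lift: +10 pp, 95% CI [{low * 100:.1f}, {high * 100:.1f}] pp")
```

The interval runs from roughly -3 to +23 percentage points and includes zero, which matches the description of the eval as directional rather than statistically significant.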

License

This model is a derivative of Google's Gemma 4 E2B and is therefore subject to the Gemma Terms of Use in addition to anything stated here. By downloading you agree to those terms.

The training data, eval suite, and accompanying code in the GitHub repository are licensed CC-BY 4.0.

Citation

@misc{medical-wayfinder-2026,
  author = {De Vita, Julian},
  title  = {Medical Wayfinder: On-device fine-tuned Gemma 4 E2B for multilingual hospital navigation},
  year   = {2026},
  url    = {https://huggingface.co/jmdevita/medical-wayfinder-gemma-4-e2b},
  note   = {Gemma 4 Good Hackathon submission},
}

Acknowledgements

Google DeepMind for the Gemma 4 family
The Unsloth team for the fine-tuning framework (~2× faster training on a consumer GPU)
OpenStreetMap contributors — facility data is derived in part from OSM under ODbL §4.3
Jamshidi et al. (HERD 2025), Sela et al. (AMIA 2018), González Cueto et al. (JGIM 2024) for the peer-reviewed evidence underpinning the problem framing — see SOURCES.md in the GitHub repo
