Medical Wayfinder — Gemma 4 E2B
Navigation got you to the parking lot. Medical Wayfinder gets you to the doctor.
A fine-tuned Gemma 4 E2B for on-device healthcare facility wayfinding in English and Spanish. Patients describe a destination ("cardiology", "MRI", "where's parking for the children's ER?"); the model returns step-by-step directions with landmarks, accessibility info, and check-in instructions — all running locally on a phone via llama.cpp + Metal GPU. No PHI leaves the device.
Submission for the Gemma 4 Good Hackathon. Code repository: github.com/jmdevita/medical-wayfinder.
Quick start
This repo ships three artifacts. Pick the one that matches your use case:
Run inference with llama.cpp / Ollama / LM Studio (GGUF)
# llama.cpp
llama-cli -hf jmdevita/medical-wayfinder-gemma-4-e2b --jinja \
--hf-file gemma-4-e2b-it.Q4_K_M.gguf
# Ollama
ollama create medical-wayfinder -f Modelfile
ollama run medical-wayfinder
Load the merged model with transformers
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained(
"jmdevita/medical-wayfinder-gemma-4-e2b",
torch_dtype="bfloat16",
device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("jmdevita/medical-wayfinder-gemma-4-e2b")
Apply the LoRA adapter to your own base copy
from peft import PeftModel
from transformers import AutoModelForCausalLM
base = AutoModelForCausalLM.from_pretrained("google/gemma-4-e2b-it")
model = PeftModel.from_pretrained(base, "jmdevita/medical-wayfinder-gemma-4-e2b")
What it does
The model is a wayfinding assistant, not a medical advisor. Given:
- A system prompt that defines a strict JSON response contract (destinations, steps, accessibility badges, disambiguation prompts, arrival markers)
- A CONTEXT block describing a specific facility — its departments, entrances, parking lots, and topology graph
- A user query in English or Spanish
…it emits a structured JSON response that the host app parses into a multi-step walking guide. Five facilities ship in the open-source app: Atrius Boston Kenmore, Kaiser Panorama City, Massachusetts General, Southern JP, and Tufts Medical Center.
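The three inputs above can be sketched as a chat request plus a tolerant JSON parse. This is a minimal illustration, not the app's code: the helper names `build_messages` and `parse_response`, the placement of the CONTEXT block in the user turn, and the `steps` field are all assumptions. The real contract lives in health_wayfinder/assets/system_prompt.txt.

```python
import json


def build_messages(system_prompt: str, facility_context: str, query: str) -> list[dict]:
    """Assemble a chat request: system prompt, then CONTEXT block plus the user query."""
    # Assumption: the CONTEXT block is prepended to the user turn. The real app
    # may place it elsewhere (for example inside the system prompt).
    user_turn = f"CONTEXT:\n{facility_context}\n\nQUERY: {query}"
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_turn},
    ]


def parse_response(raw: str) -> dict:
    """Decode the span from the first '{' to the last '}' in the model output."""
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model output")
    return json.loads(raw[start : end + 1])


messages = build_messages(
    system_prompt="(contents of health_wayfinder/assets/system_prompt.txt)",
    facility_context='{"departments": ["Cardiology"], "entrances": ["Main"]}',
    query="¿Dónde está cardiología?",
)

# Hypothetical model output; the field names are illustrative only.
reply = parse_response('Sure: {"steps": ["Enter through Main", "Take elevator B"]}')
print(messages[1]["content"])
print(reply["steps"])
```

Slicing between the outermost braces is a simple way to tolerate stray text around the JSON object. A stricter host app would validate the decoded object against the schema instead.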
A deterministic Dart-side orchestrator handles alias lookup and Dijkstra path-finding over a hand-authored topology graph — the model handles intent classification, hedging, multilingual phrasing, and accessibility-aware step formatting.
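To make the division of labour concrete, here is a small Python sketch of Dijkstra path-finding over a topology graph. The production orchestrator is written in Dart, and its graph format is not shown in this card, so the node names, edge weights, and the stairs-filtering step below are hypothetical.

```python
import heapq


def shortest_path(graph, start, goal):
    """Dijkstra over a dict-of-dicts graph; returns (total_cost, [nodes]) or None."""
    queue = [(0, start, [start])]
    best = {start: 0}
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry, a cheaper route was already found
        for neighbor, weight in graph.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                heapq.heappush(queue, (new_cost, neighbor, path + [neighbor]))
    return None


# Hypothetical facility graph; weights stand in for walking distance.
topology = {
    "parking_a": {"main_entrance": 60},
    "main_entrance": {"parking_a": 60, "lobby": 15},
    "lobby": {"main_entrance": 15, "elevator_b": 30, "stairs": 20},
    "stairs": {"lobby": 20, "cardiology": 25},
    "elevator_b": {"lobby": 30, "cardiology": 40},
    "cardiology": {"stairs": 25, "elevator_b": 40},
}

result = shortest_path(topology, "parking_a", "cardiology")

# Illustrative step-free variant: drop the stairs node and every edge into it.
accessible = {
    node: {nbr: w for nbr, w in edges.items() if nbr != "stairs"}
    for node, edges in topology.items()
    if node != "stairs"
}
step_free = shortest_path(accessible, "parking_a", "cardiology")

print(result)
print(step_free)
```

Because the route comes from a deterministic search, the model only has to phrase the steps and does not need to invent them.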
Training details
| | |
|---|---|
| Base model | google/gemma-4-e2b-it (5.1B params, 2.3B effective). Trained from Unsloth's 4-bit quantized variant unsloth/gemma-4-e2b-it-unsloth-bnb-4bit for memory efficiency on a consumer GPU. |
| Adapter | LoRA, rank 8 |
| Training steps | 78 |
| Dataset size | 310 examples |
| Dataset source | 100% synthetic, generated by a larger teacher LLM (qwen3.5-122B) against a published generation prompt at training/data/prompts/generation.txt. 1,000 curated directional phrases from public call-center datasets anchor the synthetic data (no real patient queries). |
| Training framework | Unsloth |
| Quantization | GGUF Q4_K_M (3.4 GB) for on-device inference |
| Verification | Merged-then-quantized GGUF SHA differs from base (14638e2b… vs e781b34b…), confirming the adapter is in the weights |
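The verification row can be reproduced with a streaming file hash. The sketch below assumes SHA-256 (the card only says "SHA") and uses small temporary files as stand-ins for the two GGUF files. The function names are illustrative.

```python
import hashlib
import tempfile
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so multi-GB GGUF files never load into RAM at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def weights_differ(base_path, tuned_path):
    """True when the two files have different digests."""
    return file_sha256(base_path) != file_sha256(tuned_path)


# Stand-in files; point these at the real base and fine-tuned GGUFs instead.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp) / "base.gguf"
    tuned = Path(tmp) / "tuned.gguf"
    base.write_bytes(b"base weights")
    tuned.write_bytes(b"tuned weights")
    differ = weights_differ(base, tuned)
    same = weights_differ(base, base)

print(differ, same)
```

A differing digest shows the files are not byte-identical. It does not by itself measure how much the adapter changed the weights.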
Evaluation
Held-out 100-case eval suite. Same production system prompt runs against base and fine-tune; only weights change. Judge: gpt-oss-120b (cross-family, JSON-schema-constrained, default reasoning effort). Suite is published verbatim at training/data/eval/eval_suite.jsonl.
Headline
| Metric | Base | Fine-tune | Δ |
|---|---|---|---|
| Mean rubric score (1-5) | 3.62 | 3.98 | +0.36 |
| Strict pass (corr ≥ 4 AND mean ≥ 3.5) | 28% | 38% | +10 pp |
| Soft pass (corr ≥ 3 AND mean ≥ 3.5) | 47% | 56% | +9 pp |
| English mean | 3.57 | 3.94 | +0.37 |
| Spanish mean | 3.92 | 4.17 | +0.25 |
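The two pass thresholds can be written as a small scoring function. This sketch assumes "corr" is the Correctness criterion and that the per-case mean is the unweighted average of the five rubric criteria. The per-case scores below are made up for illustration and are not drawn from the eval suite.

```python
from statistics import mean

# Criterion names follow the per-criterion table in this card.
CRITERIA = ("scope_handling", "correctness", "accessibility", "landmarks", "format")


def case_mean(scores):
    """Unweighted mean of the five rubric scores for one eval case."""
    return mean(scores[c] for c in CRITERIA)


def passes(scores, min_correctness, min_mean=3.5):
    """Strict pass uses min_correctness=4, soft pass uses min_correctness=3."""
    return scores["correctness"] >= min_correctness and case_mean(scores) >= min_mean


def pass_rate(cases, min_correctness):
    return sum(passes(c, min_correctness) for c in cases) / len(cases)


# Hypothetical judge scores for four cases.
cases = [
    dict(scope_handling=5, correctness=5, accessibility=4, landmarks=4, format=5),
    dict(scope_handling=4, correctness=3, accessibility=4, landmarks=3, format=5),
    dict(scope_handling=3, correctness=4, accessibility=2, landmarks=2, format=5),
    dict(scope_handling=2, correctness=2, accessibility=3, landmarks=3, format=5),
]

strict = pass_rate(cases, min_correctness=4)
soft = pass_rate(cases, min_correctness=3)
print(strict, soft)
```

The third case shows why both conditions matter: its correctness score clears the strict bar, but its mean of 3.2 falls below 3.5, so it fails both gates.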
Per-criterion
| Criterion | Base | Fine-tune | Δ |
|---|---|---|---|
| Scope Handling | 3.37 | 4.20 | +0.83 |
| Correctness | 3.07 | 3.46 | +0.39 |
| Accessibility | 3.59 | 3.88 | +0.29 |
| Landmarks | 3.17 | 3.38 | +0.21 |
| Format | 4.90 | 4.96 | +0.06 |
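As a consistency check, the headline mean can be recomputed from the per-criterion table, assuming the headline figure is the unweighted average of the five criteria. The numbers are copied from the table. The averaging rule is the assumption being tested.

```python
# (base, fine-tune) scores copied from the per-criterion table.
per_criterion = {
    "Scope Handling": (3.37, 4.20),
    "Correctness": (3.07, 3.46),
    "Accessibility": (3.59, 3.88),
    "Landmarks": (3.17, 3.38),
    "Format": (4.90, 4.96),
}

base_mean = sum(base for base, _ in per_criterion.values()) / len(per_criterion)
tuned_mean = sum(tuned for _, tuned in per_criterion.values()) / len(per_criterion)
delta = tuned_mean - base_mean

print(round(base_mean, 2), round(tuned_mean, 2), round(delta, 2))
```

The rounded results are 3.62, 3.98, and 0.36, matching the headline row, so the two tables are consistent under that assumption.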
Every criterion improves. Scope Handling moved most: the targeted round-3 distillation pass added 20 batches of scope_enforcement examples and explicitly forbade "I'm not able to give medical advice" hedging.
Spanish now outscores English under the production configuration (4.17 vs 3.94, a 0.23-point lead). The training set is ~30% Spanish examples after the bilingual category pass.
One trade-off worth flagging
Verbatim route-copy rate (the model's ability to reproduce landmark prose character-for-character) regresses with the May-15 prompt revision (67% → 50% on the same fine-tune). The longer, more directive new prompt nudges the model to paraphrase. Other metrics improve, so the net is positive on mean, strict pass, soft pass, and Spanish — but the verbatim cost is the largest single regression in the eval matrix.
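One plausible way to measure a verbatim route-copy rate is an exact substring check. The sketch below assumes that definition. The card does not spell out how the metric is computed, and the sample strings are invented.

```python
def verbatim_rate(pairs):
    """Fraction of (reference_prose, model_output) pairs where the reference
    appears character-for-character inside the output."""
    if not pairs:
        raise ValueError("need at least one pair")
    hits = sum(1 for reference, output in pairs if reference in output)
    return hits / len(pairs)


# Hypothetical landmark prose and model outputs.
samples = [
    ("Turn left at the gift shop.", "Step 2: Turn left at the gift shop."),
    ("Take Elevator B to floor 3.", "Ride elevator B up to the third floor."),
]

rate = verbatim_rate(samples)
print(rate)
```

The second pair conveys the same route but fails the check, which is the kind of paraphrase a longer, more directive prompt can encourage.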
Full eval methodology — including a 2×2 controlled comparison (model × prompt), per-criterion failure-mode breakdown, and rubric design rationale — is reproducible from the committed eval suite. See the four canonical JSONs:
- training/output/eval_results/eval_summary_gemma4_e2b_2026-05-15T22-29-19.json (base + new prompt)
- training/output/eval_results/eval_summary_gemma4-e2b-wf-cp78_2026-05-15T22-55-36.json (cp78 + new prompt)
- Plus the two old-prompt runs for the 2×2 controls
Run env/bin/python training/scripts/eval_runner.py with the corresponding model and the system prompt at health_wayfinder/assets/system_prompt.txt to reproduce.
Intended use
- In-scope: Hospital/clinic wayfinding queries in English or Spanish, against a CONTEXT block derived from a structured facility JSON file. The model expects the system prompt at health_wayfinder/assets/system_prompt.txt and emits a JSON response per the schema in that prompt.
- Out of scope: Medical advice, diagnosis, triage, appointment scheduling, EHR integration, billing inquiries, or any clinical decision. The system prompt explicitly classifies these as out-of-scope and the model is trained to deflect them.
- Deployment target: On-device on iOS via llama.cpp + Metal GPU. The Q4_K_M quantization ships as a ~3.4 GB file in the app bundle and is copied to the Documents directory on first launch.
Limitations and known issues
- Eval is directional, not statistically significant — 100 cases at a single seed.
- The eval suite was authored alongside the data contract, which biases the results in the way training-adjacent evals always do. The suite is published verbatim for reproducibility.
- Training data is 100% synthetic — anchored with curated real-world directional phrases but no real patient queries. Anchoring with 50–100 real queries is the next dataset improvement.
- Hedging on edge cases — "I can't walk far" or "I'm on the orange line" still get over-applied medical-question template responses ~25% of the time. Further prompt sharpening has diminishing returns; the fix is more diverse retraining data.
- Per-prompt verbatim trade-off documented above.
- Multimodal (photo re-orientation) has the camera path live and the model path stubbed; that's a V2 item. The BF16-mmproj.gguf in this repo is published for future multimodal work but unused by the current app.
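The first limitation can be made concrete with a rough interval estimate on the strict pass rates. This sketch uses a normal-approximation (Wald) interval and treats the two runs as independent samples, which is a simplification: both runs score the same 100 cases, so a paired test would be more appropriate, but it needs per-case outcomes that are not in this card.

```python
import math


def proportion_ci(p, n, z=1.96):
    """Approximate 95% Wald interval for a pass rate p observed on n cases."""
    se = math.sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se


# Strict pass rates from the headline table, 100 cases each.
base_low, base_high = proportion_ci(0.28, 100)
tuned_low, tuned_high = proportion_ci(0.38, 100)

print(f"base:      {base_low:.3f} to {base_high:.3f}")
print(f"fine-tune: {tuned_low:.3f} to {tuned_high:.3f}")
print("intervals overlap:", tuned_low < base_high)
```

Each interval is roughly nine points wide on either side, about the size of the observed 10 pp gain, so the intervals overlap and the result is best read as directional.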
License
This model is a derivative of Google's Gemma 4 E2B and is therefore subject to the Gemma Terms of Use in addition to anything stated here. By downloading you agree to those terms.
The training data, eval suite, and accompanying code in the GitHub repository are licensed CC-BY 4.0.
Citation
@misc{medical-wayfinder-2026,
author = {De Vita, Julian},
title = {Medical Wayfinder: On-device fine-tuned Gemma 4 E2B for multilingual hospital navigation},
year = {2026},
url = {https://huggingface.co/jmdevita/medical-wayfinder-gemma-4-e2b},
note = {Gemma 4 Good Hackathon submission},
}
Acknowledgements
- Google DeepMind for the Gemma 4 family
- The Unsloth team for the fine-tuning framework (~2× faster training on a consumer GPU)
- OpenStreetMap contributors — facility data is derived in part from OSM under ODbL §4.3
- Jamshidi et al. (HERD 2025), Sela et al. (AMIA 2018), González Cueto et al. (JGIM 2024) for the peer-reviewed evidence underpinning the problem framing — see SOURCES.md in the GitHub repo
Trained 2× faster with Unsloth.