Instructions to use NDIJayant/gemma-4-E4B-comorbities with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use NDIJayant/gemma-4-E4B-comorbities with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("text-generation", model="NDIJayant/gemma-4-E4B-comorbities") messages = [ { "role": "user", "content": [ {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"}, {"type": "text", "text": "What animal is on the candy?"} ] }, ] pipe(text=messages)# Load model directly from transformers import AutoProcessor, AutoModelForMultimodalLM processor = AutoProcessor.from_pretrained("NDIJayant/gemma-4-E4B-comorbities") model = AutoModelForMultimodalLM.from_pretrained("NDIJayant/gemma-4-E4B-comorbities") messages = [ { "role": "user", "content": [ {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"}, {"type": "text", "text": "What animal is on the candy?"} ] }, ] inputs = processor.apply_chat_template( messages, add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="pt", ).to(model.device) outputs = model.generate(**inputs, max_new_tokens=40) print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:])) - Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- vLLM
How to use NDIJayant/gemma-4-E4B-comorbities with vLLM:
Install from pip and serve model
# Install vLLM from pip: pip install vllm # Start the vLLM server: vllm serve "NDIJayant/gemma-4-E4B-comorbities" # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:8000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "NDIJayant/gemma-4-E4B-comorbities", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }'Use Docker
docker model run hf.co/NDIJayant/gemma-4-E4B-comorbities
- SGLang
How to use NDIJayant/gemma-4-E4B-comorbities with SGLang:
Install from pip and serve model
# Install SGLang from pip: pip install sglang # Start the SGLang server: python3 -m sglang.launch_server \ --model-path "NDIJayant/gemma-4-E4B-comorbities" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "NDIJayant/gemma-4-E4B-comorbities", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }'Use Docker images
docker run --gpus all \ --shm-size 32g \ -p 30000:30000 \ -v ~/.cache/huggingface:/root/.cache/huggingface \ --env "HF_TOKEN=<secret>" \ --ipc=host \ lmsysorg/sglang:latest \ python3 -m sglang.launch_server \ --model-path "NDIJayant/gemma-4-E4B-comorbities" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "NDIJayant/gemma-4-E4B-comorbities", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }' - Docker Model Runner
How to use NDIJayant/gemma-4-E4B-comorbities with Docker Model Runner:
docker model run hf.co/NDIJayant/gemma-4-E4B-comorbities
NDIJayant/gemma-4-E4B-comorbities
A Gemma 4 E4B-it model fine-tuned to extract patient comorbidities from
unstructured clinical notes and return a strict JSON schema. Produces
structured (comorbidity_name, status, comorbidity_date, evidence, reason)
entries with an issues audit array.
Intended use
- Input: an unstructured clinical note (ROS, PMH, Assessment, Imaging, Problem List, narrative). Up to ~15k tokens.
- Output: JSON with eligibility status, a list of comorbidities (with
onset/diagnosis date normalized to ISO), and an
issuesaudit field.
Not a medical device. For research and internal tooling only — every extraction must be reviewed by a qualified clinician before any clinical decision.
Training
Two-stage supervised fine-tuning, full parameter (no LoRA).
Stage 1 — base SFT:
- 115,459 train / 12,829 eval records
- Labels: Gemini 2.5 Flash generations using a detailed clinical-extraction system prompt
- 8 × RTX PRO 6000 Blackwell (95 GB), bf16, DeepSpeed ZeRO-2
- 1 epoch, LR 5e-6 cosine w/ 3% warmup, effective batch size 128
- Custom chunked lm_head + cross-entropy to avoid [B, S, 262K] logit OOM
- Output: checkpoint-810
Stage 2 — teacher-distillation alignment on refined data:
- Selected the top 40K records by output length from the stage-1 train set
- Refined each (clinical note, Gemini draft) pair through Gemma 4 31B-it served locally via vLLM, using the same clinical system prompt — producing additions, removals, and date/status corrections
- 33,315 refined records kept (JSON-valid, parseable)
- 1 additional epoch from the stage-1 checkpoint, same optimizer config
- Output: checkpoint-260 (this release)
Fully converged: eval loss was identical at step 130 vs step 260 (0.1323 vs 0.1324). No additional epochs would help on this data.
Evaluation
GPT-5.2 was used as an independent single-model judge (no head-to-head comparison — each model scored on its own merits against the clinical note, not against a Gemini-generated ground-truth that would bias the results in Gemini's favor). 50 samples drawn from the eval split with fixed seed.
| Dimension | gemma-enrich (this, ckpt-260) | gemma-base (stage 1, ckpt-810) | gemini-2.5-flash | gemma-4-E4B-it (zero-shot) |
|---|---|---|---|---|
| completeness | 8.29 | 7.79 | 7.82 | 6.90 |
| accuracy | 7.68 | 7.63 | 7.40 | 6.68 |
| no_hallucination | 9.10 | 9.26 | 8.92 | 8.11 |
| json_format | 9.18 | 9.16 | 5.79 | 8.92 |
| overall | 7.95 | 7.63 | 7.11 | 6.84 |
Reading the numbers:
- Two-stage SFT lifted overall 6.84 → 7.63 → 7.95 (+1.11 from base Gemma, +0.32 from the stage-2 refinement alignment pass).
- Beats the teacher (Gemini 2.5 Flash) by +0.84 overall as judged by an independent LLM — the student now outperforms the model that labelled its training data.
completenessclimbs by +1.39 from zero-shot to the final model, the single biggest axis of improvement (and the one the stage-2 refinement targeted).- Gemini's weakness is
json_format(5.79) — it frequently emits markdown-fenced output. Both Gemma checkpoints stay above 9.1.
Usage
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
tok = AutoTokenizer.from_pretrained("NDIJayant/gemma-4-E4B-comorbities")
model = AutoModelForCausalLM.from_pretrained(
"NDIJayant/gemma-4-E4B-comorbities", torch_dtype=torch.bfloat16, device_map="auto"
)
SYSTEM = "SYSTEM_PROMPT = """Extract comorbidities from the clinical note. Return only valid JSON, no markdown.
{
"eligibility_status": true,
"reason": {
"comorbidities": [
{
"comorbidity_name": "",
"status": "Pre-existing | Active | Historical | Unknown",
"comorbidity_date": "",
"evidence": "",
"reason": ""
}
]
},
"issues": []
}"""
"
note = "..."
prompt = f"<bos><|turn>system\n{SYSTEM}<turn|>\n<|turn>user\n{note}<turn|>\n<|turn>model\n"
ids = tok(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)
out = model.generate(**ids, max_new_tokens=2048, do_sample=False)
print(tok.decode(out[0][ids["input_ids"].shape[1]:], skip_special_tokens=True))
Limitations
- English-language clinical notes only; not validated on other languages.
- Performance degrades on notes > ~15k tokens (training cap).
- Dates are best when explicitly stated in the note; for narrative-only
conditions the model may leave
comorbidity_dateempty. - Still inherits teacher biases from Gemini 2.5 Flash and Gemma 4 31B (judgment calls around status classification, RA/seronegative handling).
- No PHI / safety guarantees — output is not validated against identifiers.
Citation
Please cite the base model: google/gemma-4-E4B-it.
- Downloads last month
- -