ATHENA-R1-Qwen3-8B
Project page: athena.openscientist.ai · Code: mims-harvard/ATHENA
ATHENA-R1 is an AI agent for treatment reasoning, trained with reinforcement learning over a universe of 212 biomedical tools. It reasons in multiple steps: it identifies what evidence is needed, selects tools, and incorporates retrieved evidence into subsequent steps. Tool calls are served through ToolUniverse (FDA labeling, Open Targets, ChEMBL, EuropePMC, and others).
Given a clinical question, the model performs multi-step tool calls, synthesises the evidence, and returns a free-form answer grounded in authoritative biomedical sources.
Quick start
The model is exposed through the athena-r1 Python package, which handles the tool-call protocol and conversation management. Two services back the agent: vLLM (model server) and ToolUniverse (tool server).
# 1. Install
pip install "athena-r1[vllm,web] @ git+https://github.com/mims-harvard/ATHENA.git"
# 2. Start backing services
bash scripts/launch_tooluniverse.sh # → :8080
bash scripts/launch_vllm.sh 8000 mims-harvard/ATHENA-R1-Qwen3-8B
# 3. Run the agent (Python)
python -c "
from athena_r1 import AthenaR1

agent = AthenaR1(
    model='mims-harvard/ATHENA-R1-Qwen3-8B',
    vllm_url='http://0.0.0.0:8000/v1',
    tool_server='http://0.0.0.0:8080',
)
print(agent.answer('Dose adjustment for metformin in CKD eGFR 35?').answer)
"
For a chat UI (bundled browser demo with live-streamed reasoning):
python web/agui_server.py # → http://localhost:8090/ (AG-UI server + demo)
For an OpenAI-compatible API endpoint:
python web/openai_server.py # → http://localhost:9000/v1/chat/completions
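
Once the OpenAI-compatible server is running, it can be queried from Python with only the standard library. The following is a minimal sketch, not part of the athena-r1 package: the helper names `build_chat_request` and `ask` are hypothetical, and it assumes the endpoint accepts the model ID as the `model` field and returns the usual OpenAI response shape.

```python
import json
import urllib.request

# URL taken from the quick start above; adjust if the server runs elsewhere.
API_URL = "http://localhost:9000/v1/chat/completions"
# Assumption: the server accepts the Hugging Face model ID as the model name.
MODEL = "mims-harvard/ATHENA-R1-Qwen3-8B"


def build_chat_request(question: str, url: str = API_URL) -> urllib.request.Request:
    """Build (but do not send) a chat-completions POST request."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": question}],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str, url: str = API_URL, timeout: float = 600.0) -> str:
    """Send the request and return the assistant's reply text.

    Assumes the standard OpenAI response shape: choices[0].message.content.
    """
    request = build_chat_request(question, url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server up, `print(ask("Dose adjustment for metformin in CKD eGFR 35?"))` would print the agent's answer. The generous timeout is a guess, chosen because a multi-round tool-calling run may take a while.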
Inference settings (paper-canonical)
| Parameter | Value |
|---|---|
| temperature | 0.7 |
| top_p | 0.95 |
| top_k | 20 |
| min_p | 0.0 |
| presence_penalty | 0 |
| max_round | 40 |
| concurrent questions | 4 |
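
When calling the model server directly rather than through the agent, the sampling values in the table can be merged into a chat-completions payload. This is a sketch under two assumptions: that the vLLM OpenAI-compatible server accepts `top_k` and `min_p` as extra request fields, and that `max_round` and the concurrency limit are agent-level settings rather than sampling parameters. The helper `with_sampling` is a hypothetical name.

```python
# Sampling values copied from the table above.
SAMPLING = {
    "temperature": 0.7,
    "top_p": 0.95,
    "top_k": 20,
    "min_p": 0.0,
    "presence_penalty": 0,
}

# Agent-level settings from the same table. Assumption: these are handled
# by the agent runtime and are not part of the sampling request.
MAX_ROUND = 40
CONCURRENT_QUESTIONS = 4


def with_sampling(payload: dict) -> dict:
    """Return a copy of a chat-completions payload with the sampling values added."""
    return {**payload, **SAMPLING}


payload = with_sampling({
    "model": "mims-harvard/ATHENA-R1-Qwen3-8B",
    "messages": [{"role": "user", "content": "Who are you?"}],
})
```

Returning a copy keeps the base payload reusable across requests that need different settings.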
Evaluation
Open-ended setting: each question is answered free-form, then mapped to one of the original answer choices.
| Benchmark | n | ATHENA-R1 | GPT-5 |
|---|---|---|---|
| DrugPC (open-ended drug reasoning) | 3,168 | 94.7% | 76.9% |
| TreatmentPC (patient-specific treatment) | 456 | 82.9% | 72.2% |
ATHENA-R1 exceeds GPT-5 by 17.8 percentage points on DrugPC and 10.7 percentage points on TreatmentPC.
See docs/eval_results.md in the code repo for the full benchmark tables and the two-level self-learning ablation.
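
The margins quoted above follow directly from the accuracy table. A short check, using only the figures reported there (the `margin` helper is illustrative):

```python
# Accuracy (%) from the evaluation table: (ATHENA-R1, GPT-5).
RESULTS = {
    "DrugPC": (94.7, 76.9),
    "TreatmentPC": (82.9, 72.2),
}


def margin(benchmark: str) -> float:
    """Difference in accuracy, in percentage points, rounded to one decimal."""
    athena, gpt5 = RESULTS[benchmark]
    return round(athena - gpt5, 1)


for name in RESULTS:
    print(f"{name}: +{margin(name)} points")
```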
How it works
- Stage 1 (multi-step tool reasoning): the model emits `<tool_call>...</tool_call>` blocks; the runtime dispatches them through ToolUniverse, appends the results to the conversation, and re-prompts. The loop continues until `[FinalAnswer]` is emitted or `max_round` is hit.
- Stage 2 (eval only, option mapping): a separate function call maps the free-form answer to an MCQ letter. Two backends are supported: the local ATHENA-R1 model (self-extraction) or Azure GPT-5 (external reader).
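
The Stage 1 loop can be sketched in a few lines. This is an illustration of the control flow described above, not the athena-r1 implementation: `run_agent`, `fake_generate`, `fake_tool`, and the tool name `lookup_label` are hypothetical, and the JSON shape inside each `<tool_call>` block (`name` plus `arguments`) is an assumption.

```python
import json
import re
from typing import Callable, Optional

TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)
FINAL_MARKER = "[FinalAnswer]"


def run_agent(
    question: str,
    generate: Callable[[list], str],
    call_tool: Callable[[str, dict], str],
    max_round: int = 40,
) -> Optional[str]:
    """Alternate between model turns and tool calls until a final answer appears."""
    messages = [{"role": "user", "content": question}]
    for _ in range(max_round):
        reply = generate(messages)
        messages.append({"role": "assistant", "content": reply})
        if FINAL_MARKER in reply:
            # Return the text that follows the final-answer marker.
            return reply.split(FINAL_MARKER, 1)[1].strip()
        for raw in TOOL_CALL_RE.findall(reply):
            # Assumed payload shape: {"name": ..., "arguments": {...}}
            call = json.loads(raw)
            result = call_tool(call["name"], call.get("arguments", {}))
            messages.append({"role": "tool", "content": result})
    return None  # round budget exhausted without a final answer


def fake_generate(messages: list) -> str:
    # Hypothetical stand-in for the model: one tool call, then the answer.
    if messages[-1]["role"] == "user":
        return '<tool_call>{"name": "lookup_label", "arguments": {"drug": "metformin"}}</tool_call>'
    return "[FinalAnswer] " + messages[-1]["content"]


def fake_tool(name: str, arguments: dict) -> str:
    # Hypothetical stand-in for a ToolUniverse dispatch.
    return f"{name} result for {arguments['drug']}"


print(run_agent("example question", fake_generate, fake_tool))
```

Passing the model and tool dispatcher in as callables keeps the loop testable without a running vLLM or ToolUniverse server.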
Intended use
ATHENA-R1 is a research artifact for studying treatment reasoning and decision support. It is not a medical device and must not be used for direct patient care.
Citation
@article{gao2026athena,
title = {An AI agent for treatment reasoning over a biomedical tool universe},
author = {Gao, Shanghua and ... and Zitnik, Marinka},
journal = {arXiv preprint},
year = {2026}
}
License
MIT.
Acknowledgements
Evidence retrieval is powered by ToolUniverse, a library of curated biomedical tools.