---
license: cc-by-nc-nd-4.0
language:
- en
base_model:
- Qwen/Qwen3-4B
pipeline_tag: question-answering
library_name: transformers
tags:
- Pathology
- Agent
- arxiv:2508.02258
---
## Introduction📝

**Vision Language Models** have demonstrated significant potential in medical imaging tasks, but pathology presents unique challenges due to its ultra-high resolution, complex tissue structures, and nuanced clinical semantics. These challenges often lead to **hallucinations** in VLMs, where the outputs are inconsistent with the visual evidence, undermining clinical trust. Current **Retrieval-Augmented Generation (RAG)** approaches predominantly rely on text-based knowledge bases, limiting their ability to effectively incorporate critical visual information from pathology images.

To address these challenges, we introduce **Patho-AgenticRAG**, a **multimodal RAG framework** that integrates page-level embeddings from authoritative pathology textbooks with **joint text-image retrieval**. This approach enables the retrieval of textbook pages containing both relevant textual and visual cues, ensuring that essential image-based information is preserved. Patho-AgenticRAG also supports advanced capabilities such as **reasoning**, **task decomposition**, and **multi-turn search interactions**, improving diagnostic accuracy in complex scenarios.

Our experiments demonstrate that Patho-AgenticRAG significantly outperforms existing multimodal models on tasks such as multiple-choice diagnosis and visual question answering.



## Quickstart🏃

This section outlines the workflow for setting up and running the **Patho-AgenticRAG** framework: ingesting pathology PDF page images into Milvus, downloading the models, and serving them for inference via API servers. Follow the steps below.

### 1. Milvus Ingestion

To ingest the pathology page images into Milvus for retrieval:

```bash
python milvus_ingestion.py
```

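Conceptually, ingestion amounts to embedding each textbook page image and storing the vectors in a Milvus collection. The sketch below illustrates the idea with `pymilvus`; the collection name, field names, and the random stand-in embedder are placeholders for illustration, not the actual contents of `milvus_ingestion.py`.

```python
# Illustrative sketch only -- the real milvus_ingestion.py may use a different
# schema, embedder, and collection name.
import random
from pathlib import Path

from pymilvus import MilvusClient

client = MilvusClient(uri="http://localhost:19530")  # assumes a running Milvus instance

# Quick-setup collection: an integer "id" primary key plus a "vector" field.
client.create_collection(collection_name="pathology_pages", dimension=128)

def embed_page(path: Path) -> list[float]:
    # Stand-in for the real page-level embedder (e.g. a ColPali-style model).
    return [random.random() for _ in range(128)]

rows = [
    {"id": i, "vector": embed_page(p), "page_path": str(p)}
    for i, p in enumerate(sorted(Path("./textbook_pages").glob("*.png")))
]
client.insert(collection_name="pathology_pages", data=rows)
```
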
### 2. Milvus Search Engine API

Next, run the Milvus search engine API to handle the retrieval process:

```bash
python milvus_search_engine_api.py
```

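The search engine API wraps Milvus vector search behind an HTTP endpoint so the agents can query it during multi-turn search. Below is a minimal sketch of such a service; the route, request schema, and collection/field names are assumptions for illustration, not the actual interface of `milvus_search_engine_api.py`.

```python
# Minimal illustrative search service (not the actual API of this repo).
from fastapi import FastAPI
from pydantic import BaseModel
from pymilvus import MilvusClient

app = FastAPI()
milvus = MilvusClient(uri="http://localhost:19530")

class SearchRequest(BaseModel):
    # In the real pipeline, the text/image query would first be embedded with
    # the same retriever model used at ingestion time.
    query_vector: list[float]
    top_k: int = 5

@app.post("/search")
def search(req: SearchRequest):
    hits = milvus.search(
        collection_name="pathology_pages",
        data=[req.query_vector],
        limit=req.top_k,
        output_fields=["page_path"],
    )
    # Return the matched page paths and similarity scores for the single query.
    return {"results": hits[0]}

# Run this sketch (saved e.g. as search_api.py) with:
#   uvicorn search_api:app --host 0.0.0.0 --port 8001
```
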
### 3. Model Download

Download the required models from Hugging Face and store them locally (a Python alternative is shown after the list):

- Agentic-Router:

```bash
hf download WenchuanZhang/Agentic-Router --local-dir ./models/Agentic-Router
```

- VRAG-Agent:

```bash
hf download autumncc/Qwen2.5-VL-7B-VRAG --local-dir ./models/Qwen2.5-VL-7B-VRAG
```

- Patho-R1:

```bash
hf download WenchuanZhang/Patho-R1-7B --local-dir ./models/Patho-R1-7B --token <your-token>
```

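If you prefer a Python API over the `hf` CLI, the same repositories can be fetched with `huggingface_hub` (repo ids and target directories as above):

```python
from huggingface_hub import snapshot_download

snapshot_download(repo_id="WenchuanZhang/Agentic-Router", local_dir="./models/Agentic-Router")
snapshot_download(repo_id="autumncc/Qwen2.5-VL-7B-VRAG", local_dir="./models/Qwen2.5-VL-7B-VRAG")
snapshot_download(repo_id="WenchuanZhang/Patho-R1-7B", local_dir="./models/Patho-R1-7B", token="<your-token>")
```
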
### 4. Serving the Models

You can now serve the models for inference with vLLM using the following commands:

- Agentic-Router (on CUDA device 1):

```bash
CUDA_VISIBLE_DEVICES=1 python3 -m vllm.entrypoints.openai.api_server --model ./models/Agentic-Router --port 8002 --host 0.0.0.0 --served-model-name Agentic-Router --tensor-parallel-size 1
```

- Qwen2.5-VL-7B-VRAG (on CUDA devices 2 and 3):

```bash
CUDA_VISIBLE_DEVICES=2,3 vllm serve ./models/Qwen2.5-VL-7B-VRAG --port 8003 --host 0.0.0.0 --limit-mm-per-prompt image=10 --served-model-name VRAG-Agent --tensor-parallel-size 2
```

- Patho-R1 (on CUDA devices 4 and 5):

```bash
CUDA_VISIBLE_DEVICES=4,5 python3 -m vllm.entrypoints.openai.api_server --model ./models/Patho-R1-7B --tokenizer ./models/Patho-R1-7B --port 8004 --host 0.0.0.0 --served-model-name Patho-R1 --tensor-parallel-size 2
```

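Each of these servers exposes vLLM's OpenAI-compatible API, so you can sanity-check them with the `openai` client. The prompt below is only an example; adjust the base URL and model name to match the port and `--served-model-name` you used.

```python
from openai import OpenAI

# Point the client at the Patho-R1 server started above (port 8004).
client = OpenAI(base_url="http://localhost:8004/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="Patho-R1",  # must match --served-model-name
    messages=[{"role": "user", "content": "Briefly list the key histologic features of ductal carcinoma in situ."}],
)
print(resp.choices[0].message.content)
```
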
### 5. Running the Demo

Finally, run the Patho-AgenticRAG script for a demo:

```bash
python patho_agenticrag.py
```

## Acknowledgements🎖

We gratefully acknowledge the contributions of the open-source community, particularly the following projects, which laid the foundation for various components of this work:

- [Qwen](https://github.com/QwenLM) for providing powerful vision language models that significantly advanced our multimodal understanding and generation capabilities.
- [VRAG](https://github.com/Alibaba-NLP/VRAG) for its high-quality visual-reasoning and agent-based training framework.
- [Milvus](https://github.com/milvus-io/milvus) for offering an efficient and scalable vector database that supports advanced search capabilities.
- [ColPali](https://github.com/illuin-tech/colpali) for its visual document retrieval models, which inform our page-level embedding approach.
- [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory) for robust LLM training and fine-tuning pipelines.
- [VERL](https://github.com/volcengine/verl) for its reinforcement learning training framework for large language models.
- [DeepSeek](https://github.com/deepseek-ai) for high-quality models and infrastructure supporting text understanding.

We thank the authors and contributors of these repositories for their dedication and impactful work, which made the development of Patho-AgenticRAG possible.

## Citation❤️

If you find our work helpful, a citation would be greatly appreciated. Also, consider giving us a star ⭐ on GitHub to support the project!

```bibtex
@article{zhang2025patho,
  title={Patho-AgenticRAG: Towards Multimodal Agentic Retrieval-Augmented Generation for Pathology VLMs via Reinforcement Learning},
  author={Zhang, Wenchuan and Guo, Jingru and Zhang, Hengzhe and Zhang, Penghao and Chen, Jie and Zhang, Shuwan and Zhang, Zhang and Yi, Yuhao and Bu, Hong},
  journal={arXiv preprint arXiv:2508.02258},
  year={2025}
}
```