WenchuanZhang committed
Commit 078e50b · verified · 1 Parent(s): e489a76

Update README.md

Files changed (1): README.md (+90 -3)

README.md CHANGED
---
license: cc-by-nc-nd-4.0
language:
- en
base_model:
- Qwen/Qwen3-4B
pipeline_tag: question-answering
library_name: transformers
tags:
- Pathology
- Agent
- arxiv:2508.02258
---

## Introduction📝
**Vision Language Models (VLMs)** have demonstrated significant potential in medical imaging tasks, but pathology presents unique challenges due to its ultra-high-resolution images, complex tissue structures, and nuanced clinical semantics. These challenges often lead to **hallucinations**, where VLM outputs are inconsistent with the visual evidence, undermining clinical trust. Current **Retrieval-Augmented Generation (RAG)** approaches rely predominantly on text-based knowledge bases, limiting their ability to incorporate critical visual information from pathology images.

To address these challenges, we introduce **Patho-AgenticRAG**, a **multimodal RAG framework** that builds page-level embeddings from authoritative pathology textbooks and performs **joint text-image retrieval**. This approach retrieves textbook pages that contain both relevant textual and visual cues, ensuring that essential image-based information is preserved. Patho-AgenticRAG also supports **reasoning**, **task decomposition**, and **multi-turn search interactions**, improving diagnostic accuracy in complex scenarios.

Our experiments demonstrate that Patho-AgenticRAG significantly outperforms existing multimodal models on tasks such as multiple-choice diagnosis and visual question answering.

![Patho-AgenticRAG Overview](https://github.com/Wenchuan-Zhang/Patho-AgenticRAG/raw/main/docs/casestudy.png)

## Quickstart🏃
This section outlines the workflow for setting up and running the **Patho-AgenticRAG** framework: ingesting the pathology PDF page images, downloading the models, and serving them for inference via API servers. Follow the steps below.

### 1. Milvus Ingestion
To ingest the pathology page images into Milvus for searching:
```bash
python milvus_ingestion.py
```
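
For reference, the sketch below shows the kind of page-level ingestion `milvus_ingestion.py` performs. It is a minimal illustration, not the released script: it assumes a local Milvus server, pre-extracted page images, a single pooled vector per page, and a hypothetical `embed_page` helper; the collection name, field names, data path, and dimension are placeholders.

```python
# Minimal ingestion sketch (illustrative only; the real logic lives in milvus_ingestion.py).
from pathlib import Path

import numpy as np
from pymilvus import MilvusClient

DIM = 1024  # must match the embedding model's output size (assumption)


def embed_page(image_path: Path) -> np.ndarray:
    """Hypothetical helper: encode one textbook page image into a DIM-dimensional vector."""
    raise NotImplementedError("Replace with your page-image embedding model.")


client = MilvusClient(uri="http://localhost:19530")  # default standalone Milvus endpoint
client.create_collection(collection_name="textbook_pages", dimension=DIM)  # name is a placeholder

rows = []
for idx, page in enumerate(sorted(Path("./data/pages").glob("*.png"))):  # path is a placeholder
    rows.append({
        "id": idx,
        "vector": embed_page(page).tolist(),  # page-level embedding
        "source": page.name,                  # keep provenance so answers can cite the page
    })

client.insert(collection_name="textbook_pages", data=rows)
```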

### 2. Milvus Search Engine API
Next, run the Milvus search engine API to handle the retrieval process:
```bash
python milvus_search_engine_api.py
```
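
The search engine API essentially wraps a Milvus similarity search over the ingested pages. The query-side sketch below mirrors the ingestion example above, reusing the same placeholder collection name and a hypothetical `embed_query` helper; it is not the actual service code.

```python
# Illustrative query against the collection created during ingestion.
from pymilvus import MilvusClient

client = MilvusClient(uri="http://localhost:19530")


def embed_query(question: str) -> list[float]:
    """Hypothetical helper: encode a query into the same space as the page embeddings."""
    raise NotImplementedError


hits = client.search(
    collection_name="textbook_pages",                 # placeholder name from the ingestion sketch
    data=[embed_query("IHC markers for ductal carcinoma in situ")],
    limit=5,                                          # return the top-5 textbook pages
    output_fields=["source"],                         # include page provenance with each hit
)
for hit in hits[0]:
    print(hit["distance"], hit["entity"]["source"])
```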

### 3. Model Download
Download the necessary models from Hugging Face. These models are critical for the workflow and should be stored locally.
- Agentic-Router:
```bash
hf download WenchuanZhang/Agentic-Router --local-dir ./models/Agentic-Router
```
- VRAG-Agent:
```bash
hf download autumncc/Qwen2.5-VL-7B-VRAG --local-dir ./models/Qwen2.5-VL-7B-VRAG
```
- Patho-R1:
```bash
hf download WenchuanZhang/Patho-R1-7B --local-dir ./models/Patho-R1-7B --token <your-token>
```
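
If you want to sanity-check a download before serving it, the router checkpoint can be loaded directly with `transformers`, assuming it loads as a standard causal LM (its base model is Qwen3-4B). The prompt below is only a placeholder.

```python
# Optional sanity check of the downloaded Agentic-Router checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_dir = "./models/Agentic-Router"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="auto")

inputs = tokenizer("Which stain confirms amyloid deposition?", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```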

### 4. Serving the Models
You can now serve the models for inference using the following commands:
- Agentic Router (on CUDA device 1):
```bash
CUDA_VISIBLE_DEVICES=1 python3 -m vllm.entrypoints.openai.api_server --model ./models/Agentic-Router --port 8002 --host 0.0.0.0 --served-model-name Agentic-Router --tensor-parallel-size 1
```
- Qwen2.5-VL-7B-VRAG (on CUDA devices 2 and 3):
```bash
CUDA_VISIBLE_DEVICES=2,3 vllm serve ./models/Qwen2.5-VL-7B-VRAG --port 8003 --host 0.0.0.0 --limit-mm-per-prompt image=10 --served-model-name VRAG-Agent --tensor-parallel-size 2
```
- Patho-R1 (on CUDA devices 4 and 5):
```bash
CUDA_VISIBLE_DEVICES=4,5 python3 -m vllm.entrypoints.openai.api_server --model ./models/Patho-R1-7B --tokenizer ./models/Patho-R1-7B --port 8004 --host 0.0.0.0 --served-model-name Patho-R1 --tensor-parallel-size 2
```
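
All three endpoints expose vLLM's OpenAI-compatible API, so any OpenAI client can query them once the servers are up. The example below talks to the router on port 8002; the prompt content is illustrative.

```python
# Query the served Agentic-Router through vLLM's OpenAI-compatible endpoint.
from openai import OpenAI

router = OpenAI(base_url="http://localhost:8002/v1", api_key="EMPTY")  # vLLM ignores the API key

response = router.chat.completions.create(
    model="Agentic-Router",  # must match --served-model-name
    messages=[{
        "role": "user",
        "content": "Does this question require textbook retrieval: "
                   "'Which grade is this breast carcinoma?'",
    }],
    temperature=0.0,
)
print(response.choices[0].message.content)
```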

### 5. Running the Demo
Finally, run the Patho-AgenticRAG script for a demo:
```bash
python patho_agenticrag.py
```
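
Conceptually, the demo wires the pieces above together: the router decides whether a query needs retrieval, the Milvus search engine returns candidate textbook pages, and the VRAG agent and Patho-R1 reason over the evidence. The outline below is a rough, hypothetical simplification of that loop, not the released `patho_agenticrag.py`; in particular, the search API route and port, the prompts, and the plain-text handling of retrieved pages are all assumptions.

```python
# Hypothetical outline of the agentic loop; endpoints follow the serving step above,
# but the control flow, prompts, and the /search route are simplifying assumptions.
import requests
from openai import OpenAI

ROUTER = OpenAI(base_url="http://localhost:8002/v1", api_key="EMPTY")
VRAG = OpenAI(base_url="http://localhost:8003/v1", api_key="EMPTY")
PATHO = OpenAI(base_url="http://localhost:8004/v1", api_key="EMPTY")


def ask(client: OpenAI, model: str, prompt: str) -> str:
    out = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}]
    )
    return out.choices[0].message.content


def answer(question: str) -> str:
    # 1. The router decides whether textbook retrieval is needed (prompt is illustrative).
    plan = ask(ROUTER, "Agentic-Router", f"Decide whether retrieval is needed for: {question}")
    if "retriev" in plan.lower():
        # 2. Ask the Milvus search engine API for candidate pages (route/port assumed).
        pages = requests.post(
            "http://localhost:8001/search", json={"query": question, "top_k": 5}
        ).json()
        # 3. Let the VRAG agent summarize the retrieved evidence.
        evidence = ask(VRAG, "VRAG-Agent", f"Summarize evidence for: {question}\n{pages}")
        question = f"{question}\nEvidence:\n{evidence}"
    # 4. Patho-R1 produces the final answer.
    return ask(PATHO, "Patho-R1", question)


print(answer("Which immunostain distinguishes DCIS from invasive carcinoma?"))
```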

## Acknowledgements🎖
We gratefully acknowledge the contributions of the open-source community, particularly the following projects, which laid the foundation for various components of this work:

- [Qwen](https://github.com/QwenLM) for providing powerful vision language models that significantly advanced our multimodal understanding and generation capabilities.
- [VRAG](https://github.com/Alibaba-NLP/VRAG) for enabling high-quality visual reasoning and agent-based training frameworks.
- [Milvus](https://github.com/milvus-io/milvus) for offering an efficient and scalable vector database that supports advanced search capabilities.
- [ColPali](https://github.com/illuin-tech/colpali) for its efficient vision-based document retrieval models, which enable page-level embedding.
- [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory) for robust LLM training and fine-tuning pipelines.
- [verl](https://github.com/volcengine/verl) for its reinforcement learning framework for LLM post-training.
- [DeepSeek](https://github.com/deepseek-ai) for high-quality models and infrastructure supporting text understanding.

We thank the authors and contributors of these repositories for their dedication and impactful work, which made the development of Patho-AgenticRAG possible.

## Citation❤️
If you find our work helpful, a citation would be greatly appreciated. Also, consider giving us a star ⭐ on GitHub to support the project!

```bibtex
@article{zhang2025patho,
  title={Patho-AgenticRAG: Towards Multimodal Agentic Retrieval-Augmented Generation for Pathology VLMs via Reinforcement Learning},
  author={Zhang, Wenchuan and Guo, Jingru and Zhang, Hengzhe and Zhang, Penghao and Chen, Jie and Zhang, Shuwan and Zhang, Zhang and Yi, Yuhao and Bu, Hong},
  journal={arXiv preprint arXiv:2508.02258},
  year={2025}
}
```