DocReRank
/

DocReRank-Reranker

Visual Document Retrieval

vision-language

Model card Files Files and versions

navvew commited on Jul 22, 2025

Commit

d0adbeb

·

verified ·

1 Parent(s): fd301c2

Update README.md

Files changed (1) hide show

README.md +63 -3

README.md CHANGED Viewed

@@ -1,3 +1,63 @@
----
-license: cc-by-4.0
----

+---
+language: en
+tags:
+- reranker
+- RAG
+- multimodal
+- vision-language
+- Qwen
+license: cc-by-4.0
+pipeline_tag: visual-document-retrieval
+---
+# DocReRank: Multi-Modal Reranker
+This is the official model from the paper:
+📄 **[DocReRank: Single-Page Hard Negative Query Generation for Training Multi-Modal RAG Rerankers](https://arxiv.org/abs/2505.22584)**
+---
+## ✅ Model Overview
+- **Base model:** [Qwen/Qwen2-VL-2B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct)
+- **Architecture:** Vision-Language reranker
+- **Fine-tuning method:** PEFT (LoRA)
+- **Training data:** Generated by **Single-Page Hard Negative Query Generation** Pipeline.
+- **Purpose:** Improves second-stage reranking for Retrieval-Augmented Generation (RAG) in multimodal scenarios.
+---
+## ✅ How to Use
+This adapter requires the base Qwen2-VL model.
+```python
+from transformers import AutoProcessor, Qwen2VLForConditionalGeneration
+from peft import PeftModel
+import torch
+from PIL import Image
+# Load base model
+base_model = Qwen2VLForConditionalGeneration.from_pretrained(
+    "Qwen/Qwen2-VL-2B-Instruct",
+    torch_dtype=torch.bfloat16,
+    device_map="cuda"
+)
+# Load DocReRank adapter
+model = PeftModel.from_pretrained(base_model, "DocReRank/DocReRank-Reranker").eval()
+# Load processor
+processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-2B-Instruct")
+# Example query and image
+query = "What is the total revenue in the table?"
+image = Image.open("sample_page.png")
+inputs = processor(text=query, images=image, return_tensors="pt").to("cuda", torch.bfloat16)
+with torch.no_grad():
+    outputs = model.generate(**inputs, max_new_tokens=16)
+print(processor.tokenizer.decode(outputs[0], skip_special_tokens=True))