Update README: Qwen model link update
README.md CHANGED

```diff
@@ -6,7 +6,7 @@ language:
 license: other
 license_name: webai-non-commercial-license-v1.0
 license_link: https://huggingface.co/webAI-Official/webAI-ColVec1-9b/blob/main/LICENSE.md
-base_model: Qwen/Qwen3.5-
+base_model: Qwen/Qwen3.5-9B
 tags:
 - text
 - image
@@ -22,7 +22,7 @@ tags:
 
 ## ⚡ Summary
 
-**webAI-Official/webAI-ColVec1-9b** is a state-of-the-art [ColBERT](https://arxiv.org/abs/2407.01449)-style multimodal embedding model based on *[Qwen/Qwen3.5-
+**webAI-Official/webAI-ColVec1-9b** is a state-of-the-art [ColBERT](https://arxiv.org/abs/2407.01449)-style multimodal embedding model based on *[Qwen/Qwen3.5-9B](https://huggingface.co/Qwen/Qwen3.5-9B)*. It maps text queries and visual documents (images, PDFs) into aligned multi-vector embeddings.
 
 The model has been fine-tuned on a **merged multimodal dataset** of ~2M question-image pairs, including [DocVQA](https://huggingface.co/datasets/lmms-lab/DocVQA), [PubTables-1M](https://huggingface.co/datasets/bsmock/pubtables-1m), [TAT-QA](https://huggingface.co/datasets/next-tat/TAT-QA), [ViDoRe-ColPali-Training](https://huggingface.co/datasets/vidore/colpali_train_set), [VDR Multilingual](https://huggingface.co/datasets/llamaindex/vdr-multilingual-train), [VisRAG-Ret-Train-In-domain-data](https://huggingface.co/datasets/openbmb/VisRAG-Ret-Train-In-domain-data), [VisRAG-Ret-Train-Synthetic-data](https://huggingface.co/datasets/openbmb/VisRAG-Ret-Train-Synthetic-data), and proprietary domain-specific synthetic data.
 
@@ -33,7 +33,7 @@ The datasets were filtered, balanced, and merged to produce a comprehensive trai
 
 | Feature               | Detail                                                                     |
 | --------------------- | -------------------------------------------------------------------------- |
-| **Architecture**      | Qwen3.5-
+| **Architecture**      | Qwen3.5-9B Vision-Language Model (VLM) + `2560 dim` Linear Projection Head |
 | **Methodology**       | ColBERT-style Late Interaction (MaxSim scoring)                            |
 | **Output**            | Multi-vector (Seq_Len × *2560*), L2-normalized                             |
 | **Modalities**        | Text Queries, Images (Documents)                                           |
```