MrLight commited on
Commit
b9550c5
·
verified ·
1 Parent(s): 06935c9

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +3 -3
README.md CHANGED
@@ -15,11 +15,11 @@ datasets:
15
 
16
  # DSE-Phi35-Vidore-ft
17
 
18
- DSE-Phi3-Docmatix-V2 is a bi-encoder model designed to encode document screenshots into dense vectors for document retrieval. The Document Screenshot Embedding ([DSE](https://arxiv.org/abs/2406.11251)) approach captures documents in their original visual format, preserving all information such as text, images, and layout, thus avoiding tedious parsing and potential information loss.
19
 
20
- The model, `Tevatron/dse-phi3-docmatix-v2`, is trained using 1/10 of the `Tevatron/docmatix-ir` dataset, a variant of `HuggingFaceM4/Docmatix` specifically adapted for training PDF retrievers with Vision Language Models in open-domain question answering scenarios. For more information on dataset filtering and hard negative mining, refer to the [docmatix-ir](https://huggingface.co/datasets/Tevatron/docmatix-ir/blob/main/README.md) dataset page.
 
21
 
22
- DSE has strong zero-shot effectiveness for document retrieval both with visual input and text input.
23
  For example, DSE-Phi3-Docmatix-V2 achieves **82.9** nDCG@5 on [ViDoRE](https://huggingface.co/spaces/vidore/vidore-leaderboard) leaderboard.
24
 
25
  ## How to train the model from scratch
 
15
 
16
  # DSE-Phi35-Vidore-ft
17
 
18
+ DSE-Phi3-Vidore-ft is a bi-encoder model designed to encode document screenshots into dense vectors for document retrieval. The Document Screenshot Embedding ([DSE](https://arxiv.org/abs/2406.11251)) approach captures documents in their original visual format, preserving all information such as text, images, and layout, thus avoiding tedious parsing and potential information loss.
19
 
20
+ The model, `Tevatron/dse-phi35-vidore-ft`, is trained using 1/10 of the `Tevatron/docmatix-ir` dataset, a variant of `HuggingFaceM4/Docmatix` specifically adapted for training PDF retrievers with Vision Language Models in open-domain question answering scenarios. For more information on dataset filtering and hard negative mining, refer to the [docmatix-ir](https://huggingface.co/datasets/Tevatron/docmatix-ir/blob/main/README.md) dataset page.
21
+ Followed by finetuning on the (vidore)[https://huggingface.co/datasets/vidore/colpali_train_set] training set. The checkpoint is warmed up by text retrieval and webpage retrieval.
22
 
 
23
  For example, DSE-Phi3-Docmatix-V2 achieves **82.9** nDCG@5 on [ViDoRE](https://huggingface.co/spaces/vidore/vidore-leaderboard) leaderboard.
24
 
25
  ## How to train the model from scratch