manu committed on
Commit
6b9ef3c
1 Parent(s): ab6b403

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -35,7 +35,7 @@ Our training dataset of 127,460 query-page pairs is comprised of train sets of o
  Our training set is fully English by design, enabling us to study zero-shot generalization to non-English languages. We explicitly verify no multi-page PDF document is used both [*ViDoRe*](https://huggingface.co/collections/vidore/vidore-benchmark-667173f98e70a1c0fa4db00d) and in the train set to prevent evaluation contamination.
  A validation set is created with 2% of the samples to tune hyperparameters.
 
- *Note: Multilingual data is present in the pretraining corpus of the language model (Gemma-2B) and potentially occurs during PaliGemma-3B's multimodal training.*
+ *Note: Multilingual data is present in the pretraining corpus of the language model and most probably in the multimodal training.*
 
  ### Parameters
 