This model was trained on 100K random triplets of the mMARCO dataset.
mMARCO is the multilingual version of [Microsoft's MARCO dataset](https://microsoft.github.io/msmarco/).

Training used the [RAGatouille library](https://github.com/bclavie/RAGatouille/blob/main/examples/02-basic_training.ipynb) on [Lightning AI](https://lightning.ai/).
If you downloaded the model before July 15th, 1 pm (Jerusalem time), please try the current version.
Use the [RAGatouille examples](https://github.com/bclavie/RAGatouille/blob/main/examples/01-basic_indexing_and_search.ipynb) to learn more; just replace the pretrained model name, and make sure you use Arabic text and split long documents for best results.
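The document-splitting advice above can be sketched as a simple word-window chunker. This is a minimal illustration, not code from the model's training pipeline; the 150-word window and 30-word overlap are illustrative choices, so tune them for your corpus:

```python
def split_document(text: str, max_words: int = 150, overlap: int = 30) -> list[str]:
    """Split a long document into overlapping word-window passages.

    ColBERT-style retrievers score short passages, so long Arabic
    documents should be chunked into passages before indexing.
    """
    words = text.split()
    if len(words) <= max_words:
        return [" ".join(words)]
    passages = []
    step = max_words - overlap  # slide the window, keeping some overlap for context
    for start in range(0, len(words), step):
        passages.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already covers the tail of the document
    return passages
```

Each passage can then be passed to the indexer as its own document (optionally with a shared document ID so hits can be mapped back to the source).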
You can train a better model if you have access to adequate compute (for example, by fine-tuning this model on more data; seed 42 was used to pick the 100K sample).
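The seeded sampling mentioned above can be reproduced roughly like this. This is a minimal sketch under assumptions: `all_triplets` stands in for the full list of mMARCO (query, positive, negative) triplets, and the exact sampling code used for this model is not published here:

```python
import random

def sample_triplets(all_triplets, k=100_000, seed=42):
    """Draw a reproducible random sample of (query, positive, negative) triplets."""
    rng = random.Random(seed)  # seeded RNG: the same subset is drawn on every run
    return rng.sample(all_triplets, k)
```

Because the RNG is seeded, two calls with the same pool, `k`, and seed return the identical subset, which makes the data selection reproducible across machines.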
Model first announced: https://www.linkedin.com/posts/akhooli_this-is-probably-the-first-arabic-colbert-activity-7217969205197848576-l8Cy