cmarkea/bloomz-560m-retriever

@@ -1,14 +1,43 @@
 ---
 license: bigscience-bloom-rail-1.0
-datasets:
-- squad
 language:
 - fr
 - en
 pipeline_tag: sentence-similarity
 ---
+Bloomz-560m-retriever
+Introducing Bloomz-560m-retriever, based on the Bloomz-560m-sft-chat model. It produces embedding representations of texts and queries for retrieval tasks, linking queries to documents. The model is cross-language, meaning that it is language-agnostic across English and French. It is well suited to Open Domain Question Answering (ODQA): queries and texts are projected into a space with an algebraic structure that brings matching pairs closer together.
+Training
+It is a bi-encoder trained on a corpus of context/query pairs, with 50% in English and 50% in French. The language distribution for queries and contexts is evenly split (1/4 French-French, 1/4 French-English, 1/4 English-French, 1/4 English-English). The learning objective is to bring the embedding representation of queries and associated contexts closer using a contrastive method. The loss function is defined as [rr]:
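As an illustration only, and not necessarily the loss used by the authors, a common contrastive objective for a bi-encoder treats the other contexts in a batch as negatives (an InfoNCE-style loss). The function name `in_batch_contrastive_loss`, the temperature value, and the use of in-batch negatives are all assumptions made for this sketch.

```python
import numpy as np

def in_batch_contrastive_loss(query_emb, context_emb, temperature=0.05):
    """Hypothetical InfoNCE-style loss.

    Row i of query_emb is assumed to match row i of context_emb;
    every other row in the batch acts as a negative example.
    """
    # L2-normalise so that the dot product is a cosine similarity.
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    c = context_emb / np.linalg.norm(context_emb, axis=1, keepdims=True)
    logits = q @ c.T / temperature                   # (batch, batch) similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy with the diagonal (matching pairs) as targets.
    return float(-np.mean(np.diag(log_prob)))
```

Minimising this quantity pulls each query toward its own context and pushes it away from the other contexts in the batch, which is the behaviour the training paragraph describes.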
+Benchmark
+We evaluate on the SQuAD evaluation dataset (6,000 queries distributed over 1,200 contexts grouped into 35 themes). For each query we record the rank at which its correct context is retrieved, and we report the mean of that rank (Top-mean), its standard deviation (Top-std), and the percentage of queries whose correct context appears within the top-1, top-5, and top-10 results. We compare a TF-IDF baseline trained on the SQuAD train sub-dataset, DistilCamemBERT, Sentence-BERT, and finally our model. Performance is measured in both monolingual and cross-language settings (query in French and context in English).
+| Model (FR/FR)                                                                                       | Top-mean | Top-std | Top-1 (%) | Top-5 (%) | Top-10 (%) |
+|-----------------------------------------------------------------------------------------------------|----------|:-------:|-----------|-----------|------------|
+| TF-IDF                                                                                              | 128      | 269     | 23        | 46        | 56         |
+| [CamemBERT](https://huggingface.co/camembert/camembert-base)                                        | 417      | 347     | 1         | 2         | 3          |
+| [Sentence-BERT](https://huggingface.co/sentence-transformers/paraphrase-multilingual-mpnet-base-v2) | 11       | 41      | 43        | 71        | 82         |
+| Bloomz-560m-retriever                                                                               | 10       | 47      | 51        | 78        | 86         |
+| Bloomz-3b-retriever                                                                                 | 9        | 37      | 50        | 79        | 87         |
+| Model (EN/FR)                                                                                       | Top-mean | Top-std | Top-1 (%) | Top-5 (%) | Top-10 (%) |
+|-----------------------------------------------------------------------------------------------------|----------|:-------:|-----------|-----------|------------|
+| TF-IDF                                                                                              | 607      | 334     | 0         | 0         | 0          |
+| [CamemBERT](https://huggingface.co/camembert/camembert-base)                                        | 432      | 345     | 0         | 1         | 1          |
+| [Sentence-BERT](https://huggingface.co/sentence-transformers/paraphrase-multilingual-mpnet-base-v2) | 12       | 47      | 44        | 73        | 83         |
+| Bloomz-560m-retriever                                                                               | 10       | 44      | 49        | 77        | 86         |
+| Bloomz-3b-retriever                                                                                 | 9        | 38      | 50        | 78        | 87         |
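The metrics reported in the tables above can be computed from a query-by-context similarity matrix. The following is a minimal sketch, assuming that Top-mean and Top-std are the mean and (population) standard deviation of the rank of the correct context; the function name `retrieval_metrics`, the tie handling, and the choice of population standard deviation are assumptions, not the authors' evaluation code.

```python
import numpy as np

def retrieval_metrics(similarity, gold, ks=(1, 5, 10)):
    """Hypothetical rank-based retrieval metrics.

    similarity: (n_queries, n_contexts) array of query/context scores.
    gold: index of the correct context for each query.
    """
    similarity = np.asarray(similarity, dtype=float)
    gold = np.asarray(gold)
    gold_scores = similarity[np.arange(len(gold)), gold]
    # Rank of the correct context: 1 + number of contexts scored strictly higher.
    ranks = 1 + (similarity > gold_scores[:, None]).sum(axis=1)
    metrics = {"top_mean": float(ranks.mean()), "top_std": float(ranks.std())}
    for k in ks:
        # Percentage of queries whose correct context is within the top k.
        metrics[f"top_{k}"] = float(100.0 * np.mean(ranks <= k))
    return metrics
```

With cosine similarities between query and context embeddings as input, `top_1`, `top_5`, and `top_10` correspond to the percentage columns of the tables.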
+How to Use Bloomz-560m-retriever
+The following example uses the Pipeline API of the Transformers library.
 ```python
 import numpy as np