cmarkea/bloomz-560m-retriever

Cyrile committed on Dec 3, 2023

Commit a13cc2a (1 parent: 644c00f)

Update README.md

Files changed (1)

README.md +3 -0

@@ -39,6 +39,9 @@ Model (EN/FR)
 | [Bloomz-560m-retriever](https://huggingface.co/cmarkea/bloomz-560m-retriever)                       | 10       | 44      | 49        | 77        | 86         |
 | [Bloomz-3b-retriever](https://huggingface.co/cmarkea/bloomz-3b-retriever)                           | 9        | 38      | 50        | 78        | 87         |
+It is observed that TF-IDF loses robustness in cross-language scenarios (even showing lower performance than CamemBERT, which is a model specialized in French). This can be explained by the fact that a bag-of-words method cannot support this type of issue because, for a given sentence between two languages, the latent vectors will be significantly different.
+CamemBERT exhibits poor performance, not because it poorly groups contexts and queries by themes, but because a meta-cluster appears, separating contexts and queries (as illustrated in the image below), making this type of modeling inappropriate in a retriever context.
 How to Use Blommz-560m-retriever
 --------------------------------