Update README.md
pipeline_tag: sentence-similarity
---
# Bloomz-3b Reranking

This reranking model is built from the [cmarkea/bloomz-3b-dpo-chat](https://huggingface.co/cmarkea/bloomz-3b-dpo-chat) model and aims to gauge the correspondence between a question (query) and a context. Thanks to its normalized scoring, it makes it easy to filter the query/context matches produced by a retriever. Moreover, it enables results to be reordered by a model more effective than the retriever itself. However, this type of model is not suited to searching a database directly, due to its high computational cost.
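To make the filtering and reordering step concrete, here is a minimal sketch in plain Python. The `toy_score` function is a purely illustrative stand-in for the reranker's normalized query/context score, not the model's actual API; in practice the model itself would produce the scores.

```python
# Sketch: reorder retriever hits by a reranker score, then filter by threshold.

def rerank(query, contexts, score, threshold=0.5):
    """Score each (query, context) pair, drop pairs below the threshold,
    and return the surviving contexts sorted by descending relevance."""
    scored = [(ctx, score(query, ctx)) for ctx in contexts]
    kept = [(ctx, s) for ctx, s in scored if s >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)

def toy_score(query, context):
    # Word-overlap ratio: a hypothetical stand-in for the model's
    # normalized score, used here only to make the sketch runnable.
    q = set(query.lower().split())
    c = set(context.lower().split())
    return len(q & c) / max(len(q), 1)

hits = [
    "Bananas are yellow.",
    "Paris is the capital of France.",
    "The Eiffel Tower is in Paris.",
]
results = rerank("capital of France", hits, toy_score)
```

The threshold-based filter is what the normalized scoring enables: scores are comparable across queries, so a single cut-off can be applied to the whole retriever output.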

Developed to be language-agnostic, this model supports both French and English. Consequently, it can score effectively in a cross-language setting (a French query against an English context, or vice versa), unaffected by its behavior in a monolingual English or French context.

## Dataset
The training dataset comprises the [mMARCO dataset](https://huggingface.co/datasets/unicamp-dl/mmarco), which consists of query/positive/hard-negative triplets. In addition, we included data from the train split of [SQuAD](https://huggingface.co/datasets/rajpurkar/squad), also formed into query/positive/hard-negative triplets. To generate hard negatives for SQuAD, we took contexts from the same theme as the query but belonging to a different set of queries. The negative observations therefore address the same themes as the queries but presumably do not contain the answer to the question.
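The hard-negative construction described above can be sketched as follows. The field names (`title`, `question`, `context`) mirror SQuAD's layout, and the sampling logic is an assumption about the procedure, not the authors' exact script:

```python
import random

def build_triplets(records, seed=0):
    """For each question, sample a hard negative: a context sharing the
    question's theme (SQuAD 'title') but attached to a different question,
    so it is on-topic yet presumably does not contain the answer."""
    rng = random.Random(seed)
    by_title = {}
    for rec in records:
        by_title.setdefault(rec["title"], []).append(rec)
    triplets = []
    for rec in records:
        candidates = [r["context"] for r in by_title[rec["title"]]
                      if r["context"] != rec["context"]]
        if candidates:
            triplets.append({"query": rec["question"],
                             "positive": rec["context"],
                             "negative": rng.choice(candidates)})
    return triplets

# Toy SQuAD-like records, for illustration only.
toy = [
    {"title": "Paris", "question": "What is the capital of France?",
     "context": "Paris is the capital and largest city of France."},
    {"title": "Paris", "question": "Where is the Louvre located?",
     "context": "The Louvre museum lies on the Right Bank of the Seine."},
]
triplets = build_triplets(toy)
```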

Finally, the triplets are flattened into query/context pairs, labeled 1 for query/positive and 0 for query/negative. For each element of a pair (query and context), the language, French or English, is chosen uniformly at random.
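The flattening step can be sketched as follows. The `fr`/`en` dictionary layout is a hypothetical representation of the bilingual triplets, not the dataset's actual schema:

```python
import random

def flatten(triplets, seed=0):
    """Turn each triplet into a (query, positive, 1) pair and a
    (query, negative, 0) pair, drawing the French or English version of
    every element uniformly at random."""
    rng = random.Random(seed)

    def pick(item):
        return item[rng.choice(["fr", "en"])]

    pairs = []
    for t in triplets:
        pairs.append((pick(t["query"]), pick(t["positive"]), 1))
        pairs.append((pick(t["query"]), pick(t["negative"]), 0))
    return pairs

# A single toy triplet with both language versions, for illustration only.
toy = [{"query": {"fr": "Quelle est la capitale ?",
                  "en": "What is the capital?"},
        "positive": {"fr": "Paris est la capitale.",
                     "en": "Paris is the capital."},
        "negative": {"fr": "Les bananes sont jaunes.",
                     "en": "Bananas are yellow."}}]
pairs = flatten(toy)
```

Because the language of the query and of the context are drawn independently, roughly half of the resulting pairs are cross-language, which is what trains the language-agnostic behavior described earlier.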

## Evaluation

To assess the performance of the reranker, we will utilize the "validation" split of the [SQuAD](https://huggingface.co/datasets/rajpurkar/squad) dataset. We will select