Cyrile committed (verified)
Commit a7b92e2 · 1 parent: 1ee5a9c

Update README.md

Files changed (1): README.md (+19, -0)
README.md CHANGED
@@ -9,6 +9,25 @@ language:
  pipeline_tag: sentence-similarity
  ---
 
+ # Bloomz-3b Reranking
+
+ This reranking model is built from the [cmarkea/bloomz-3b-dpo-chat](https://huggingface.co/cmarkea/bloomz-3b-dpo-chat) model and aims to gauge the correspondence between a question (query) and a context. Thanks to its normalized scoring, it makes it easy to filter the query/context matches returned by a retriever, and to reorder those results with a modeling approach that is more effective than the retriever's. However, this type of modeling is not suited to searching a database directly because of its high computational cost.
+
+ Developed to be language-agnostic, the model supports both French and English. Consequently, it can score effectively in a cross-language setting (query and context in different languages) without being influenced by its behavior in a monolingual English or French setting.
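As an illustration of the filtering/reordering use described above, the following minimal sketch scores query/context pairs with a standard `transformers` text-classification pipeline and reorders a retriever's candidates by the positive-class probability. The checkpoint id (`cmarkea/bloomz-3b-reranking`) and the label names are assumptions made for the example.

```python
# Minimal reranking sketch: score (query, context) pairs and reorder retriever hits.
# The checkpoint id and the "positive" label name are assumptions for this example.
from transformers import pipeline

reranker = pipeline("text-classification", model="cmarkea/bloomz-3b-reranking")  # assumed id

query = "Quelle est la capitale de la France ?"
contexts = [  # e.g. the top hits returned by a retriever (English here: cross-language scoring)
    "Paris is the capital and most populous city of France.",
    "The Loire is the longest river located entirely within France.",
]

# The pipeline accepts {"text": ..., "text_pair": ...} dicts for sentence-pair inputs.
outputs = reranker(
    [{"text": query, "text_pair": ctx} for ctx in contexts],
    top_k=None,  # return the normalized score of every label, not only the top one
)

def positive_score(scores):
    # Keep the probability of the "relevant" class (label naming is an assumption).
    return next(s["score"] for s in scores if s["label"] in ("1", "LABEL_1"))

reranked = sorted(zip(contexts, outputs), key=lambda x: positive_score(x[1]), reverse=True)
for ctx, scores in reranked:
    print(f"{positive_score(scores):.3f}  {ctx}")
```

Because every candidate has to pass through a 3-billion-parameter cross-encoder, this scoring is applied only to the short list returned by the retriever, not to the whole database.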
+ ## Dataset
+ The training dataset combines the [mMARCO dataset](https://huggingface.co/datasets/unicamp-dl/mmarco), which consists of query/positive/hard-negative triplets, with data from the train split of [SQuAD](https://huggingface.co/datasets/rajpurkar/squad), shaped into query/positive/hard-negative triplets as well. To generate the hard negatives for SQuAD, we took contexts from the same theme as the query but attached to a different set of queries; the negative observations therefore address the same themes as the queries while presumably not containing the answer to the question.
+
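A rough sketch of this SQuAD hard-negative construction, assuming the dataset's `title` field plays the role of the theme and that negatives are sampled uniformly from the other contexts of that theme:

```python
# Sketch of SQuAD hard-negative mining: for each question, pick a context that shares
# the same theme (SQuAD "title") but belongs to other questions.
# Using "title" as the theme and uniform sampling are assumptions.
import random
from collections import defaultdict

from datasets import load_dataset

squad = load_dataset("rajpurkar/squad", split="train")

# Group the available contexts by theme.
contexts_by_title = defaultdict(set)
for ex in squad:
    contexts_by_title[ex["title"]].add(ex["context"])

triplets = []
for ex in squad:
    # Candidate negatives: same theme, excluding this question's positive context.
    candidates = list(contexts_by_title[ex["title"]] - {ex["context"]})
    if not candidates:
        continue  # some themes only have a single context
    triplets.append(
        {
            "query": ex["question"],
            "positive": ex["context"],
            "hard_negative": random.choice(candidates),
        }
    )
```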
+ Finally, the triplets are flattened into query/context pairs, labeled 1 for query/positive pairs and 0 for query/negative pairs. For each element of a pair (the query and the context), the language, French or English, is drawn uniformly at random.
+
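The flattening step and the per-element language draw could look like the following sketch; the field layout (each text available in both `fr` and `en`) is an assumption made for the example:

```python
# Sketch: flatten query/positive/hard-negative triplets into labeled pairs,
# drawing the language of each pair element uniformly at random.
# Field names and the bilingual lookup are assumptions for this example.
import random

def flatten_triplets(triplets):
    """triplets: dicts with 'query', 'positive', 'hard_negative', each mapping
    a language code ('fr' or 'en') to the corresponding text."""
    pairs = []
    for t in triplets:
        for context_key, label in (("positive", 1), ("hard_negative", 0)):
            query_lang = random.choice(["fr", "en"])    # language of the query
            context_lang = random.choice(["fr", "en"])  # language of the context, drawn independently
            pairs.append(
                {
                    "query": t["query"][query_lang],
                    "context": t[context_key][context_lang],
                    "label": label,
                }
            )
    return pairs

# Example with a single bilingual triplet (toy data).
example = [{
    "query": {"fr": "Quelle est la capitale de la France ?", "en": "What is the capital of France?"},
    "positive": {"fr": "Paris est la capitale de la France.", "en": "Paris is the capital of France."},
    "hard_negative": {"fr": "La Loire est le plus long fleuve de France.", "en": "The Loire is the longest river in France."},
}]
print(flatten_triplets(example))
```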
  ## Evaluation
 
  To assess the performance of the reranker, we will utilize the "validation" split of the [SQuAD](https://huggingface.co/datasets/rajpurkar/squad) dataset. We will select