YoannSOLA committed on
Commit ced2aea · verified · 1 Parent(s): 2137de0

Update README.md

  1. README.md +12 -12
README.md CHANGED
@@ -11,18 +11,18 @@ pipeline_tag: sentence-similarity
11
 
12
  # Bloomz-3b Reranking
13
 
14
- This reranking model is built from [cmarkea/bloomz-3b-dpo-chat](https://huggingface.co/cmarkea/bloomz-3b-dpo-chat) model and aims to gauge the correspondence between
15
- a question (query) and a context. With its normalized scoring, it facilitates filtering of results derived from query/context matches at the output of a retriever.
16
- Moreover, it enables the reordering of results using a modeling approach more efficient than the retriever's. However, this modeling type is not conducive to direct
17
  database searching due to its high computational cost.
18
 
19
  Developed to be language-agnostic, this model supports both French and English. Consequently, it can effectively score in a cross-language context without being
20
  influenced by its behavior in a monolingual context (English or French).
21
 
22
  ## Dataset
23
- The training dataset comprises the [mMARCO dataset](https://huggingface.co/datasets/unicamp-dl/mmarco), consisting of query/positive/hard negative triplets. Additionally,
24
- we have included [SQuAD](https://huggingface.co/datasets/rajpurkar/squad) data from the "train" split, forming query/positive/hard negative triplets. To generate hard
25
- negative data for SQuAD, we considered contexts from the same theme as the query but from a different set of queries. Hence, the negative observations address the same
26
  themes as the queries but presumably do not contain the answer to the question.
27
 
28
  Finally, the triplets are flattened to obtain pairs of query/context sentences with a label 1 if query/positive and a label 0 if query/negative. In each element of the
@@ -30,9 +30,9 @@ pair (query and context), the language, French or English, is randomly and unifo
30
 
31
  ## Evaluation
32
 
33
- To assess the performance of the reranker, we will utilize the "validation" split of the [SQuAD](https://huggingface.co/datasets/rajpurkar/squad) dataset. We will select
34
  the first question from each paragraph, along with the paragraph constituting the context that should be ranked Top-1 for an Oracle modeling. What's intriguing is that
35
- the number of themes is limited, and each context from a corresponding theme that does not match the query forms a hard negative (other contexts outside the theme are
36
  simple negatives). Thus, we can construct the following table, with each theme showing the number of contexts and associated query:
37
 
38
  | Theme name | Context number |
@@ -75,7 +75,7 @@ simple negatives). Thus, we can construct the following table, with each theme s
75
 
76
  The evaluation corpus consists of 1204 pairs of query/context to be ranked.
77
 
78
- Initially, the evaluation scores will be calculated in cases where both the query and the context are in the same language (French/French).
79
 
80
  | Model (French/French) | Top-mean | Top-std | Top-1 (%) | Top-10 (%) | Top-100 (%) | MRR (x100) | mean score Top | std score Top |
81
  |:-----------------------------:|:----------:|:---------:|:---------:|:----------:|:-----------:|:----------:|:----------------:|:---------------:|
@@ -88,7 +88,7 @@ Initially, the evaluation scores will be calculated in cases where both the quer
88
  | [cmarkea/bloomz-3b-reranking](https://huggingface.co/cmarkea/bloomz-3b-reranking) | 1.22 | 1.06 | 89.37 | 99.75 | 100 | 93.79 | 0.94 | 0.10 |
89
 
90
 
91
- Next, we evaluate the model in a cross-language context, with queries in French and contexts in English.
92
 
93
  | Model (French/English) | Top-mean | Top-std | Top-1 (%) | Top-10 (%) | Top-100 (%) | MRR (x100) | mean score Top | std score Top |
94
  |:-----------------------------:|:----------:|:---------:|:---------:|:----------:|:-----------:|:----------:|:----------------:|:---------------:|
@@ -100,14 +100,14 @@ Next, we evaluate the model in a cross-language context, with queries in French
100
  | [cmarkea/bloomz-560m-reranking](https://huggingface.co/cmarkea/bloomz-560m-reranking) | 1.51 | 1.92 | 81.89 | 99.09 | 100 | 88.64 | 0.92 | 0.15 |
101
  | [cmarkea/bloomz-3b-reranking](https://huggingface.co/cmarkea/bloomz-3b-reranking) | 1.22 | 0.98 | 89.20 | 99.84 | 100 | 93.63 | 0.94 | 0.10 |
102
 
103
- As observed, the cross-language context does not significantly impact the behavior of our models. If the model is used in a reranking context along with filtering of the
104
  Top-K results from a search, a threshold of 0.8 could be applied to filter the contexts outputted by the retriever, thereby reducing noise issues present in the contexts
105
  for RAG-type applications.
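
A minimal sketch of the thresholding suggested above, assuming the reranker scores have already been computed for each context returned by the retriever. The function name `filter_and_rerank`, the sample contexts, and the sample scores are hypothetical and purely illustrative; only the 0.8 threshold comes from the text.

```python
from typing import List, Tuple


def filter_and_rerank(
    contexts: List[str],
    scores: List[float],
    threshold: float = 0.8,
) -> List[Tuple[str, float]]:
    """Keep contexts scoring at or above `threshold`, best score first."""
    if len(contexts) != len(scores):
        raise ValueError("contexts and scores must have the same length")
    # Drop contexts the reranker considers unlikely to match the query.
    kept = [(ctx, score) for ctx, score in zip(contexts, scores) if score >= threshold]
    # Reorder the remaining contexts by descending reranker score.
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical retriever output and reranker scores (not real model outputs).
contexts = ["context A", "context B", "context C"]
scores = [0.35, 0.97, 0.81]

ranked = filter_and_rerank(contexts, scores)
for ctx, score in ranked:
    print(f"{score:.2f}  {ctx}")
```

Only the contexts that pass the threshold would then be forwarded to the generator in a RAG-type application; the threshold is best tuned on data representative of the target use case.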
106
 
107
  How to Use Bloomz-3b-reranking
108
  ------------------------------
109
 
110
- The following example utilizes the API Pipeline of the Transformers library.
111
 
112
  ```python
113
  from transformers import pipeline
 
11
 
12
  # Bloomz-3b Reranking
13
 
14
+ This reranking model is built from the [cmarkea/bloomz-3b-dpo-chat](https://huggingface.co/cmarkea/bloomz-3b-dpo-chat) model and aims to measure the semantic correspondence between
15
+ a question (query) and a context. With its normalized scoring, it helps to filter the query/context matches returned by a retriever in an ODQA (Open-Domain Question Answering) context.
16
+ Moreover, it allows the results to be reordered using a more effective modeling approach than the retriever's. However, this type of modeling is not conducive to direct
17
  database searching due to its high computational cost.
18
 
19
  Developed to be language-agnostic, this model supports both French and English. Consequently, it can effectively score in a cross-language context without being
20
  influenced by its behavior in a monolingual context (English or French).
21
 
22
  ## Dataset
23
+ The training dataset is composed of the [mMARCO dataset](https://huggingface.co/datasets/unicamp-dl/mmarco), consisting of query/positive/hard negative triplets. Additionally,
24
+ we have included [SQuAD](https://huggingface.co/datasets/rajpurkar/squad) data from the "train" split, forming query/positive/hard negative triplets. In order to generate hard
25
+ negative data for SQuAD, we considered contexts from the same theme as the query but from a different set of queries. Hence, the negative observations belong to the same
26
  themes as the queries but presumably do not contain the answer to the question.
27
 
28
  Finally, the triplets are flattened to obtain pairs of query/context sentences with a label 1 if query/positive and a label 0 if query/negative. In each element of the
 
30
 
31
  ## Evaluation
32
 
33
+ To assess the performance of the reranker, we will make use of the "validation" split of the [SQuAD](https://huggingface.co/datasets/rajpurkar/squad) dataset. We will select
34
  the first question from each paragraph, along with the paragraph constituting the context that should be ranked Top-1 for an Oracle modeling. What's intriguing is that
35
+ the number of themes is limited, and each context from a corresponding theme that does not match the query is considered a hard negative (other contexts outside the theme are
36
  simple negatives). Thus, we can construct the following table, with each theme showing the number of contexts and associated query:
37
 
38
  | Theme name | Context number |
 
75
 
76
  The evaluation corpus consists of 1204 pairs of query/context to be ranked.
77
 
78
+ First, the evaluation scores were computed for the case where both the query and the context are in the same language (French/French).
79
 
80
  | Model (French/French) | Top-mean | Top-std | Top-1 (%) | Top-10 (%) | Top-100 (%) | MRR (x100) | mean score Top | std score Top |
81
  |:-----------------------------:|:----------:|:---------:|:---------:|:----------:|:-----------:|:----------:|:----------------:|:---------------:|
 
88
  | [cmarkea/bloomz-3b-reranking](https://huggingface.co/cmarkea/bloomz-3b-reranking) | 1.22 | 1.06 | 89.37 | 99.75 | 100 | 93.79 | 0.94 | 0.10 |
89
 
90
 
91
+ Then, we evaluated the model in a cross-language context, with queries in French and contexts in English.
92
 
93
  | Model (French/English) | Top-mean | Top-std | Top-1 (%) | Top-10 (%) | Top-100 (%) | MRR (x100) | mean score Top | std score Top |
94
  |:-----------------------------:|:----------:|:---------:|:---------:|:----------:|:-----------:|:----------:|:----------------:|:---------------:|
 
100
  | [cmarkea/bloomz-560m-reranking](https://huggingface.co/cmarkea/bloomz-560m-reranking) | 1.51 | 1.92 | 81.89 | 99.09 | 100 | 88.64 | 0.92 | 0.15 |
101
  | [cmarkea/bloomz-3b-reranking](https://huggingface.co/cmarkea/bloomz-3b-reranking) | 1.22 | 0.98 | 89.20 | 99.84 | 100 | 93.63 | 0.94 | 0.10 |
102
 
103
+ As observed, the cross-language context does not significantly impact the behavior of our models. If the model were used to rerank and filter the
104
  Top-K results from a search, a threshold of 0.8 could be applied to filter the contexts outputted by the retriever, thereby reducing noise issues present in the contexts
105
  for RAG-type applications.
106
 
107
  How to Use Bloomz-3b-reranking
108
  ------------------------------
109
 
110
+ The following example is based on the Pipeline API of the Transformers library.
111
 
112
  ```python
113
  from transformers import pipeline