---
license: gemma
pipeline_tag: text-classification
tags:
- transformers
- sentence-transformers
language:
- multilingual
---

# Reranker

**For more details, please refer to our GitHub repository: [FlagEmbedding](https://github.com/FlagOpen/FlagEmbedding/tree/master).**

- [Model List](#model-list)
- [Usage](#usage)
- [Fine-tuning](#fine-tune)
- [Evaluation](#evaluation)
- [Citation](#citation)

Unlike an embedding model, a reranker takes a question and a document as input and directly outputs a similarity score instead of an embedding.
You can obtain a relevance score by feeding a query and a passage to the reranker.
The score can be mapped to a float value in [0, 1] with a sigmoid function.
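As a minimal sketch of that mapping, the snippet below applies a numerically stable sigmoid to raw scores in plain Python. The function names and the sample scores are illustrative placeholders, not outputs of the model.

```python
import math
from typing import List, Sequence


def sigmoid(score: float) -> float:
    """Map an unbounded relevance score (logit) into the interval [0, 1]."""
    # Branch on the sign so math.exp never receives a large positive argument.
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-score))
    z = math.exp(score)
    return z / (1.0 + z)


def normalize_scores(scores: Sequence[float]) -> List[float]:
    """Apply the sigmoid to every raw score in a list."""
    return [sigmoid(s) for s in scores]


# Illustrative raw scores (placeholders, not real model outputs).
print(normalize_scores([-8.2, 0.0, 5.3]))
```

Because the sigmoid is monotonic, it changes the scale of the scores but not the ranking of passages.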

Here, we introduce a lightweight reranker, **bge-reranker-v2.5-gemma2-lightweight**, a multilingual model trained on top of gemma2-9b. By integrating token compression and layerwise reduction, the model maintains outstanding performance while saving significant computational resources.

Our model primarily demonstrates the following capabilities:

- Lightweight: The model can be made lightweight through token compression, layerwise reduction, or a combination of both.
- Outstanding performance: The model has achieved new state-of-the-art (SOTA) performance on both BEIR and MIRACL.

We will soon release a technical report on the lightweight reranker with more details.

------

You can use **bge-reranker-v2.5-gemma2-lightweight** with any of the following prompts:

- Predict whether passage B contains an answer to query A.
- Predict whether passages A and B have the same meaning.
- Predict whether queries A and B are asking the same thing.
- Predict whether argument A and counterargument B express contradictory opinions.
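The Hugging Face example in the Usage section places the query, the passage, and the chosen prompt in a fixed order. The sketch below reproduces that layout at the string level only; `build_input_text` is a hypothetical helper, and the real code tokenizes each piece separately and prepends a BOS token, so the token sequence may differ slightly from tokenizing this string directly.

```python
DEFAULT_PROMPT = "Predict whether passage B contains an answer to query A."


def build_input_text(query: str, passage: str, prompt: str = DEFAULT_PROMPT) -> str:
    """Hypothetical helper: lay out 'A: query', 'B: passage', and the prompt,
    separated by newlines, mirroring the order used by get_inputs()."""
    sep = "\n"
    return f"A: {query}{sep}B: {passage}{sep}{prompt}"


# Query-passage relevance with the default prompt.
print(build_input_text("what is panda?", "The giant panda is a bear species endemic to China."))

# Paraphrase detection: swap in another prompt from the list above.
print(build_input_text(
    "A panda is a bear.",
    "Pandas are bears.",
    prompt="Predict whether passages A and B have the same meaning.",
))
```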


## Model List

| Model | Base model | Language | Layerwise (cutoff layers) | Compress ratio | Compress layers | Feature |
|:--------------------------------------------------------------------------|:--------:|:-----------------------------------------------------------------------------------------------------------------------------------:|:----------------------------------------------------------------------------------------------:|:----------------------------------------------------------------------------------------------:|:----------------------------------------------------------------------------------------------:|------------------------------------------------------------------------------------------------|
| [BAAI/bge-reranker-base](https://huggingface.co/BAAI/bge-reranker-base) | [xlm-roberta-base](https://huggingface.co/xlm-roberta-base) | Chinese and English |     -     |     -     |     -     | Lightweight reranker model, easy to deploy, with fast inference. |
| [BAAI/bge-reranker-large](https://huggingface.co/BAAI/bge-reranker-large) | [xlm-roberta-large](https://huggingface.co/FacebookAI/xlm-roberta-large) | Chinese and English |     -     |     -     |     -     | Lightweight reranker model, easy to deploy, with fast inference. |
| [BAAI/bge-reranker-v2-m3](https://huggingface.co/BAAI/bge-reranker-v2-m3) | [bge-m3](https://huggingface.co/BAAI/bge-m3) |    Multilingual     |     -     |     -     |     -     | Lightweight reranker model, possesses strong multilingual capabilities, easy to deploy, with fast inference. |
| [BAAI/bge-reranker-v2-gemma](https://huggingface.co/BAAI/bge-reranker-v2-gemma) |      [gemma-2b](https://huggingface.co/google/gemma-2b)      |    Multilingual     |     -     |     -     |     -     | Suitable for multilingual contexts, performs well in both English proficiency and multilingual capabilities. |
| [BAAI/bge-reranker-v2-minicpm-layerwise](https://huggingface.co/BAAI/bge-reranker-v2-minicpm-layerwise) | [MiniCPM-2B-dpo-bf16](https://huggingface.co/openbmb/MiniCPM-2B-dpo-bf16) |    Multilingual     |   8-40    |   -   |   -   | Suitable for multilingual contexts, performs well in both English and Chinese proficiency, allows freedom to select layers for output, facilitating accelerated inference. |
| [BAAI/bge-reranker-v2.5-gemma2-lightweight](https://huggingface.co/BAAI/bge-reranker-v2.5-gemma2-lightweight) | [google/gemma-2-9b](https://huggingface.co/google/gemma-2-9b) | Multilingual | 8-42 | 1, 2, 4, 8 | [8, 16, 24, 32, 40] | Suitable for multilingual contexts, performs well in both English and Chinese proficiency, allows freedom to select layers, compress ratio and compress layers for output, facilitating accelerated inference. |
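The last row of the table lists the supported settings for the lightweight model. A hypothetical pre-flight check like the one below could reject unsupported values before calling the model. `validate_lightweight_options` is not part of FlagEmbedding; it assumes the ranges in the table are inclusive and that every entry of `compress_layer` must come from the listed set.

```python
from typing import Sequence

# Ranges copied from the bge-reranker-v2.5-gemma2-lightweight row (assumed inclusive).
MIN_CUTOFF_LAYER, MAX_CUTOFF_LAYER = 8, 42
ALLOWED_COMPRESS_RATIOS = {1, 2, 4, 8}
ALLOWED_COMPRESS_LAYERS = {8, 16, 24, 32, 40}


def validate_lightweight_options(cutoff_layers: Sequence[int],
                                 compress_ratio: int,
                                 compress_layer: Sequence[int]) -> None:
    """Raise ValueError if any option falls outside the ranges in the table."""
    for layer in cutoff_layers:
        if not MIN_CUTOFF_LAYER <= layer <= MAX_CUTOFF_LAYER:
            raise ValueError(f"cutoff layer {layer} outside {MIN_CUTOFF_LAYER}-{MAX_CUTOFF_LAYER}")
    if compress_ratio not in ALLOWED_COMPRESS_RATIOS:
        raise ValueError(f"compress_ratio {compress_ratio} not in {sorted(ALLOWED_COMPRESS_RATIOS)}")
    for layer in compress_layer:
        if layer not in ALLOWED_COMPRESS_LAYERS:
            raise ValueError(f"compress layer {layer} not in {sorted(ALLOWED_COMPRESS_LAYERS)}")


# The settings used in the Usage examples pass the check.
validate_lightweight_options(cutoff_layers=[28], compress_ratio=2, compress_layer=[24, 40])
```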


You can select a model according to your scenario and resources.
- For **multilingual** use, choose [BAAI/bge-reranker-v2-m3](https://huggingface.co/BAAI/bge-reranker-v2-m3), [BAAI/bge-reranker-v2-gemma](https://huggingface.co/BAAI/bge-reranker-v2-gemma), or [BAAI/bge-reranker-v2.5-gemma2-lightweight](https://huggingface.co/BAAI/bge-reranker-v2.5-gemma2-lightweight).

- For **Chinese or English**, choose [BAAI/bge-reranker-v2-m3](https://huggingface.co/BAAI/bge-reranker-v2-m3) or [BAAI/bge-reranker-v2-minicpm-layerwise](https://huggingface.co/BAAI/bge-reranker-v2-minicpm-layerwise).

- For **efficiency**, choose [BAAI/bge-reranker-v2-m3](https://huggingface.co/BAAI/bge-reranker-v2-m3) or the lower layers of [BAAI/bge-reranker-v2-minicpm-layerwise](https://huggingface.co/BAAI/bge-reranker-v2-minicpm-layerwise).

- For **better performance**, we recommend [BAAI/bge-reranker-v2-minicpm-layerwise](https://huggingface.co/BAAI/bge-reranker-v2-minicpm-layerwise) and [BAAI/bge-reranker-v2-gemma](https://huggingface.co/BAAI/bge-reranker-v2-gemma).

## Usage 
### Using FlagEmbedding

```shell
git clone https://github.com/FlagOpen/FlagEmbedding.git
cd FlagEmbedding
pip install -e .
```

#### For LLM-based lightweight reranker

```python
from FlagEmbedding import LightWeightFlagLLMReranker
reranker = LightWeightFlagLLMReranker('BAAI/bge-reranker-v2.5-gemma2-lightweight', use_fp16=True) # Setting use_fp16 to True speeds up computation with a slight performance degradation

score = reranker.compute_score(['query', 'passage'], cutoff_layers=[28], compress_ratio=2, compress_layer=[24, 40]) # 'cutoff_layers' selects which layers produce the score; 'compress_ratio' and 'compress_layer' control token compression.
print(score)

scores = reranker.compute_score([['what is panda?', 'hi'], ['what is panda?', 'The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.']], cutoff_layers=[28], compress_ratio=2, compress_layer=[24, 40])
print(scores)
```
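Rerankers are typically used to reorder candidates returned by a first-stage retriever. The sketch below sorts passages by their scores. `rank_passages` is a hypothetical helper; it assumes you already have one float score per (query, passage) pair, in the same order as the passages, which you may need to extract from the `compute_score` output depending on the FlagEmbedding version and the number of cutoff layers.

```python
from typing import List, Optional, Sequence, Tuple


def rank_passages(passages: Sequence[str],
                  scores: Sequence[float],
                  top_k: Optional[int] = None) -> List[Tuple[str, float]]:
    """Return (passage, score) pairs sorted from most to least relevant."""
    if len(passages) != len(scores):
        raise ValueError("expected exactly one score per passage")
    ranked = sorted(zip(passages, scores), key=lambda item: item[1], reverse=True)
    return ranked if top_k is None else ranked[:top_k]


# Placeholder scores for illustration only.
candidates = ["hi", "The giant panda is a bear species endemic to China."]
print(rank_passages(candidates, [-6.1, 7.4], top_k=1))
```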

### Using Hugging Face Transformers

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

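def last_logit_pool(logits: torch.Tensor,
                    attention_mask: torch.Tensor) -> torch.Tensor:
    # If every sequence ends with a real token, the batch is left-padded (or unpadded),
    # so the last position holds each sequence's score.
    left_padding = (attention_mask[:, -1].sum() == attention_mask.shape[0])
    if left_padding:
        return logits[:, -1]
    else:
        # Right padding: pick the last non-padding position of each sequence.
        sequence_lengths = attention_mask.sum(dim=1) - 1
        batch_size = logits.shape[0]
        return torch.stack([logits[i, sequence_lengths[i]] for i in range(batch_size)], dim=0)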

def get_inputs(pairs, tokenizer, prompt=None, max_length=1024):
    # Builds token ids laid out as: <bos> "A: {query}" "\n" "B: {passage}" "\n" "{prompt}".
    if prompt is None:
        prompt = "Predict whether passage B contains an answer to query A."
    sep = "\n"
    prompt_inputs = tokenizer(prompt,
                              return_tensors=None,
                              add_special_tokens=False)['input_ids']
    sep_inputs = tokenizer(sep,
                           return_tensors=None,
                           add_special_tokens=False)['input_ids']
    inputs = []
    query_lengths = []
    prompt_lengths = []
    for query, passage in pairs:
        query_inputs = tokenizer(f'A: {query}',
                                 return_tensors=None,
                                 add_special_tokens=False,
                                 max_length=max_length * 3 // 4,
                                 truncation=True)
        passage_inputs = tokenizer(f'B: {passage}',
                                   return_tensors=None,
                                   add_special_tokens=False,
                                   max_length=max_length,
                                   truncation=True)
        item = tokenizer.prepare_for_model(
            [tokenizer.bos_token_id] + query_inputs['input_ids'],
            sep_inputs + passage_inputs['input_ids'],
            truncation='only_second',
            max_length=max_length,
            padding=False,
            return_attention_mask=False,
            return_token_type_ids=False,
            add_special_tokens=False
        )
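        # Append the separator and the instruction prompt after the (possibly truncated) passage.
        item['input_ids'] = item['input_ids'] + sep_inputs + prompt_inputs
        item['attention_mask'] = [1] * len(item['input_ids'])
        inputs.append(item)
        # Token lengths of the query segment and the prompt segment, passed to the model below.
        query_lengths.append(len([tokenizer.bos_token_id] + query_inputs['input_ids'] + sep_inputs))
        prompt_lengths.append(len(sep_inputs + prompt_inputs))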
        
    return tokenizer.pad(
            inputs,
            padding=True,
            max_length=max_length + len(sep_inputs) + len(prompt_inputs),
            pad_to_multiple_of=8,
            return_tensors='pt',
    ), query_lengths, prompt_lengths

tokenizer = AutoTokenizer.from_pretrained('BAAI/bge-reranker-v2.5-gemma2-lightweight', trust_remote_code=True)
tokenizer.padding_side = 'right'
model = AutoModelForCausalLM.from_pretrained('BAAI/bge-reranker-v2.5-gemma2-lightweight', trust_remote_code=True)
model = model.to('cuda')
model.eval()

pairs = [['what is panda?', 'hi'], ['what is panda?', 'The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.']]
with torch.no_grad():
    inputs, query_lengths, prompt_lengths = get_inputs(pairs, tokenizer)
    inputs = inputs.to(model.device)
    outputs = model(**inputs,
                    return_dict=True,
                    cutoff_layers=[28],
                    compress_ratio=2,
                    compress_layer=[24, 40],
                    query_lengths=query_lengths,
                    prompt_lengths=prompt_lengths)
    # `outputs.logits` and `outputs.attention_masks` hold one entry per requested cutoff layer.
    scores = []
    for i in range(len(outputs.logits)):
        logits = last_logit_pool(outputs.logits[i], outputs.attention_masks[i])
        scores.append(logits.cpu().float().tolist())
    print(scores)
```
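To make `last_logit_pool` concrete, the pure-Python sketch below mirrors its logic on nested lists instead of tensors. It is illustrative only: `last_token_positions` and `last_logit_pool_py` are hypothetical names, and, like the original, the sketch assumes right padding whenever some sequence does not end with a real token.

```python
from typing import List, Sequence


def last_token_positions(attention_mask: Sequence[Sequence[int]]) -> List[int]:
    """Index of the last real (non-padding) token in each sequence."""
    # Every row ends with a real token: left padding or no padding at all.
    if all(row[-1] == 1 for row in attention_mask):
        return [len(row) - 1 for row in attention_mask]
    # Otherwise assume right padding: real tokens occupy positions 0..n-1.
    return [sum(row) - 1 for row in attention_mask]


def last_logit_pool_py(logits: Sequence[Sequence[float]],
                       attention_mask: Sequence[Sequence[int]]) -> List[float]:
    """Pick each sequence's score at its last real token."""
    positions = last_token_positions(attention_mask)
    return [row[pos] for row, pos in zip(logits, positions)]


mask = [[1, 1, 1, 0], [1, 1, 1, 1]]
logits = [[0.1, 0.2, 0.9, 0.0], [0.3, 0.4, 0.5, 0.7]]
print(last_logit_pool_py(logits, mask))  # [0.9, 0.7]
```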

## Infinity

For an OpenAI API-compatible local deployment, you can use [Infinity](https://github.com/michaelfeil/infinity):

```shell
docker run -it --gpus all -v $volume:/app/.cache -p 7997:7997 \
 michaelf34/infinity:0.0.70 \
 v2 --model-id BAAI/bge-reranker-v2.5-gemma2-lightweight --device cuda --no-bettertransformer
```
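Once the container is running, you can send rerank requests over HTTP. The sketch below only builds the request with the standard library. The `/rerank` path and the `model`/`query`/`documents` fields are assumptions based on Infinity's rerank API, so check them against the documentation of the Infinity version you deploy; the port matches the `-p 7997:7997` mapping above.

```python
import json
import urllib.request
from typing import List

MODEL_ID = "BAAI/bge-reranker-v2.5-gemma2-lightweight"


def build_rerank_request(query: str, documents: List[str],
                         base_url: str = "http://localhost:7997",
                         model: str = MODEL_ID) -> urllib.request.Request:
    """Build (but do not send) a POST request to an assumed /rerank endpoint."""
    payload = {"model": model, "query": query, "documents": documents}
    return urllib.request.Request(
        url=f"{base_url}/rerank",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_rerank_request("what is panda?", ["hi", "The giant panda is a bear species endemic to China."])
# Uncomment to send the request to a running server:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```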

## Load the model locally

1. Make sure `gemma_config.py` and `gemma_model.py` from [BAAI/bge-reranker-v2.5-gemma2-lightweight](https://huggingface.co/BAAI/bge-reranker-v2.5-gemma2-lightweight/tree/main) are in your local model directory.
2. Modify the following part of `config.json`:
```json
"auto_map": {
  "AutoConfig": "gemma_config.CostWiseGemmaConfig",
  "AutoModel": "gemma_model.CostWiseGemmaModel",
  "AutoModelForCausalLM": "gemma_model.CostWiseGemmaForCausalLM"
},
```
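The two steps above can also be scripted. The sketch below is a hypothetical helper, not part of the released files: it checks that the two remote-code files exist in a local model directory and then writes the `auto_map` entries shown above into that directory's `config.json`, keeping all other keys.

```python
import json
from pathlib import Path
from typing import Union

AUTO_MAP = {
    "AutoConfig": "gemma_config.CostWiseGemmaConfig",
    "AutoModel": "gemma_model.CostWiseGemmaModel",
    "AutoModelForCausalLM": "gemma_model.CostWiseGemmaForCausalLM",
}
REQUIRED_FILES = ("gemma_config.py", "gemma_model.py")


def patch_local_config(model_dir: Union[str, Path]) -> dict:
    """Verify the remote-code files are present and set auto_map in config.json."""
    model_dir = Path(model_dir)
    missing = [name for name in REQUIRED_FILES if not (model_dir / name).is_file()]
    if missing:
        raise FileNotFoundError(f"missing files in {model_dir}: {missing}")
    config_path = model_dir / "config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    config["auto_map"] = dict(AUTO_MAP)  # overwrite only this key, keep the rest
    config_path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config
```

After patching, the directory can be passed to `AutoModelForCausalLM.from_pretrained(..., trust_remote_code=True)` in place of the Hub model ID used in the Usage section.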

## Evaluation

The configuration that saves 60% of FLOPs is `compress_ratio=2`, `compress_layer=[8]`, `cutoff_layers=[25]`.
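For intuition about where a figure like 60% can come from, the back-of-the-envelope sketch below uses a crude cost model. It is not the authors' measurement. It assumes gemma-2-9b's 42 decoder layers, that `cutoff_layers=[25]` means only the first 25 layers run, that compressing by `compress_ratio` from layer `compress_layer` onward divides the cost of every later layer by that ratio, and that per-layer cost is linear in sequence length (ignoring quadratic attention terms and any uncompressed query or prompt tokens). `estimate_flops_saving` is a hypothetical name.

```python
def estimate_flops_saving(num_layers: int, cutoff_layer: int,
                          compress_ratio: int, compress_layer: int) -> float:
    """Fraction of a full forward pass saved under a linear per-layer cost model."""
    kept_cost = 0.0
    for layer in range(cutoff_layer):  # layers 0 .. cutoff_layer-1 are executed
        # Layers at or after the compression point process 1/compress_ratio of the tokens.
        kept_cost += 1.0 / compress_ratio if layer >= compress_layer else 1.0
    return 1.0 - kept_cost / num_layers


# 8 full layers + 17 half-cost layers = 16.5 of 42 layer-equivalents.
print(round(estimate_flops_saving(42, 25, 2, 8), 3))  # 0.607
```

Under these assumptions the estimate lands close to the quoted 60%, but real savings depend on sequence lengths and on how the model applies compression internally.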

- **BEIR:**

|        BEIR        | bge-large-en-v1.5 | bge-reranker-v2-m3 | jina-reranker-v2-base-multilingual | bge-reranker-v2-gemma | bge-reranker-v2.5-gemma2-lightweight | bge-reranker-v2.5-gemma2-lightweight |
| :----------------: | :---------------: | :-----------------: | :--------------------------------: | :-------------------: | :----------------------------------: | :----------------------------------: |
|   **Save FLOPs**   |         -         |          -          |                 -                  |           -           |                 60%                  |                  0                   |
|    **ArguAna**     |       63.54       |        37.7         |               52.23                |         78.68         |                86.04                 |                86.16                 |
|  **ClimateFEVER**  |       36.49       |        37.99        |               34.65                |         39.07         |                48.41                 |                48.48                 |
|      **CQA**       |       42.23       |        38.24        |               40.21                |         45.85         |                49.18                 |                 48.9                 |
|    **DBPedia**     |       44.16       |        48.15        |               49.31                |         49.92         |                51.98                 |                52.11                 |
|     **FEVER**      |       87.17       |        90.15        |               92.44                |         90.15         |                94.71                 |                94.69                 |
|    **FiQA2018**    |       44.97       |        49.32        |               45.88                |         49.32         |                60.48                 |                60.95                 |
|    **HotpotQA**    |       74.11       |        84.51        |               81.81                |         86.15         |                87.84                 |                87.89                 |
|    **MSMARCO**     |       42.48       |        47.79        |               47.83                |         48.07         |                47.23                 |                47.26                 |
|    **NFCorpus**    |       38.12       |        34.85        |               37.73                |         39.73         |                 41.4                 |                41.64                 |
|       **NQ**       |       55.04       |        69.37        |               67.35                |         72.6          |                75.37                 |                75.58                 |
| **QuoraRetrieval** |       89.06       |        89.13        |               87.81                |         90.37         |                91.25                 |                91.18                 |
|    **SCIDOCS**     |       22.62       |        18.25        |               20.21                |         21.65         |                23.71                 |                23.87                 |
|    **SciFact**     |       74.64       |        73.08        |               76.93                |         77.22         |                 80.5                 |                80.38                 |
|   **Touche2020**   |       25.08       |        35.68        |               32.45                |         35.68         |                30.64                 |                31.09                 |
|   **TRECCOVID**    |       74.89       |        83.39        |               80.89                |         85.51         |                84.26                 |                84.85                 |
|      **Mean**      |       54.31       |        55.36        |               56.52                |         60.71         |                 63.1                 |              **63.67**               |
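The **Mean** row is consistent with an unweighted average over the 15 BEIR datasets. As a quick check, the snippet below recomputes it for the last column of the table above (no FLOPs saved), using the scores exactly as printed.

```python
from statistics import mean

# Scores copied from the last column of the first BEIR table (0% FLOPs saved).
last_column = {
    "ArguAna": 86.16, "ClimateFEVER": 48.48, "CQA": 48.9, "DBPedia": 52.11,
    "FEVER": 94.69, "FiQA2018": 60.95, "HotpotQA": 87.89, "MSMARCO": 47.26,
    "NFCorpus": 41.64, "NQ": 75.58, "QuoraRetrieval": 91.18, "SCIDOCS": 23.87,
    "SciFact": 80.38, "Touche2020": 31.09, "TRECCOVID": 84.85,
}

print(len(last_column), round(mean(last_column.values()), 2))  # 15 63.67
```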

|        BEIR        | e5-mistral-7b-instruct | bge-reranker-v2-gemma | bge-reranker-v2.5-gemma2-lightweight | bge-reranker-v2.5-gemma2-lightweight |
| :----------------: | :--------------------: | :-------------------: | :---------------------------------: | :---------------------------------: |
|   **Save FLOPs**   |           -            |           -           |                 60%                 |                  0                  |
|    **ArguAna**     |          61.8          |         79.05         |                86.02                |                86.58                |
|  **ClimateFEVER**  |         38.37          |         37.66         |                47.27                |                47.13                |
|      **CQA**       |         42.97          |         46.16         |                49.06                |                49.53                |
|    **DBPedia**     |         48.84          |         50.77         |                52.45                |                52.87                |
|     **FEVER**      |         87.82          |         91.36         |                94.85                |                95.19                |
|    **FiQA2018**    |         56.58          |         50.96         |                58.81                |                61.19                |
|    **HotpotQA**    |         75.72          |         86.99         |                88.49                |                88.82                |
|    **MSMARCO**     |         43.06          |         48.35         |                47.65                |                47.4                 |
|    **NFCorpus**    |         38.58          |         39.25         |                42.28                |                42.17                |
|       **NQ**       |         63.56          |         73.44         |                 75                  |                76.28                |
| **QuoraRetrieval** |         89.59          |         90.44         |                91.09                |                91.18                |
|    **SCIDOCS**     |          16.3          |         20.77         |                22.2                 |                22.69                |
|    **SciFact**     |         76.26          |         77.78         |                79.94                |                80.98                |
|   **Touche2020**   |         26.24          |         35.79         |                28.69                |                31.17                |
|   **TRECCOVID**    |         87.07          |         88.13         |                86.61                |                87.36                |
|      **Mean**      |         56.85          |         61.13         |                63.36                |              **64.04**              |

- **MIRACL:**

|          MIRACL (dev, nDCG@10)           | Average (18) | Save FLOPs |  ar  |  bn  |  en  |  es  |  fa  |  fi  |  fr  |  hi  |  id  |  ja  |  ko  |  ru  |  sw  |  te  |  th  |  zh  |  de  |  yo  |
| :--------------------------------------: | :----------: | :--------: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: |
|            **bge-m3 (Dense)**            |     69.2     |     -      | 78.4 | 80.0 | 56.9 | 56.1 | 60.9 | 78.6 | 58.3 | 59.5 | 56.1 | 72.8 | 69.9 | 70.1 | 78.7 | 86.2 | 82.6 | 62.7 | 56.7 | 81.8 |
|  **jina-reranker-v2-base-multilingual**  |     69.6     |     -      | 73.4 | 81.9 | 58.9 | 58.6 | 60.5 | 77.2 | 56.1 | 62.7 | 59.6 | 72.7 | 74.0 | 67.1 | 78.1 | 85.8 | 81.2 | 63.0 | 58.2 | 84.2 |
|          **bge-reranker-v2-m3**          |     74.4     |     -      | 81.7 | 84.6 | 63.5 | 64.4 | 65.7 | 82.4 | 63.7 | 68.5 | 62.7 | 80.0 | 73.8 | 76.9 | 82.3 | 89.4 | 85.3 | 65.2 | 62.7 | 87.4 |
|        **bge-reranker-v2-gemma**         |     75.0     |     -      | 82.3 | 85.0 | 66.6 | 65.3 | 65.5 | 82.6 | 65.4 | 69.4 | 61.2 | 79.7 | 75.1 | 78.3 | 81.8 | 89.6 | 86.1 | 66.8 | 64.0 | 85.9 |
| **bge-reranker-v2.5-gemma2-lightweight** |     77.1     |    60%     | 82.5 | 87.8 | 68.6 | 67.6 | 67.5 | 82.8 | 68.5 | 71.4 | 63.8 | 82.8 | 75.9 | 79.8 | 84.8 | 90.8 | 88.1 | 69.9 | 65.8 | 89.6 |
| **bge-reranker-v2.5-gemma2-lightweight** |   **77.3**   |     0      | 82.8 | 87.6 | 69.3 | 67.8 | 67.4 | 83.3 | 68.5 | 71.3 | 63.8 | 83.6 | 75.7 | 80.1 | 85.1 | 90.8 | 88.7 | 69.9 | 65.6 | 89.8 |



## Citation

If you find this repository useful, please consider giving it a star and citing our work:

```bibtex
@misc{li2023making,
      title={Making Large Language Models A Better Foundation For Dense Retrieval}, 
      author={Chaofan Li and Zheng Liu and Shitao Xiao and Yingxia Shao},
      year={2023},
      eprint={2312.15503},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
@misc{chen2024bge,
      title={BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation}, 
      author={Jianlv Chen and Shitao Xiao and Peitian Zhang and Kun Luo and Defu Lian and Zheng Liu},
      year={2024},
      eprint={2402.03216},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```