---
tags:
- generated_from_trainer
license: mit
language:
- en
base_model: mistralai/Mistral-7B-v0.1
---


<img src="https://huggingface.co/castorini/rank_zephyr_7b_v1_full/resolve/main/thumbnail.jpeg" alt="RankZephyr Logo" width="500" style="margin-left:auto; margin-right:auto; display:block"/>


# Model Card for RankZephyr 7B V1 - Full

RankZephyr is a series of language models, built on the Zephyr-7B-β model, trained to act as helpful reranking assistants.
RankZephyr Base is the model obtained by single-stage fine-tuning on RankGPT-3.5 orderings, while RankZephyr Full is further fine-tuned on RankGPT-4 reorderings of OpenAI's Ada2 orderings for 5K queries.


## Model description

- **Model type:** A 7B parameter GPT-like model initially fine-tuned on a mix of publicly available, synthetic datasets, followed by task-specific listwise reranking data.
- **Language(s) (NLP):** Primarily English
- **License:** MIT
- **Fine-tuned from model:** [HuggingFaceH4/zephyr-7b-beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta)

### Model Sources


- **Repository:** https://github.com/castorini/rank_llm
- **Paper:** https://arxiv.org/abs/2312.02724

## Effectiveness

At the time of release, RankZephyr-7B-Full is the state-of-the-art open-source reranking model across several benchmarks, including TREC DL19/20/21/22, TREC-COVID, and TREC-News.

With the MS MARCO v1 collection (nDCG@10 on DL19 and DL20):

| Model | Size | First Stage | DL19 | DL20 |
|-------|------|-------------|------|------|
| **RankZephyr-7b-v1-full-rho** 🪁 | **7B** | **SPLADE++ ED** | **0.7855** | **0.8255** |
| **RankZephyr-7b-v1-full** 🪁 | **7B** | **SPLADE++ ED** | **0.7803** | **0.8211** |
| RankGPT-4 (PSC) | - | SPLADE++ ED | 0.7601 | 0.7514 |
| RankGPT-4 | - | SPLADE++ ED | 0.7464 | 0.7076 |
| **RankZephyr-7b-v1-base** 🪁 | **7B** | **SPLADE++ ED** | **0.7341** | **0.7213** |
| RankGPT-3.5 | - | SPLADE++ ED | 0.7504 | 0.7120 |


More details can be found in the paper.

## Intended uses & limitations

The model is intended to be used in conjunction with the [RankLLM repository](https://github.com/castorini/rank_llm). While `rank-llm` exists as a PyPI package, we are currently in the early stages of development and encourage users to install directly from source.
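A source install typically looks like the following. These are standard `git`/`pip` commands; the editable-install step assumes the repository ships a standard Python package layout, so check the repository's own README for the authoritative instructions.

```shell
# Clone the RankLLM repository and install it in editable mode
git clone https://github.com/castorini/rank_llm.git
cd rank_llm
pip install -e .
```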

The original Zephyr model is trained for chat. In our case, RankZephyr is fine-tuned to act as a listwise reranking agent: you provide it with a query and a set of candidate documents, and it returns a reordered list of document identifiers.
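The listwise interaction can be sketched as follows. This is an illustrative example of the RankGPT-style prompt format and output parsing only, not the actual `rank_llm` API; the helper names and the exact prompt wording are hypothetical.

```python
import re

def build_listwise_prompt(query, passages):
    """Build a RankGPT-style listwise prompt: each candidate passage is
    tagged with a numeric identifier [i], and the model is asked to emit
    an ordering over those identifiers."""
    lines = [
        f"I will provide you with {len(passages)} passages, each indicated "
        f"by a numerical identifier []. Rank the passages based on their "
        f"relevance to the search query: {query}."
    ]
    for i, passage in enumerate(passages, start=1):
        lines.append(f"[{i}] {passage}")
    lines.append(f"Search Query: {query}")
    lines.append("Rank the passages above. The output format should be "
                 "[] > [], e.g., [4] > [2].")
    return "\n".join(lines)

def parse_ranking(model_output, num_passages):
    """Parse a model response like '[2] > [3] > [1]' into zero-based
    passage indices, dropping out-of-range or duplicate identifiers and
    appending any omitted passages in their original order."""
    order = []
    for match in re.finditer(r"\[(\d+)\]", model_output):
        idx = int(match.group(1)) - 1
        if 0 <= idx < num_passages and idx not in order:
            order.append(idx)
    order.extend(i for i in range(num_passages) if i not in order)
    return order

# Hypothetical model output for three candidate passages:
order = parse_ranking("[2] > [3] > [1]", 3)
# order == [1, 2, 0]
```

The defensive parsing step matters in practice: listwise rerankers occasionally repeat or omit identifiers, so downstream code should tolerate malformed orderings rather than assume a perfect permutation.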


## Bias, Risks, and Limitations

The following is an excerpt from the [Zephyr-7B-β model card](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta/blob/main/README.md#bias-risks--limitations):



> Zephyr-7B-β has not been aligned to human preferences for safety within the RLHF phase or deployed with in-the-loop filtering of responses like ChatGPT, so the model can produce problematic outputs (especially when prompted to do so).  It is also unknown what the size and composition of the corpus was used to train the base model (`mistralai/Mistral-7B-v0.1`), however it is likely to have included a mix of Web data and technical sources like books and code. See the [Falcon 180B model card](https://huggingface.co/tiiuae/falcon-180B#training-data) for an example of this.

Our model is trained specifically on monolingual English data; effectiveness on multilingual datasets is not guaranteed.


## Citation

If you find RankZephyr useful in your work, please cite the following paper:

```bibtex
@ARTICLE{pradeep2023rankzephyr,
  title   = {{RankZephyr}: Effective and Robust Zero-Shot Listwise Reranking is a Breeze!},
  author  = {Ronak Pradeep and Sahel Sharifymoghaddam and Jimmy Lin},
  year    = {2023},
  journal = {arXiv:2312.02724}
}
```