Quantization made by Richard Erkhov.

[Github](https://github.com/RichardErkhov)

[Discord](https://discord.gg/pvy7H8DZMG)

[Request more models](https://github.com/RichardErkhov/quant_request)


zephyr-orpo-141b-A35b-v0.1 - GGUF
- Model creator: https://huggingface.co/HuggingFaceH4/
- Original model: https://huggingface.co/HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1/


| Name | Quant method | Size |
| ---- | ---- | ---- |
| [zephyr-orpo-141b-A35b-v0.1.Q2_K.gguf](https://huggingface.co/RichardErkhov/HuggingFaceH4_-_zephyr-orpo-141b-A35b-v0.1-gguf/tree/main/) | Q2_K | 48.52GB |
| [zephyr-orpo-141b-A35b-v0.1.IQ3_XS.gguf](https://huggingface.co/RichardErkhov/HuggingFaceH4_-_zephyr-orpo-141b-A35b-v0.1-gguf/tree/main/) | IQ3_XS | 54.23GB |
| [zephyr-orpo-141b-A35b-v0.1.IQ3_S.gguf](https://huggingface.co/RichardErkhov/HuggingFaceH4_-_zephyr-orpo-141b-A35b-v0.1-gguf/tree/main/) | IQ3_S | 57.27GB |
| [zephyr-orpo-141b-A35b-v0.1.Q3_K_S.gguf](https://huggingface.co/RichardErkhov/HuggingFaceH4_-_zephyr-orpo-141b-A35b-v0.1-gguf/tree/main/) | Q3_K_S | 57.27GB |
| [zephyr-orpo-141b-A35b-v0.1.IQ3_M.gguf](https://huggingface.co/RichardErkhov/HuggingFaceH4_-_zephyr-orpo-141b-A35b-v0.1-gguf/tree/main/) | IQ3_M | 60.06GB |
| [zephyr-orpo-141b-A35b-v0.1.Q3_K.gguf](https://huggingface.co/RichardErkhov/HuggingFaceH4_-_zephyr-orpo-141b-A35b-v0.1-gguf/tree/main/) | Q3_K | 63.13GB |
| [zephyr-orpo-141b-A35b-v0.1.Q3_K_M.gguf](https://huggingface.co/RichardErkhov/HuggingFaceH4_-_zephyr-orpo-141b-A35b-v0.1-gguf/tree/main/) | Q3_K_M | 63.13GB |
| [zephyr-orpo-141b-A35b-v0.1.Q3_K_L.gguf](https://huggingface.co/RichardErkhov/HuggingFaceH4_-_zephyr-orpo-141b-A35b-v0.1-gguf/tree/main/) | Q3_K_L | 67.6GB |
| [zephyr-orpo-141b-A35b-v0.1.IQ4_XS.gguf](https://huggingface.co/RichardErkhov/HuggingFaceH4_-_zephyr-orpo-141b-A35b-v0.1-gguf/tree/main/) | IQ4_XS | 71.11GB |
| [zephyr-orpo-141b-A35b-v0.1.Q4_0.gguf](https://huggingface.co/RichardErkhov/HuggingFaceH4_-_zephyr-orpo-141b-A35b-v0.1-gguf/tree/main/) | Q4_0 | 74.05GB |
| [zephyr-orpo-141b-A35b-v0.1.IQ4_NL.gguf](https://huggingface.co/RichardErkhov/HuggingFaceH4_-_zephyr-orpo-141b-A35b-v0.1-gguf/tree/main/) | IQ4_NL | 74.95GB |
| [zephyr-orpo-141b-A35b-v0.1.Q4_K_S.gguf](https://huggingface.co/RichardErkhov/HuggingFaceH4_-_zephyr-orpo-141b-A35b-v0.1-gguf/tree/main/) | Q4_K_S | 74.95GB |
| [zephyr-orpo-141b-A35b-v0.1.Q4_K.gguf](https://huggingface.co/RichardErkhov/HuggingFaceH4_-_zephyr-orpo-141b-A35b-v0.1-gguf/tree/main/) | Q4_K | 79.71GB |
| [zephyr-orpo-141b-A35b-v0.1.Q4_K_M.gguf](https://huggingface.co/RichardErkhov/HuggingFaceH4_-_zephyr-orpo-141b-A35b-v0.1-gguf/tree/main/) | Q4_K_M | 79.71GB |
| [zephyr-orpo-141b-A35b-v0.1.Q4_1.gguf](https://huggingface.co/RichardErkhov/HuggingFaceH4_-_zephyr-orpo-141b-A35b-v0.1-gguf/tree/main/) | Q4_1 | 82.18GB |
| [zephyr-orpo-141b-A35b-v0.1.Q5_0.gguf](https://huggingface.co/RichardErkhov/HuggingFaceH4_-_zephyr-orpo-141b-A35b-v0.1-gguf/tree/main/) | Q5_0 | 90.31GB |
| [zephyr-orpo-141b-A35b-v0.1.Q5_K_S.gguf](https://huggingface.co/RichardErkhov/HuggingFaceH4_-_zephyr-orpo-141b-A35b-v0.1-gguf/tree/main/) | Q5_K_S | 90.31GB |
| [zephyr-orpo-141b-A35b-v0.1.Q5_K.gguf](https://huggingface.co/RichardErkhov/HuggingFaceH4_-_zephyr-orpo-141b-A35b-v0.1-gguf/tree/main/) | Q5_K | 93.1GB |
| [zephyr-orpo-141b-A35b-v0.1.Q5_K_M.gguf](https://huggingface.co/RichardErkhov/HuggingFaceH4_-_zephyr-orpo-141b-A35b-v0.1-gguf/tree/main/) | Q5_K_M | 93.1GB |
| [zephyr-orpo-141b-A35b-v0.1.Q5_1.gguf](https://huggingface.co/RichardErkhov/HuggingFaceH4_-_zephyr-orpo-141b-A35b-v0.1-gguf/tree/main/) | Q5_1 | 98.45GB |
| [zephyr-orpo-141b-A35b-v0.1.Q6_K.gguf](https://huggingface.co/RichardErkhov/HuggingFaceH4_-_zephyr-orpo-141b-A35b-v0.1-gguf/tree/main/) | Q6_K | 107.6GB |

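Choosing a quant is mostly a memory-budget question: the GGUF file (plus some working overhead for context) has to fit in your combined RAM/VRAM. A small helper, hypothetical and using the file sizes from the table above, can pick the largest quant that fits a given budget:

```python
# Approximate file sizes in GB, copied from the table above.
QUANT_SIZES_GB = {
    "Q2_K": 48.52, "IQ3_XS": 54.23, "IQ3_S": 57.27, "Q3_K_S": 57.27,
    "IQ3_M": 60.06, "Q3_K": 63.13, "Q3_K_M": 63.13, "Q3_K_L": 67.6,
    "IQ4_XS": 71.11, "Q4_0": 74.05, "IQ4_NL": 74.95, "Q4_K_S": 74.95,
    "Q4_K": 79.71, "Q4_K_M": 79.71, "Q4_1": 82.18, "Q5_0": 90.31,
    "Q5_K_S": 90.31, "Q5_K": 93.1, "Q5_K_M": 93.1, "Q5_1": 98.45,
    "Q6_K": 107.6,
}

def largest_quant_fitting(budget_gb, overhead_gb=4.0):
    """Return the largest quant whose file plus overhead fits the budget, or None.

    overhead_gb is a rough allowance for KV cache and runtime buffers; the real
    figure depends on context length and backend.
    """
    fitting = {q: s for q, s in QUANT_SIZES_GB.items() if s + overhead_gb <= budget_gb}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)

print(largest_quant_fitting(96.0))  # e.g. two 48GB GPUs -> a ~90GB Q5 file
```

Generally, higher-bit quants preserve more quality; pick the largest one your hardware can hold.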
Original model description:
---
license: apache-2.0
base_model: mistral-community/Mixtral-8x22B-v0.1
tags:
- trl
- orpo
- generated_from_trainer
datasets:
- argilla/distilabel-capybara-dpo-7k-binarized
model-index:
- name: zephyr-orpo-141b-A35b-v0.1
  results: []
inference:
  parameters:
    temperature: 0.7
---

<img src="https://huggingface.co/HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1/resolve/main/logo.png" alt="Zephyr 141B Logo" width="400" style="margin-left:'auto' margin-right:'auto' display:'block'"/>


# Model Card for Zephyr 141B-A39B

Zephyr is a series of language models that are trained to act as helpful assistants. Zephyr 141B-A39B is the latest model in the series, and is a fine-tuned version of [mistral-community/Mixtral-8x22B-v0.1](https://huggingface.co/mistral-community/Mixtral-8x22B-v0.1) that was trained using a novel alignment algorithm called [Odds Ratio Preference Optimization (ORPO)](https://huggingface.co/papers/2403.07691) with **7k instances** for **1.3 hours** on 4 nodes of 8 x H100s. ORPO does not require an SFT step to achieve high performance and is thus much more computationally efficient than methods like DPO and PPO. To train Zephyr-141B-A39B, we used the [`argilla/distilabel-capybara-dpo-7k-binarized`](https://huggingface.co/datasets/argilla/distilabel-capybara-dpo-7k-binarized) preference dataset, which consists of synthetic, high-quality, multi-turn preferences that have been scored via LLMs.

> [!NOTE]
> This model was trained collaboratively between Argilla, KAIST, and Hugging Face.

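The core of ORPO is an odds-ratio penalty on preference pairs, added to the usual NLL loss on the chosen response. A minimal numeric sketch of that relative-ratio term (following the ORPO paper's formulation; `p_chosen`/`p_rejected` stand in for the model's length-normalized sequence probabilities):

```python
import math

def odds(p):
    """odds(y|x) = P(y|x) / (1 - P(y|x))."""
    return p / (1.0 - p)

def orpo_odds_ratio_loss(p_chosen, p_rejected):
    """Relative-ratio term of ORPO: -log sigmoid(log(odds_chosen / odds_rejected)).

    The full ORPO objective is NLL(chosen) + lambda * this term; because the
    penalty is expressed through odds ratios, no frozen reference model (as in
    DPO) is needed.
    """
    log_odds_ratio = math.log(odds(p_chosen)) - math.log(odds(p_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-log_odds_ratio)))  # -log sigmoid

# The penalty shrinks as the model prefers the chosen response more strongly:
print(orpo_odds_ratio_loss(0.6, 0.4))
print(orpo_odds_ratio_loss(0.9, 0.1))
```

When the model assigns equal probability to both responses, the term is log 2; it decays toward zero as the preferred response dominates.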
## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->

- **Model type:** A Mixture of Experts (MoE) model with 141B total parameters and 39B active parameters. (We initially made a small error in calculating the number of active parameters for the model ID. The model card states the correct number.) Fine-tuned on a mix of publicly available, synthetic datasets.
- **Language(s) (NLP):** Primarily English.
- **License:** Apache 2.0
- **Finetuned from model:** [mistral-community/Mixtral-8x22B-v0.1](https://huggingface.co/mistral-community/Mixtral-8x22B-v0.1)

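The gap between 141B total and 39B active parameters comes from sparse expert routing: for each token, a learned router selects only a few experts to run. An illustrative top-2 gating sketch (not the model's actual router, which is a learned layer inside each MoE block):

```python
import math

def softmax(xs):
    m = max(xs)  # subtract max for numerical stability
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def route_top_k(router_logits, k=2):
    """Pick the top-k experts for one token and renormalize their gate weights."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in top)
    return [(i, probs[i] / total) for i in top]

# One token, 8 experts: only two experts (here 3 and 5) run for this token,
# so only their parameters (plus the shared layers) count as "active".
print(route_top_k([0.1, -1.2, 0.3, 2.0, -0.5, 1.5, 0.0, 0.2]))
```

The selected experts' outputs are then combined with the renormalized gate weights, which is why per-token compute scales with active rather than total parameters.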
### Model Sources

<!-- Provide the basic links for the model. -->

- **Repository:** https://github.com/huggingface/alignment-handbook
- **Dataset:** https://huggingface.co/datasets/argilla/distilabel-capybara-dpo-7k-binarized

## Performance

Zephyr 141B-A39B was trained to test the effectiveness of ORPO at scale, and the underlying dataset covers a mix of general chat capabilities. It achieves strong performance on chat benchmarks like [MT Bench](https://huggingface.co/spaces/lmsys/mt-bench) and [IFEval](https://arxiv.org/abs/2311.07911). The scores reported below were obtained using the [LightEval](https://github.com/huggingface/lighteval) evaluation suite, and each prompt was formatted with the model's corresponding chat template to simulate real-world usage. This is why some scores may differ from those reported in technical reports or on the Open LLM Leaderboard.

| Model | MT Bench | IFEval | BBH | AGIEval |
|-----------------------------------------------------------------------------------------------------|---------:|-------:|------:|--------:|
| [zephyr-orpo-141b-A39b-v0.1](https://huggingface.co/HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1) | 8.17 | 65.06 | 58.96 | 44.16 |
| [databricks/dbrx-instruct](https://huggingface.co/databricks/dbrx-instruct) | 8.26 | 52.13 | 48.50 | 41.16 |
| [mistralai/Mixtral-8x7B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1) | 8.30 | 55.08 | 45.31 | 47.68 |

## Intended uses & limitations

The model was fine-tuned on a blend of chat, code, math, and reasoning data. Here's how you can run the model using the `pipeline()` function from 🤗 Transformers:

```python
# pip install 'transformers>=4.39.3'
# pip install accelerate

import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1",
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
messages = [
    {
        "role": "system",
        "content": "You are Zephyr, a helpful assistant.",
    },
    {"role": "user", "content": "Explain how Mixture of Experts work in language a child would understand."},
]
outputs = pipe(
    messages,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_k=50,
    top_p=0.95,
)
print(outputs[0]["generated_text"][-1]["content"])
```

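Note that `device_map="auto"` shards the model across whatever GPUs are available, and all 141B parameters must be resident even though only ~39B are active per token. Rough back-of-the-envelope weight-memory arithmetic (ignores activations, KV cache, and framework overhead):

```python
def weight_memory_gb(n_params, bytes_per_param):
    """Memory needed just to hold the weights, in decimal GB."""
    return n_params * bytes_per_param / 1e9

# bf16 stores 2 bytes per parameter.
print(f"bf16 weights: {weight_memory_gb(141e9, 2):.0f} GB")  # 282 GB
# A ~4-bit quant (about 0.5 bytes/param) cuts that to roughly a quarter.
print(f"~4-bit weights: {weight_memory_gb(141e9, 0.5):.1f} GB")
```

This is why running the model in bf16 takes multiple 80GB GPUs, while the GGUF quants listed at the top of this page fit in far less memory.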
## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

Zephyr 141B-A39B has not been aligned to human preferences for safety within the RLHF phase or deployed with in-the-loop filtering of responses like ChatGPT, so the model can produce problematic outputs (especially when prompted to do so).
It is also unknown what the size and composition of the corpus used to train the base model (`mistral-community/Mixtral-8x22B-v0.1`) were; however, it is likely to have included a mix of web data and technical sources like books and code. See the [Falcon 180B model card](https://huggingface.co/tiiuae/falcon-180B#training-data) for an example of this.

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 32
- total_train_batch_size: 32
- total_eval_batch_size: 256
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: inverse_sqrt
- lr_scheduler_warmup_steps: 100
- num_epochs: 3

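The batch-size entries above are internally consistent: the total batch size is the per-device batch size times the number of devices (times gradient-accumulation steps, assumed to be 1 here since none is listed):

```python
def effective_batch_size(per_device, num_devices, grad_accum=1):
    """Total samples processed per optimizer step across all devices."""
    return per_device * num_devices * grad_accum

# 1 sample/device * 32 devices = total_train_batch_size of 32
print(effective_batch_size(1, 32))   # 32
# 8 samples/device * 32 devices = total_eval_batch_size of 256
print(effective_batch_size(8, 32))   # 256
```
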
### Framework versions

- Transformers 4.39.3
- Pytorch 2.1.2+cu121
- Datasets 2.18.0
- Tokenizers 0.15.1

## Citation

If you find Zephyr 141B-A39B useful in your work, please cite the ORPO paper:

```
@misc{hong2024orpo,
  title={ORPO: Monolithic Preference Optimization without Reference Model},
  author={Jiwoo Hong and Noah Lee and James Thorne},
  year={2024},
  eprint={2403.07691},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```

You may also wish to cite the creators of this model:

```
@misc{zephyr_141b,
  author = {Alvaro Bartolome and Jiwoo Hong and Noah Lee and Kashif Rasul and Lewis Tunstall},
  title = {Zephyr 141B A39B},
  year = {2024},
  publisher = {Hugging Face},
  journal = {Hugging Face repository},
  howpublished = {\url{https://huggingface.co/HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1}}
}
```