Quantization made by Richard Erkhov.

[Github](https://github.com/RichardErkhov)

[Discord](https://discord.gg/pvy7H8DZMG)

[Request more models](https://github.com/RichardErkhov/quant_request)


zephyr-orpo-141b-A35b-v0.1 - GGUF
- Model creator: https://huggingface.co/HuggingFaceH4/
- Original model: https://huggingface.co/HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1/


| Name | Quant method | Size |
| ---- | ---- | ---- |
| [zephyr-orpo-141b-A35b-v0.1.Q2_K.gguf](https://huggingface.co/RichardErkhov/HuggingFaceH4_-_zephyr-orpo-141b-A35b-v0.1-gguf/tree/main/) | Q2_K | 48.52GB |
| [zephyr-orpo-141b-A35b-v0.1.IQ3_XS.gguf](https://huggingface.co/RichardErkhov/HuggingFaceH4_-_zephyr-orpo-141b-A35b-v0.1-gguf/tree/main/) | IQ3_XS | 54.23GB |
| [zephyr-orpo-141b-A35b-v0.1.IQ3_S.gguf](https://huggingface.co/RichardErkhov/HuggingFaceH4_-_zephyr-orpo-141b-A35b-v0.1-gguf/tree/main/) | IQ3_S | 57.27GB |
| [zephyr-orpo-141b-A35b-v0.1.Q3_K_S.gguf](https://huggingface.co/RichardErkhov/HuggingFaceH4_-_zephyr-orpo-141b-A35b-v0.1-gguf/tree/main/) | Q3_K_S | 57.27GB |
| [zephyr-orpo-141b-A35b-v0.1.IQ3_M.gguf](https://huggingface.co/RichardErkhov/HuggingFaceH4_-_zephyr-orpo-141b-A35b-v0.1-gguf/tree/main/) | IQ3_M | 60.06GB |
| [zephyr-orpo-141b-A35b-v0.1.Q3_K.gguf](https://huggingface.co/RichardErkhov/HuggingFaceH4_-_zephyr-orpo-141b-A35b-v0.1-gguf/tree/main/) | Q3_K | 63.13GB |
| [zephyr-orpo-141b-A35b-v0.1.Q3_K_M.gguf](https://huggingface.co/RichardErkhov/HuggingFaceH4_-_zephyr-orpo-141b-A35b-v0.1-gguf/tree/main/) | Q3_K_M | 63.13GB |
| [zephyr-orpo-141b-A35b-v0.1.Q3_K_L.gguf](https://huggingface.co/RichardErkhov/HuggingFaceH4_-_zephyr-orpo-141b-A35b-v0.1-gguf/tree/main/) | Q3_K_L | 67.6GB |
| [zephyr-orpo-141b-A35b-v0.1.IQ4_XS.gguf](https://huggingface.co/RichardErkhov/HuggingFaceH4_-_zephyr-orpo-141b-A35b-v0.1-gguf/tree/main/) | IQ4_XS | 71.11GB |
| [zephyr-orpo-141b-A35b-v0.1.Q4_0.gguf](https://huggingface.co/RichardErkhov/HuggingFaceH4_-_zephyr-orpo-141b-A35b-v0.1-gguf/tree/main/) | Q4_0 | 74.05GB |
| [zephyr-orpo-141b-A35b-v0.1.IQ4_NL.gguf](https://huggingface.co/RichardErkhov/HuggingFaceH4_-_zephyr-orpo-141b-A35b-v0.1-gguf/tree/main/) | IQ4_NL | 74.95GB |
| [zephyr-orpo-141b-A35b-v0.1.Q4_K_S.gguf](https://huggingface.co/RichardErkhov/HuggingFaceH4_-_zephyr-orpo-141b-A35b-v0.1-gguf/tree/main/) | Q4_K_S | 74.95GB |
| [zephyr-orpo-141b-A35b-v0.1.Q4_K.gguf](https://huggingface.co/RichardErkhov/HuggingFaceH4_-_zephyr-orpo-141b-A35b-v0.1-gguf/tree/main/) | Q4_K | 79.71GB |
| [zephyr-orpo-141b-A35b-v0.1.Q4_K_M.gguf](https://huggingface.co/RichardErkhov/HuggingFaceH4_-_zephyr-orpo-141b-A35b-v0.1-gguf/tree/main/) | Q4_K_M | 79.71GB |
| [zephyr-orpo-141b-A35b-v0.1.Q4_1.gguf](https://huggingface.co/RichardErkhov/HuggingFaceH4_-_zephyr-orpo-141b-A35b-v0.1-gguf/tree/main/) | Q4_1 | 82.18GB |
| [zephyr-orpo-141b-A35b-v0.1.Q5_0.gguf](https://huggingface.co/RichardErkhov/HuggingFaceH4_-_zephyr-orpo-141b-A35b-v0.1-gguf/tree/main/) | Q5_0 | 90.31GB |
| [zephyr-orpo-141b-A35b-v0.1.Q5_K_S.gguf](https://huggingface.co/RichardErkhov/HuggingFaceH4_-_zephyr-orpo-141b-A35b-v0.1-gguf/tree/main/) | Q5_K_S | 90.31GB |
| [zephyr-orpo-141b-A35b-v0.1.Q5_K.gguf](https://huggingface.co/RichardErkhov/HuggingFaceH4_-_zephyr-orpo-141b-A35b-v0.1-gguf/tree/main/) | Q5_K | 93.1GB |
| [zephyr-orpo-141b-A35b-v0.1.Q5_K_M.gguf](https://huggingface.co/RichardErkhov/HuggingFaceH4_-_zephyr-orpo-141b-A35b-v0.1-gguf/tree/main/) | Q5_K_M | 93.1GB |
| [zephyr-orpo-141b-A35b-v0.1.Q5_1.gguf](https://huggingface.co/RichardErkhov/HuggingFaceH4_-_zephyr-orpo-141b-A35b-v0.1-gguf/tree/main/) | Q5_1 | 98.45GB |
| [zephyr-orpo-141b-A35b-v0.1.Q6_K.gguf](https://huggingface.co/RichardErkhov/HuggingFaceH4_-_zephyr-orpo-141b-A35b-v0.1-gguf/tree/main/) | Q6_K | 107.6GB |
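
These quants load in any llama.cpp-compatible runtime. The snippet below is a minimal sketch using `llama-cpp-python` and `huggingface_hub` (neither tool nor the exact filename comes from the original upload; quants of this size are typically stored as split files under the repository tree and may need to be merged or loaded from their first shard):

```python
# Minimal sketch, assuming: pip install llama-cpp-python huggingface_hub
# The filename is illustrative -- pick any quant from the table above.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Download one quantized file from this repository.
model_path = hf_hub_download(
    repo_id="RichardErkhov/HuggingFaceH4_-_zephyr-orpo-141b-A35b-v0.1-gguf",
    filename="zephyr-orpo-141b-A35b-v0.1.Q2_K.gguf",
)

# Load the model; n_gpu_layers=-1 offloads all layers to the GPU when one
# is available, and n_ctx sets the context window.
llm = Llama(model_path=model_path, n_ctx=4096, n_gpu_layers=-1)

# Chat-style generation; llama.cpp applies the chat template stored in the
# GGUF metadata.
out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are Zephyr, a helpful assistant."},
        {"role": "user", "content": "Say hello in one sentence."},
    ],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```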


Original model description:
---
license: apache-2.0
base_model: mistral-community/Mixtral-8x22B-v0.1
tags:
- trl
- orpo
- generated_from_trainer
datasets:
- argilla/distilabel-capybara-dpo-7k-binarized
model-index:
- name: zephyr-orpo-141b-A35b-v0.1
  results: []
inference:
  parameters:
    temperature: 0.7
---

<img src="https://huggingface.co/HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1/resolve/main/logo.png" alt="Zephyr 141B Logo" width="400" style="margin-left:'auto' margin-right:'auto' display:'block'"/>


# Model Card for Zephyr 141B-A39B

Zephyr is a series of language models that are trained to act as helpful assistants. Zephyr 141B-A39B is the latest model in the series, and is a fine-tuned version of [mistral-community/Mixtral-8x22B-v0.1](https://huggingface.co/mistral-community/Mixtral-8x22B-v0.1) that was trained using a novel alignment algorithm called [Odds Ratio Preference Optimization (ORPO)](https://huggingface.co/papers/2403.07691) with **7k instances** for **1.3 hours** on 4 nodes of 8 x H100s. ORPO does not require an SFT step to achieve high performance and is thus much more computationally efficient than methods like DPO and PPO. To train Zephyr-141B-A39B, we used the [`argilla/distilabel-capybara-dpo-7k-binarized`](https://huggingface.co/datasets/argilla/distilabel-capybara-dpo-7k-binarized) preference dataset, which consists of synthetic, high-quality, multi-turn preferences that have been scored via LLMs.

> [!NOTE]
> This model was trained collaboratively between Argilla, KAIST, and Hugging Face.

## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->

- **Model type:** A Mixture of Experts (MoE) model with 141B total parameters and 39B active parameters. (We initially made a small error in calculating the number of active parameters for the model ID. The model card states the correct number.) Fine-tuned on a mix of publicly available, synthetic datasets.
- **Language(s) (NLP):** Primarily English.
- **License:** Apache 2.0
- **Finetuned from model:** [mistral-community/Mixtral-8x22B-v0.1](https://huggingface.co/mistral-community/Mixtral-8x22B-v0.1)

### Model Sources

<!-- Provide the basic links for the model. -->

- **Repository:** https://github.com/huggingface/alignment-handbook
- **Dataset:** https://huggingface.co/datasets/argilla/distilabel-capybara-dpo-7k-binarized

## Performance

Zephyr 141B-A39B was trained to test the effectiveness of ORPO at scale, and the underlying dataset contains a mix of general chat capabilities. It achieves strong performance on chat benchmarks like [MT Bench](https://huggingface.co/spaces/lmsys/mt-bench) and [IFEval](https://arxiv.org/abs/2311.07911). The scores reported below were obtained using the [LightEval](https://github.com/huggingface/lighteval) evaluation suite, and each prompt was formatted with the model's corresponding chat template to simulate real-world usage. This is why some scores may differ from those reported in technical reports or on the Open LLM Leaderboard.

| Model | MT Bench | IFEval | BBH | AGIEval |
|-----------------------------------------------------------------------------------------------------|---------:|-------:|------:|--------:|
| [zephyr-orpo-141b-A39b-v0.1](https://huggingface.co/HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1) | 8.17 | 65.06 | 58.96 | 44.16 |
| [databricks/dbrx-instruct](https://huggingface.co/databricks/dbrx-instruct) | 8.26 | 52.13 | 48.50 | 41.16 |
| [mistralai/Mixtral-8x7B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1) | 8.30 | 55.08 | 45.31 | 47.68 |

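To make the chat-template formatting described above concrete, here is a small sketch (not part of the original card) of how a conversation is rendered with the model's template via the standard 🤗 Transformers tokenizer API:

```python
# Sketch of the chat-template formatting used during evaluation, assuming
# the standard transformers AutoTokenizer API.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1")

messages = [
    {"role": "system", "content": "You are Zephyr, a helpful assistant."},
    {"role": "user", "content": "List three uses of a paperclip."},
]

# add_generation_prompt=True appends the assistant turn header so the
# model continues the conversation as the assistant, mirroring chat usage.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```
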
## Intended uses & limitations

The model was fine-tuned on a blend of chat, code, math, and reasoning data. Here's how you can run the model using the `pipeline()` function from 🤗 Transformers:

```python
# pip install 'transformers>=4.39.3'
# pip install accelerate

import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1",
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
messages = [
    {
        "role": "system",
        "content": "You are Zephyr, a helpful assistant.",
    },
    {"role": "user", "content": "Explain how Mixture of Experts work in language a child would understand."},
]
outputs = pipe(
    messages,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_k=50,
    top_p=0.95,
)
print(outputs[0]["generated_text"][-1]["content"])
```

## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

Zephyr 141B-A39B has not been aligned to human preferences for safety within the RLHF phase or deployed with in-the-loop filtering of responses like ChatGPT, so the model can produce problematic outputs (especially when prompted to do so).
The size and composition of the corpus used to train the base model (`mistral-community/Mixtral-8x22B-v0.1`) are also unknown; however, it likely included a mix of web data and technical sources like books and code. See the [Falcon 180B model card](https://huggingface.co/tiiuae/falcon-180B#training-data) for an example of this.


## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a sketch of a matching trl configuration follows the list):
- learning_rate: 5e-06
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 32
- total_train_batch_size: 32
- total_eval_batch_size: 256
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: inverse_sqrt
- lr_scheduler_warmup_steps: 100
- num_epochs: 3

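The sketch below shows what an ORPO run with these hyperparameters could look like using trl's `ORPOTrainer`. It is an illustration under stated assumptions (a recent trl release with `ORPOConfig`/`ORPOTrainer`, and a dataset already in the prompt/chosen/rejected format the trainer expects), not the original training script, which lives in the alignment-handbook repository linked above:

```python
# Hedged sketch of an ORPO training run; assumes trl>=0.8 with
# ORPOConfig/ORPOTrainer. Not the original alignment-handbook script,
# which also handles multi-node sharding across 32 GPUs and its own
# dataset preprocessing.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

model_id = "mistral-community/Mixtral-8x22B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Preference data; assumed to expose prompt/chosen/rejected columns in the
# shape ORPOTrainer expects.
dataset = load_dataset("argilla/distilabel-capybara-dpo-7k-binarized", split="train")

# Values mirror the hyperparameter list above.
config = ORPOConfig(
    output_dir="zephyr-orpo-141b-A35b",
    learning_rate=5e-6,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=8,
    num_train_epochs=3,
    lr_scheduler_type="inverse_sqrt",
    warmup_steps=100,
    seed=42,
)

# In newer trl versions the tokenizer argument is named processing_class.
trainer = ORPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    tokenizer=tokenizer,
)
trainer.train()
```
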
### Framework versions

- Transformers 4.39.3
- Pytorch 2.1.2+cu121
- Datasets 2.18.0
- Tokenizers 0.15.1

## Citation

If you find Zephyr 141B-A39B useful in your work, please cite the ORPO paper:

```
@misc{hong2024orpo,
      title={ORPO: Monolithic Preference Optimization without Reference Model},
      author={Jiwoo Hong and Noah Lee and James Thorne},
      year={2024},
      eprint={2403.07691},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```

You may also wish to cite the creators of this model:

```
@misc{zephyr_141b,
  author = {Alvaro Bartolome and Jiwoo Hong and Noah Lee and Kashif Rasul and Lewis Tunstall},
  title = {Zephyr 141B A39B},
  year = {2024},
  publisher = {Hugging Face},
  journal = {Hugging Face repository},
  howpublished = {\url{https://huggingface.co/HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1}}
}
```