Quantization made by Richard Erkhov.

[Github](https://github.com/RichardErkhov)

[Discord](https://discord.gg/pvy7H8DZMG)

[Request more models](https://github.com/RichardErkhov/quant_request)


zephyr-orpo-141b-A35b-v0.1 - GGUF
- Model creator: https://huggingface.co/HuggingFaceH4/
- Original model: https://huggingface.co/HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1/


| Name | Quant method | Size |
| ---- | ---- | ---- |
| [zephyr-orpo-141b-A35b-v0.1.Q2_K.gguf](https://huggingface.co/RichardErkhov/HuggingFaceH4_-_zephyr-orpo-141b-A35b-v0.1-gguf/tree/main/) | Q2_K | 48.52GB |
| [zephyr-orpo-141b-A35b-v0.1.IQ3_XS.gguf](https://huggingface.co/RichardErkhov/HuggingFaceH4_-_zephyr-orpo-141b-A35b-v0.1-gguf/tree/main/) | IQ3_XS | 54.23GB |
| [zephyr-orpo-141b-A35b-v0.1.IQ3_S.gguf](https://huggingface.co/RichardErkhov/HuggingFaceH4_-_zephyr-orpo-141b-A35b-v0.1-gguf/tree/main/) | IQ3_S | 57.27GB |
| [zephyr-orpo-141b-A35b-v0.1.Q3_K_S.gguf](https://huggingface.co/RichardErkhov/HuggingFaceH4_-_zephyr-orpo-141b-A35b-v0.1-gguf/tree/main/) | Q3_K_S | 57.27GB |
| [zephyr-orpo-141b-A35b-v0.1.IQ3_M.gguf](https://huggingface.co/RichardErkhov/HuggingFaceH4_-_zephyr-orpo-141b-A35b-v0.1-gguf/tree/main/) | IQ3_M | 60.06GB |
| [zephyr-orpo-141b-A35b-v0.1.Q3_K.gguf](https://huggingface.co/RichardErkhov/HuggingFaceH4_-_zephyr-orpo-141b-A35b-v0.1-gguf/tree/main/) | Q3_K | 63.13GB |
| [zephyr-orpo-141b-A35b-v0.1.Q3_K_M.gguf](https://huggingface.co/RichardErkhov/HuggingFaceH4_-_zephyr-orpo-141b-A35b-v0.1-gguf/tree/main/) | Q3_K_M | 63.13GB |
| [zephyr-orpo-141b-A35b-v0.1.Q3_K_L.gguf](https://huggingface.co/RichardErkhov/HuggingFaceH4_-_zephyr-orpo-141b-A35b-v0.1-gguf/tree/main/) | Q3_K_L | 67.6GB |
| [zephyr-orpo-141b-A35b-v0.1.IQ4_XS.gguf](https://huggingface.co/RichardErkhov/HuggingFaceH4_-_zephyr-orpo-141b-A35b-v0.1-gguf/tree/main/) | IQ4_XS | 71.11GB |
| [zephyr-orpo-141b-A35b-v0.1.Q4_0.gguf](https://huggingface.co/RichardErkhov/HuggingFaceH4_-_zephyr-orpo-141b-A35b-v0.1-gguf/tree/main/) | Q4_0 | 74.05GB |
| [zephyr-orpo-141b-A35b-v0.1.IQ4_NL.gguf](https://huggingface.co/RichardErkhov/HuggingFaceH4_-_zephyr-orpo-141b-A35b-v0.1-gguf/tree/main/) | IQ4_NL | 74.95GB |
| [zephyr-orpo-141b-A35b-v0.1.Q4_K_S.gguf](https://huggingface.co/RichardErkhov/HuggingFaceH4_-_zephyr-orpo-141b-A35b-v0.1-gguf/tree/main/) | Q4_K_S | 74.95GB |
| [zephyr-orpo-141b-A35b-v0.1.Q4_K.gguf](https://huggingface.co/RichardErkhov/HuggingFaceH4_-_zephyr-orpo-141b-A35b-v0.1-gguf/tree/main/) | Q4_K | 79.71GB |
| [zephyr-orpo-141b-A35b-v0.1.Q4_K_M.gguf](https://huggingface.co/RichardErkhov/HuggingFaceH4_-_zephyr-orpo-141b-A35b-v0.1-gguf/tree/main/) | Q4_K_M | 79.71GB |
| [zephyr-orpo-141b-A35b-v0.1.Q4_1.gguf](https://huggingface.co/RichardErkhov/HuggingFaceH4_-_zephyr-orpo-141b-A35b-v0.1-gguf/tree/main/) | Q4_1 | 82.18GB |
| [zephyr-orpo-141b-A35b-v0.1.Q5_0.gguf](https://huggingface.co/RichardErkhov/HuggingFaceH4_-_zephyr-orpo-141b-A35b-v0.1-gguf/tree/main/) | Q5_0 | 90.31GB |
| [zephyr-orpo-141b-A35b-v0.1.Q5_K_S.gguf](https://huggingface.co/RichardErkhov/HuggingFaceH4_-_zephyr-orpo-141b-A35b-v0.1-gguf/tree/main/) | Q5_K_S | 90.31GB |
| [zephyr-orpo-141b-A35b-v0.1.Q5_K.gguf](https://huggingface.co/RichardErkhov/HuggingFaceH4_-_zephyr-orpo-141b-A35b-v0.1-gguf/tree/main/) | Q5_K | 93.1GB |
| [zephyr-orpo-141b-A35b-v0.1.Q5_K_M.gguf](https://huggingface.co/RichardErkhov/HuggingFaceH4_-_zephyr-orpo-141b-A35b-v0.1-gguf/tree/main/) | Q5_K_M | 93.1GB |
| [zephyr-orpo-141b-A35b-v0.1.Q5_1.gguf](https://huggingface.co/RichardErkhov/HuggingFaceH4_-_zephyr-orpo-141b-A35b-v0.1-gguf/tree/main/) | Q5_1 | 98.45GB |
| [zephyr-orpo-141b-A35b-v0.1.Q6_K.gguf](https://huggingface.co/RichardErkhov/HuggingFaceH4_-_zephyr-orpo-141b-A35b-v0.1-gguf/tree/main/) | Q6_K | 107.6GB |
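
These quants load in any llama.cpp-compatible runtime. The snippet below is a minimal sketch using `llama-cpp-python` and `huggingface_hub` (neither tool nor the exact filename comes from the original upload; quants of this size are typically stored as split files under the repository tree and may need to be merged or loaded from their first shard):

```python
# Minimal sketch, assuming: pip install llama-cpp-python huggingface_hub
# The filename is illustrative -- pick any quant from the table above.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Download one quantized file from this repository.
model_path = hf_hub_download(
    repo_id="RichardErkhov/HuggingFaceH4_-_zephyr-orpo-141b-A35b-v0.1-gguf",
    filename="zephyr-orpo-141b-A35b-v0.1.Q2_K.gguf",
)

# Load the model; n_gpu_layers=-1 offloads all layers to the GPU when one
# is available, and n_ctx sets the context window.
llm = Llama(model_path=model_path, n_ctx=4096, n_gpu_layers=-1)

# Chat-style generation; llama.cpp applies the chat template stored in the
# GGUF metadata.
out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are Zephyr, a helpful assistant."},
        {"role": "user", "content": "Say hello in one sentence."},
    ],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```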


Original model description:
---
license: apache-2.0
base_model: mistral-community/Mixtral-8x22B-v0.1
tags:
- trl
- orpo
- generated_from_trainer
datasets:
- argilla/distilabel-capybara-dpo-7k-binarized
model-index:
- name: zephyr-orpo-141b-A35b-v0.1
  results: []
inference:
  parameters:
    temperature: 0.7
---

<img src="https://huggingface.co/HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1/resolve/main/logo.png" alt="Zephyr 141B Logo" width="400" style="margin-left:'auto' margin-right:'auto' display:'block'"/>


# Model Card for Zephyr 141B-A39B

Zephyr is a series of language models that are trained to act as helpful assistants. Zephyr 141B-A39B is the latest model in the series, and is a fine-tuned version of [mistral-community/Mixtral-8x22B-v0.1](https://huggingface.co/mistral-community/Mixtral-8x22B-v0.1) that was trained using a novel alignment algorithm called [Odds Ratio Preference Optimization (ORPO)](https://huggingface.co/papers/2403.07691) with **7k instances** for **1.3 hours** on 4 nodes of 8 x H100s. ORPO does not require an SFT step to achieve high performance and is thus much more computationally efficient than methods like DPO and PPO. To train Zephyr-141B-A39B, we used the [`argilla/distilabel-capybara-dpo-7k-binarized`](https://huggingface.co/datasets/argilla/distilabel-capybara-dpo-7k-binarized) preference dataset, which consists of synthetic, high-quality, multi-turn preferences that have been scored via LLMs.

> [!NOTE]
> This model was trained collaboratively between Argilla, KAIST, and Hugging Face.

## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->

- **Model type:** A Mixture of Experts (MoE) model with 141B total parameters and 39B active parameters. (We initially made a small error in calculating the number of active parameters for the model ID. The model card states the correct number.) Fine-tuned on a mix of publicly available, synthetic datasets.
- **Language(s) (NLP):** Primarily English.
- **License:** Apache 2.0
- **Finetuned from model:** [mistral-community/Mixtral-8x22B-v0.1](https://huggingface.co/mistral-community/Mixtral-8x22B-v0.1)

### Model Sources

<!-- Provide the basic links for the model. -->

- **Repository:** https://github.com/huggingface/alignment-handbook
- **Dataset:** https://huggingface.co/datasets/argilla/distilabel-capybara-dpo-7k-binarized

## Performance

Zephyr 141B-A39B was trained to test the effectiveness of ORPO at scale, and the underlying dataset contains a mix of general chat capabilities. It achieves strong performance on chat benchmarks like [MT Bench](https://huggingface.co/spaces/lmsys/mt-bench) and [IFEval](https://arxiv.org/abs/2311.07911). The scores reported below were obtained using the [LightEval](https://github.com/huggingface/lighteval) evaluation suite, and each prompt was formatted with the model's corresponding chat template to simulate real-world usage. This is why some scores may differ from those reported in technical reports or on the Open LLM Leaderboard.

| Model | MT Bench | IFEval | BBH | AGIEval |
|-----------------------------------------------------------------------------------------------------|---------:|-------:|------:|--------:|
| [zephyr-orpo-141b-A39b-v0.1](https://huggingface.co/HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1) | 8.17 | 65.06 | 58.96 | 44.16 |
| [databricks/dbrx-instruct](https://huggingface.co/databricks/dbrx-instruct) | 8.26 | 52.13 | 48.50 | 41.16 |
| [mistralai/Mixtral-8x7B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1) | 8.30 | 55.08 | 45.31 | 47.68 |

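To make the chat-template formatting described above concrete, here is a small sketch (not part of the original card) of how a conversation is rendered with the model's template via the standard 🤗 Transformers tokenizer API:

```python
# Sketch of the chat-template formatting used during evaluation, assuming
# the standard transformers AutoTokenizer API.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1")

messages = [
    {"role": "system", "content": "You are Zephyr, a helpful assistant."},
    {"role": "user", "content": "List three uses of a paperclip."},
]

# add_generation_prompt=True appends the assistant turn header so the
# model continues the conversation as the assistant, mirroring chat usage.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```
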
## Intended uses & limitations

The model was fine-tuned on a blend of chat, code, math, and reasoning data. Here's how you can run the model using the `pipeline()` function from 🤗 Transformers:

```python
# pip install 'transformers>=4.39.3'
# pip install accelerate

import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1",
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
messages = [
    {
        "role": "system",
        "content": "You are Zephyr, a helpful assistant.",
    },
    {"role": "user", "content": "Explain how Mixture of Experts work in language a child would understand."},
]
outputs = pipe(
    messages,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_k=50,
    top_p=0.95,
)
print(outputs[0]["generated_text"][-1]["content"])
```

## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

Zephyr 141B-A39B has not been aligned to human preferences for safety within the RLHF phase or deployed with in-the-loop filtering of responses like ChatGPT, so the model can produce problematic outputs (especially when prompted to do so).
The size and composition of the corpus used to train the base model (`mistral-community/Mixtral-8x22B-v0.1`) are also unknown; however, it likely included a mix of web data and technical sources like books and code. See the [Falcon 180B model card](https://huggingface.co/tiiuae/falcon-180B#training-data) for an example of this.


## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a sketch of a matching trl configuration follows the list):
- learning_rate: 5e-06
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 32
- total_train_batch_size: 32
- total_eval_batch_size: 256
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: inverse_sqrt
- lr_scheduler_warmup_steps: 100
- num_epochs: 3

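The sketch below shows what an ORPO run with these hyperparameters could look like using trl's `ORPOTrainer`. It is an illustration under stated assumptions (a recent trl release with `ORPOConfig`/`ORPOTrainer`, and a dataset already in the prompt/chosen/rejected format the trainer expects), not the original training script, which lives in the alignment-handbook repository linked above:

```python
# Hedged sketch of an ORPO training run; assumes trl>=0.8 with
# ORPOConfig/ORPOTrainer. Not the original alignment-handbook script,
# which also handles multi-node sharding across 32 GPUs and its own
# dataset preprocessing.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

model_id = "mistral-community/Mixtral-8x22B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Preference data; assumed to expose prompt/chosen/rejected columns in the
# shape ORPOTrainer expects.
dataset = load_dataset("argilla/distilabel-capybara-dpo-7k-binarized", split="train")

# Values mirror the hyperparameter list above.
config = ORPOConfig(
    output_dir="zephyr-orpo-141b-A35b",
    learning_rate=5e-6,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=8,
    num_train_epochs=3,
    lr_scheduler_type="inverse_sqrt",
    warmup_steps=100,
    seed=42,
)

# In newer trl versions the tokenizer argument is named processing_class.
trainer = ORPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    tokenizer=tokenizer,
)
trainer.train()
```
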
### Framework versions

- Transformers 4.39.3
- Pytorch 2.1.2+cu121
- Datasets 2.18.0
- Tokenizers 0.15.1

## Citation

If you find Zephyr 141B-A39B useful in your work, please cite the ORPO paper:

```
@misc{hong2024orpo,
      title={ORPO: Monolithic Preference Optimization without Reference Model},
      author={Jiwoo Hong and Noah Lee and James Thorne},
      year={2024},
      eprint={2403.07691},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```

You may also wish to cite the creators of this model:

```
@misc{zephyr_141b,
  author = {Alvaro Bartolome and Jiwoo Hong and Noah Lee and Kashif Rasul and Lewis Tunstall},
  title = {Zephyr 141B A39B},
  year = {2024},
  publisher = {Hugging Face},
  journal = {Hugging Face repository},
  howpublished = {\url{https://huggingface.co/HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1}}
}
```