--- model-index: - name: notus-7b-dpo-lora results: [] datasets: - argilla/ultrafeedback-binarized-avg-rating-for-dpo language: - en base_model: alignment-handbook/zephyr-7b-sft-full library_name: transformers pipeline_tag: text-generation tags: - dpo - preference - ultrafeedback license: apache-2.0 --- # Model Card for Notus 7B Notus is going to be a collection of fine-tuned models using DPO, similarly to Zephyr, but mainly focused on the Direct Preference Optimization (DPO) step, aiming to incorporate preference feedback into the LLMs when fine-tuning those. Notus models are intended to be used as assistants via chat-like applications, and are evaluated with the MT-Bench and AlpacaEval benchmarks, to be directly compared with Zephyr fine-tuned models also using DPO. ## Model Details # notus-7b-dpo ### Model Description - **Developed by:** Argilla, Inc. (based on HuggingFace H4 and MistralAI previous efforts and amazing work) - **Shared by:** Argilla, Inc. - **Model type:** GPT-like 7B model DPO fine-tuned - **Language(s) (NLP):** Mainly English - **License:** Apache 2.0 (same as Zephyr 7B SFT and Mistral 7B v0.1) - **Finetuned from model:** [`alignment-handbook/zephyr-7b-sft-full`](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full) ### Model Sources [optional] - **Repository:** https://github.com/argilla-io/notus-7b-dpo - **Paper:** N/A - **Demo:** https://argilla-notus-chat-ui.hf.space/ ## Uses ### Direct Use [More Information Needed] ### Downstream Use [optional] [More Information Needed] ### Out-of-Scope Use [More Information Needed] ## Bias, Risks, and Limitations [More Information Needed] ### Recommendations Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. ## How to Get Started with the Model Use the code below to get started with the model. [More Information Needed] ## Training Details ### Training Data [More Information Needed] ### Training hyperparameters The following hyperparameters were used during training: - learning_rate: 5e-07 - train_batch_size: 8 - eval_batch_size: 4 - seed: 42 - distributed_type: multi-GPU - num_devices: 8 - total_train_batch_size: 64 - total_eval_batch_size: 32 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08 - lr_scheduler_type: linear - lr_scheduler_warmup_ratio: 0.1 - num_epochs: 3 ### Training results | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | |:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:| | 0.5051 | 0.1 | 100 | 0.5180 | 0.1475 | -0.3954 | 0.7183 | 0.5429 | -246.6286 | -297.5412 | -2.7438 | -3.0431 | | 0.4321 | 0.21 | 200 | 0.4375 | 0.1353 | -0.9529 | 0.7540 | 1.0882 | -252.2036 | -297.6632 | -2.7578 | -3.0543 | | 0.3848 | 0.31 | 300 | 0.4301 | -0.4813 | -1.8921 | 0.7302 | 1.4107 | -261.5956 | -303.8301 | -2.7592 | -3.0508 | | 0.3777 | 0.42 | 400 | 0.4091 | -0.8597 | -2.5306 | 0.7698 | 1.6709 | -267.9805 | -307.6138 | -2.7476 | -3.0474 | | 0.3559 | 0.52 | 500 | 0.4332 | -1.0424 | -2.6019 | 0.7619 | 1.5595 | -268.6939 | -309.4406 | -2.2960 | -2.6106 | | 0.4178 | 0.62 | 600 | 0.3934 | -0.6434 | -2.4837 | 0.7659 | 1.8404 | -267.5121 | -305.4503 | -2.5487 | -2.8508 | | 0.4206 | 0.73 | 700 | 0.4058 | -1.4700 | -3.5113 | 0.7857 | 2.0413 | -277.7877 | -313.7168 | -2.5679 | -2.8727 | | 0.4323 | 0.83 | 800 | 0.3929 | -0.9025 | -2.6935 | 0.7897 | 1.7910 | -269.6095 | -308.0414 | -2.6213 | -2.9202 | | 0.3706 | 0.93 | 900 | 0.3903 | -1.1122 | -3.0257 | 0.8056 | 1.9135 | -272.9316 | -310.1388 | -2.5428 | -2.8416 | | 0.0496 | 1.04 | 1000 | 0.3991 | -1.4248 | -4.1245 | 0.8016 | 2.6997 | -283.9196 | -313.2651 | -2.5093 | -2.8150 | | 0.0723 | 1.14 | 1100 | 0.3999 | -1.8789 | -4.5317 | 0.7897 | 2.6528 | -287.9914 | -317.8056 | -2.5170 | -2.8242 | | 0.0481 | 1.25 | 1200 | 0.4191 | -2.6211 | -5.5294 | 0.7817 | 2.9083 | -297.9687 | -325.2281 | -2.5139 | -2.8109 | | 0.0432 | 1.35 | 1300 | 0.4070 | -2.0605 | -5.0460 | 0.8056 | 2.9855 | -293.1345 | -319.6214 | -2.5153 | -2.8121 | | 0.0402 | 1.45 | 1400 | 0.4001 | -2.2445 | -5.0942 | 0.7937 | 2.8497 | -293.6164 | -321.4614 | -2.4383 | -2.7388 | | 0.0529 | 1.56 | 1500 | 0.4066 | -2.3499 | -5.2468 | 0.8016 | 2.8969 | -295.1426 | -322.5153 | -2.3906 | -2.6963 | | 0.0651 | 1.66 | 1600 | 0.3962 | -2.0597 | -4.8915 | 0.8016 | 2.8318 | -291.5901 | -319.6136 | -2.3390 | -2.6469 | | 0.0738 | 1.77 | 1700 | 0.3942 | -1.8893 | -4.6107 | 0.8135 | 2.7214 | -288.7817 | -317.9099 | -2.3532 | -2.6607 | | 0.0597 | 1.87 | 1800 | 0.3990 | -1.8774 | -4.7221 | 0.8175 | 2.8448 | -289.8961 | -317.7905 | -2.2728 | -2.5908 | | 0.0686 | 1.97 | 1900 | 0.3924 | -1.8745 | -4.6807 | 0.8056 | 2.8062 | -289.4821 | -317.7617 | -2.2554 | -2.5658 | | 0.0116 | 2.08 | 2000 | 0.4260 | -2.4687 | -5.7190 | 0.7937 | 3.2503 | -299.8647 | -323.7037 | -2.2297 | -2.5347 | | 0.0114 | 2.18 | 2100 | 0.4519 | -2.8266 | -6.3706 | 0.7976 | 3.5440 | -306.3802 | -327.2823 | -2.2185 | -2.5219 | | 0.0073 | 2.28 | 2200 | 0.4563 | -2.9422 | -6.5564 | 0.8016 | 3.6142 | -308.2384 | -328.4384 | -2.2103 | -2.5126 | | 0.0094 | 2.39 | 2300 | 0.4636 | -3.3246 | -7.0542 | 0.8016 | 3.7296 | -313.2165 | -332.2628 | -2.2059 | -2.5081 | | 0.0056 | 2.49 | 2400 | 0.4745 | -3.3599 | -7.1652 | 0.7976 | 3.8053 | -314.3266 | -332.6161 | -2.1945 | -2.4943 | | 0.0052 | 2.6 | 2500 | 0.4812 | -3.4916 | -7.3391 | 0.7976 | 3.8475 | -316.0656 | -333.9322 | -2.1888 | -2.4881 | | 0.0065 | 2.7 | 2600 | 0.4678 | -3.2226 | -6.9887 | 0.7976 | 3.7661 | -312.5613 | -331.2425 | -2.1644 | -2.4560 | | 0.0059 | 2.8 | 2700 | 0.4694 | -3.4307 | -7.2484 | 0.7976 | 3.8177 | -315.1584 | -333.3234 | -2.1572 | -2.4483 | | 0.0054 | 2.91 | 2800 | 0.4707 | -3.4959 | -7.3283 | 0.8056 | 3.8324 | -315.9576 | -333.9758 | -2.1575 | -2.4491 | ### Framework versions - Transformers 4.35.0 - Pytorch 2.1.1+cu121 - Datasets 2.14.6 - Tokenizers 0.14.1 ## Evaluation - Loss: 0.4730 - Rewards/chosen: -3.5289 - Rewards/rejected: -7.3700 - Rewards/accuracies: 0.8016 - Rewards/margins: 3.8412 - Logps/rejected: -316.3751 - Logps/chosen: -334.3053 - Logits/rejected: -2.1644 - Logits/chosen: -2.4556 ### Testing Data, Factors & Metrics #### Testing Data [More Information Needed] #### Factors [More Information Needed] #### Metrics [More Information Needed] ### Results [More Information Needed] #### Summary ## Technical Specifications ### Model Architecture and Objective [More Information Needed] ### Compute Infrastructure [More Information Needed] #### Hardware 8 x A100 40GB #### Software [More Information Needed] ## Citation [optional] **BibTeX:** [More Information Needed] **APA:** [More Information Needed] ## Glossary [optional] [More Information Needed] ## More Information [optional] [More Information Needed] ## Model Card Authors [optional] [More Information Needed] ## Model Card Contact [More Information Needed]