---
model-index:
- name: notus-7b-v1
results: []
datasets:
- argilla/ultrafeedback-binarized-avg-rating-for-dpo
language:
- en
base_model: alignment-handbook/zephyr-7b-sft-full
library_name: transformers
pipeline_tag: text-generation
tags:
- dpo
- preference
- ultrafeedback
license: apache-2.0
---
# Model Card for Notus 7B v1
<div align="center">
<img src="https://cdn-uploads.huggingface.co/production/uploads/60f0608166e5701b80ed3f02/LU-vKiC0R7UxxITrwE1F_.png" alt="Image was artificially generated by Dalle-3 via ChatGPT Pro"/>
</div>
Notus will be a collection of models fine-tuned with DPO, similarly to Zephyr, but focused mainly on the
Direct Preference Optimization (DPO) step, aiming to incorporate preference feedback into the LLMs during
fine-tuning. Notus models are intended to be used as assistants via chat-like applications, and are
evaluated with the MT-Bench, AlpacaEval, and LM Evaluation Harness benchmarks, so that they can be compared
directly with the Zephyr models, which are also fine-tuned with DPO.
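As a quick, hedged sketch of such chat-style usage with `transformers` (the repository id `argilla/notus-7b-v1`, the prompt, and the sampling settings below are illustrative assumptions, not the settings used for evaluation):

```python
import torch
from transformers import pipeline

# Load the model as a chat-style text-generation pipeline
pipe = pipeline(
    "text-generation",
    model="argilla/notus-7b-v1",  # assumed repository id
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Build the prompt with the tokenizer's chat template
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain Direct Preference Optimization in one paragraph."},
]
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Generate a response with illustrative sampling settings
outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```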
## Model Details
### Model Description
- **Developed by:** Argilla, Inc. (building on the previous efforts and amazing work of HuggingFace H4 and MistralAI)
- **Shared by:** Argilla, Inc.
- **Model type:** GPT-like 7B model fine-tuned with DPO
- **Language(s) (NLP):** Mainly English
- **License:** Apache 2.0 (same as Zephyr 7B SFT and Mistral 7B v0.1)
- **Finetuned from model:** [`alignment-handbook/zephyr-7b-sft-full`](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full)
### Model Sources
- **Repository:** https://github.com/argilla-io/notus-7b
- **Paper:** N/A
- **Demo:** https://argilla-notus-chat-ui.hf.space/
### Model Date
Notus 7B v1 was trained during November 2023. The training data, generated by GPT-4 without the use of external resources, has a knowledge cutoff of September 2021.
## Evaluation
### LM Eval Harness
We ran the evaluation using [`EleutherAI/lm-eval-harness`](https://github.com/EleutherAI/lm-evaluation-harness/tree/big-refactor) from the `big-refactor` branch, aiming to mimic the [Open LLM Leaderboard by HuggingFace H4](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard), but running everything on our VMs instead, as we're still experimenting.
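As a hedged sketch (not our exact evaluation script), a single task could be reproduced with the `big-refactor` Python API along the following lines; the repository id and the `dtype` argument are assumptions:

```python
from lm_eval import simple_evaluate

# Evaluate one leaderboard task; repeat per task with its own few-shot setting
results = simple_evaluate(
    model="hf",
    model_args="pretrained=argilla/notus-7b-v1,dtype=bfloat16",  # assumed repo id
    tasks=["arc_challenge"],  # ARC is reported with 25-shot in the table below
    num_fewshot=25,
    batch_size=8,
)
print(results["results"])
```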
From a first evaluation on this benchmark, we can see that Notus 7B DPO **slightly improves** on Zephyr 7B Alpha/Beta and Mistral 7B, as shown by the average over the 7 leaderboard tasks.
| Model | Average ⬆️ | ARC (25-s) ⬆️ | HellaSwag (10-s) ⬆️ | MMLU (5-s) ⬆️ | TruthfulQA (MC2) (0-s) ⬆️ | Winogrande (5-s) ⬆️ | GSM8K (5-s) ⬆️ | DROP (3-s) ⬆️ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
|[mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) | 50.32 | 59.58 | 83.31 | 64.16 | 42.15 | 78.37 | 18.12 | 6.14 |
|[HuggingFaceH4/zephyr-7b-alpha](https://huggingface.co/HuggingFaceH4/zephyr-7b-alpha) | 52.4 | 61.01 | 84.04 | 61.39 | 57.9 | 78.61 | 14.03 | 9.82 |
|[HuggingFaceH4/zephyr-7b-beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta) | 52.15 | 62.03 | 84.36 | 61.07 | 57.45 | 77.74 | 12.74 | 9.66 |
| **Ours** | **54.09** | 64.25 | 84.90 | 61.69 | 52.77 | 74.51 | 39.5 | 0.98 |
In any case, we will also add our model to the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) queue so that it is evaluated on Hugging Face's end and we can verify that the results match, as we found some inconsistencies for DROP when using the `big-refactor` branch of `lm-eval-harness`.
### MT Bench (Coming soon!)
### Alpaca Eval (Coming soon!)
## Training Details
### Training Hardware
We used a VM with 8 x A100 40GB GPUs hosted on Lambda Labs.
### Training Data
We used a slightly curated version of [`openbmb/UltraFeedback`](https://huggingface.co/datasets/openbmb/UltraFeedback), named [`argilla/ultrafeedback-binarized-avg-rating-for-dpo`](https://huggingface.co/datasets/argilla/ultrafeedback-binarized-avg-rating-for-dpo).
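A minimal sketch for inspecting the preference pairs with `datasets`; the split and column names below (`prompt`, `chosen`, `rejected`) are assumptions based on the usual binarized UltraFeedback layout rather than a guaranteed schema:

```python
from datasets import load_dataset

# Load the curated DPO dataset from the Hugging Face Hub
ds = load_dataset("argilla/ultrafeedback-binarized-avg-rating-for-dpo", split="train")
print(ds)  # features and number of rows

# Each row pairs a prompt with a preferred ("chosen") and a dispreferred ("rejected") response
example = ds[0]
print(example["prompt"])
print(example["chosen"])
print(example["rejected"])
```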
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 64
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
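As a hedged sketch, the list above maps onto Hugging Face `TrainingArguments` roughly as follows; the output path and the `bf16` flag are assumptions, and the DPO-specific wiring from the alignment-handbook recipe is omitted:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="notus-7b-v1",        # illustrative output path
    learning_rate=5e-7,
    per_device_train_batch_size=8,   # x 8 GPUs = total_train_batch_size of 64
    per_device_eval_batch_size=4,    # x 8 GPUs = total_eval_batch_size of 32
    seed=42,
    num_train_epochs=3,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    optim="adamw_torch",             # Adam with betas=(0.9, 0.999) and epsilon=1e-8
    bf16=True,                       # assumption: mixed precision on the A100s
)
```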
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.5051 | 0.1 | 100 | 0.5180 | 0.1475 | -0.3954 | 0.7183 | 0.5429 | -246.6286 | -297.5412 | -2.7438 | -3.0431 |
| 0.4321 | 0.21 | 200 | 0.4375 | 0.1353 | -0.9529 | 0.7540 | 1.0882 | -252.2036 | -297.6632 | -2.7578 | -3.0543 |
| 0.3848 | 0.31 | 300 | 0.4301 | -0.4813 | -1.8921 | 0.7302 | 1.4107 | -261.5956 | -303.8301 | -2.7592 | -3.0508 |
| 0.3777 | 0.42 | 400 | 0.4091 | -0.8597 | -2.5306 | 0.7698 | 1.6709 | -267.9805 | -307.6138 | -2.7476 | -3.0474 |
| 0.3559 | 0.52 | 500 | 0.4332 | -1.0424 | -2.6019 | 0.7619 | 1.5595 | -268.6939 | -309.4406 | -2.2960 | -2.6106 |
| 0.4178 | 0.62 | 600 | 0.3934 | -0.6434 | -2.4837 | 0.7659 | 1.8404 | -267.5121 | -305.4503 | -2.5487 | -2.8508 |
| 0.4206 | 0.73 | 700 | 0.4058 | -1.4700 | -3.5113 | 0.7857 | 2.0413 | -277.7877 | -313.7168 | -2.5679 | -2.8727 |
| 0.4323 | 0.83 | 800 | 0.3929 | -0.9025 | -2.6935 | 0.7897 | 1.7910 | -269.6095 | -308.0414 | -2.6213 | -2.9202 |
| 0.3706 | 0.93 | 900 | 0.3903 | -1.1122 | -3.0257 | 0.8056 | 1.9135 | -272.9316 | -310.1388 | -2.5428 | -2.8416 |
| 0.0496 | 1.04 | 1000 | 0.3991 | -1.4248 | -4.1245 | 0.8016 | 2.6997 | -283.9196 | -313.2651 | -2.5093 | -2.8150 |
| 0.0723 | 1.14 | 1100 | 0.3999 | -1.8789 | -4.5317 | 0.7897 | 2.6528 | -287.9914 | -317.8056 | -2.5170 | -2.8242 |
| 0.0481 | 1.25 | 1200 | 0.4191 | -2.6211 | -5.5294 | 0.7817 | 2.9083 | -297.9687 | -325.2281 | -2.5139 | -2.8109 |
| 0.0432 | 1.35 | 1300 | 0.4070 | -2.0605 | -5.0460 | 0.8056 | 2.9855 | -293.1345 | -319.6214 | -2.5153 | -2.8121 |
| 0.0402 | 1.45 | 1400 | 0.4001 | -2.2445 | -5.0942 | 0.7937 | 2.8497 | -293.6164 | -321.4614 | -2.4383 | -2.7388 |
| 0.0529 | 1.56 | 1500 | 0.4066 | -2.3499 | -5.2468 | 0.8016 | 2.8969 | -295.1426 | -322.5153 | -2.3906 | -2.6963 |
| 0.0651 | 1.66 | 1600 | 0.3962 | -2.0597 | -4.8915 | 0.8016 | 2.8318 | -291.5901 | -319.6136 | -2.3390 | -2.6469 |
| 0.0738 | 1.77 | 1700 | 0.3942 | -1.8893 | -4.6107 | 0.8135 | 2.7214 | -288.7817 | -317.9099 | -2.3532 | -2.6607 |
| 0.0597 | 1.87 | 1800 | 0.3990 | -1.8774 | -4.7221 | 0.8175 | 2.8448 | -289.8961 | -317.7905 | -2.2728 | -2.5908 |
| 0.0686 | 1.97 | 1900 | 0.3924 | -1.8745 | -4.6807 | 0.8056 | 2.8062 | -289.4821 | -317.7617 | -2.2554 | -2.5658 |
| 0.0116 | 2.08 | 2000 | 0.4260 | -2.4687 | -5.7190 | 0.7937 | 3.2503 | -299.8647 | -323.7037 | -2.2297 | -2.5347 |
| 0.0114 | 2.18 | 2100 | 0.4519 | -2.8266 | -6.3706 | 0.7976 | 3.5440 | -306.3802 | -327.2823 | -2.2185 | -2.5219 |
| 0.0073 | 2.28 | 2200 | 0.4563 | -2.9422 | -6.5564 | 0.8016 | 3.6142 | -308.2384 | -328.4384 | -2.2103 | -2.5126 |
| 0.0094 | 2.39 | 2300 | 0.4636 | -3.3246 | -7.0542 | 0.8016 | 3.7296 | -313.2165 | -332.2628 | -2.2059 | -2.5081 |
| 0.0056 | 2.49 | 2400 | 0.4745 | -3.3599 | -7.1652 | 0.7976 | 3.8053 | -314.3266 | -332.6161 | -2.1945 | -2.4943 |
| 0.0052 | 2.6 | 2500 | 0.4812 | -3.4916 | -7.3391 | 0.7976 | 3.8475 | -316.0656 | -333.9322 | -2.1888 | -2.4881 |
| 0.0065 | 2.7 | 2600 | 0.4678 | -3.2226 | -6.9887 | 0.7976 | 3.7661 | -312.5613 | -331.2425 | -2.1644 | -2.4560 |
| 0.0059 | 2.8 | 2700 | 0.4694 | -3.4307 | -7.2484 | 0.7976 | 3.8177 | -315.1584 | -333.3234 | -2.1572 | -2.4483 |
| 0.0054 | 2.91 | 2800 | 0.4707 | -3.4959 | -7.3283 | 0.8056 | 3.8324 | -315.9576 | -333.9758 | -2.1575 | -2.4491 |
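For reference, the reward and margin columns above follow the DPO objective. Below is a minimal, library-free sketch of how those quantities relate; the function name, variable names, and the `beta` value are illustrative assumptions, not the exact training code:

```python
import torch.nn.functional as F

def dpo_metrics(policy_chosen_logps, policy_rejected_logps,
                ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Inputs are per-sequence summed log-probabilities (torch tensors).
    # "Rewards" are beta-scaled log-probability ratios between the policy and
    # the frozen reference (SFT) model, per the DPO objective.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)        # Rewards/chosen
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)  # Rewards/rejected
    margins = chosen_rewards - rejected_rewards                             # Rewards/margins
    accuracies = (chosen_rewards > rejected_rewards).float().mean()         # Rewards/accuracies
    loss = -F.logsigmoid(margins).mean()                                    # Training/validation loss
    return loss, chosen_rewards.mean(), rejected_rewards.mean(), margins.mean(), accuracies
```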
### Framework versions
- Transformers 4.35.0
- Pytorch 2.1.1+cu121
- Datasets 2.14.6
- Tokenizers 0.14.1
### Evaluation during Training
- Loss: 0.4730
- Rewards/chosen: -3.5289
- Rewards/rejected: -7.3700
- Rewards/accuracies: 0.8016
- Rewards/margins: 3.8412
- Logps/rejected: -316.3751
- Logps/chosen: -334.3053
- Logits/rejected: -2.1644
- Logits/chosen: -2.4556