Quantization made by Richard Erkhov.

[Github](https://github.com/RichardErkhov)

[Discord](https://discord.gg/pvy7H8DZMG)

[Request more models](https://github.com/RichardErkhov/quant_request)


zephyr-7b-sft-full-SPIN-iter2 - bnb 4bits
- Model creator: https://huggingface.co/UCLA-AGI/
- Original model: https://huggingface.co/UCLA-AGI/zephyr-7b-sft-full-SPIN-iter2/
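
A minimal loading sketch for the 4-bit (bitsandbytes) quantization, using the `transformers` API. The repository ID below is a hypothetical placeholder inferred from the naming above, and the prompt is only illustrative.

```python
# Sketch: load the pre-quantized 4-bit checkpoint with transformers.
# Requires `transformers`, `accelerate`, and `bitsandbytes` to be installed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "RichardErkhov/zephyr-7b-sft-full-SPIN-iter2-4bits"  # hypothetical repo id; adjust to the actual quant repo

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    device_map="auto",           # spread layers across available GPU(s)
    torch_dtype=torch.bfloat16,  # compute dtype for the non-quantized layers
)

prompt = "Explain self-play fine-tuning in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```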


Original model description:
---
license: mit
datasets:
- UCLA-AGI/SPIN_iter2
language:
- en
pipeline_tag: text-generation
---
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models (https://arxiv.org/abs/2401.01335)

# zephyr-7b-sft-full-spin-iter2

This model is a self-play fine-tuned model at iteration 2, starting from [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full) and trained on synthetic data based on the [HuggingFaceH4/ultrachat_200k](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k) dataset.

## Model Details

### Model Description

- Model type: A 7B parameter GPT-like model fine-tuned on synthetic datasets.
- Language(s) (NLP): Primarily English
- License: MIT
- Finetuned from model: alignment-handbook/zephyr-7b-sft-full (based on mistralai/Mistral-7B-v0.1)

### Training hyperparameters
The following hyperparameters were used during training:

- learning_rate: 1e-07
- train_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 64
- optimizer: RMSProp
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 2.0

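As a rough illustration, these settings could be expressed with Hugging Face `TrainingArguments` roughly as below. This is a minimal sketch, not the original SPIN training script; `output_dir` is a placeholder, and `optim="rmsprop"` assumes a transformers version that exposes an RMSprop optimizer option.

```python
# Sketch only: one possible mapping of the hyperparameters above onto TrainingArguments.
# Per-device batch size 8 on 8 GPUs yields the total train batch size of 64 listed above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="zephyr-7b-sft-full-spin-iter2",  # hypothetical output path
    learning_rate=1e-7,
    per_device_train_batch_size=8,
    seed=42,
    optim="rmsprop",             # assumption: requires a transformers version with this option
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    num_train_epochs=2.0,
)
```
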
## [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_UCLA-AGI__test-test)

| Metric              | Value |
|---------------------|-------|
| Avg.                | 63.54 |
| ARC (25-shot)       | 66.47 |
| HellaSwag (10-shot) | 85.82 |
| MMLU (5-shot)       | 61.48 |
| TruthfulQA (0-shot) | 57.75 |
| Winogrande (5-shot) | 76.95 |
| GSM8K (5-shot)      | 32.75 |

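For context, numbers like these could be approximated locally with EleutherAI's lm-evaluation-harness; the snippet below is a hedged sketch for the ARC row only, using the original (non-quantized) model ID, and local scores may not exactly match the leaderboard's harness configuration.

```python
# Sketch: evaluate the ARC (25-shot) row locally with lm-evaluation-harness (lm-eval >= 0.4).
# Leaderboard results come from a specific harness setup, so exact numbers may differ.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=UCLA-AGI/zephyr-7b-sft-full-SPIN-iter2,dtype=bfloat16",
    tasks=["arc_challenge"],  # ARC row of the table above
    num_fewshot=25,           # 25-shot, matching the table
    batch_size=8,
)
print(results["results"]["arc_challenge"])
```
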
## Citation
```
@misc{chen2024selfplay,
      title={Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models},
      author={Zixiang Chen and Yihe Deng and Huizhuo Yuan and Kaixuan Ji and Quanquan Gu},
      year={2024},
      eprint={2401.01335},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
```
