ydeng9 committed
Commit 9257b64
1 Parent(s): 437d2f9

Update README.md

Files changed (1):
  1. README.md +40 -3

README.md CHANGED
@@ -7,9 +7,9 @@ language:
 base_model: mistralai/Mistral-7B-v0.1
 pipeline_tag: text-generation
 ---
-# Model Card for a quick test model
+# zephyr-7b-sft-full-spin-iter1
 
-This model card aims a quick sanity check and will be updated with more info later.
+This model is a self-play fine-tuned model at iteration 1 from [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full), trained on synthetic data based on the [HuggingFaceH4/ultrachat_200k](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k) dataset.
 
 ## Model Details
 
@@ -18,4 +18,41 @@ This model card aims a quick sanity check and will be updated with more info later.
 - Model type: A 7B parameter GPT-like model fine-tuned on synthetic datasets.
 - Language(s) (NLP): Primarily English
 - License: MIT
-- Finetuned from model: mistralai/Mistral-7B-v0.1
+- Finetuned from model: alignment-handbook/zephyr-7b-sft-full (based on mistralai/Mistral-7B-v0.1)
+
+### Training hyperparameters
+The following hyperparameters were used during training:
+
+- learning_rate: 5e-07
+- train_batch_size: 8
+- seed: 42
+- distributed_type: multi-GPU
+- num_devices: 8
+- total_train_batch_size: 64
+- optimizer: RMSProp
+- lr_scheduler_type: linear
+- lr_scheduler_warmup_ratio: 0.1
+- num_epochs: 2.0
+
+## Citation
+```
+@misc{chen2024selfplay,
+  title={Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models},
+  author={Zixiang Chen and Yihe Deng and Huizhuo Yuan and Kaixuan Ji and Quanquan Gu},
+  year={2024},
+  eprint={2401.01335},
+  archivePrefix={arXiv},
+  primaryClass={cs.LG}
+}
+```
+
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+| Metric              | Value |
+|---------------------|-------|
+| Avg.                | 62.86 |
+| ARC (25-shot)       | 65.87 |
+| HellaSwag (10-shot) | 85.44 |
+| MMLU (5-shot)       | 60.95 |
+| TruthfulQA (0-shot) | 57.39 |
+| Winogrande (5-shot) | 76.64 |
+| GSM8K (5-shot)      | 30.86 |
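
The numbers the updated card reports are internally consistent, and that can be checked directly from its own fields. The sketch below (plain Python, no external dependencies) re-derives the leaderboard average as the mean of the six task scores, and re-derives `total_train_batch_size` as `train_batch_size × num_devices` — the latter assumes no gradient accumulation, which is an assumption since the card lists no accumulation steps:

```python
# Sanity checks on the numbers reported in the updated model card.

# Open LLM Leaderboard scores copied from the card's table.
scores = {
    "ARC (25-shot)": 65.87,
    "HellaSwag (10-shot)": 85.44,
    "MMLU (5-shot)": 60.95,
    "TruthfulQA (0-shot)": 57.39,
    "Winogrande (5-shot)": 76.64,
    "GSM8K (5-shot)": 30.86,
}

# The leaderboard "Avg." is the unweighted mean of the six benchmarks.
avg = round(sum(scores.values()) / len(scores), 2)
print(avg)  # 62.86, matching the card's reported Avg.

# Effective batch size = per-device batch size x device count
# (assuming gradient_accumulation_steps = 1, which the card does not list).
train_batch_size = 8
num_devices = 8
total_train_batch_size = train_batch_size * num_devices
print(total_train_batch_size)  # 64, matching the card's reported value
```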