DewEfresh committed
Commit
be83584
1 Parent(s): 18cc6a0

End of training

Files changed (2)
  1. README.md +141 -0
  2. adapter_model.bin +3 -0
README.md ADDED
@@ -0,0 +1,141 @@
---
base_model: DewEfresh/neo_7b-slerp
library_name: peft
tags:
- axolotl
- generated_from_trainer
model-index:
- name: neo-7b-slerp-hermes2
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.4.1`
```yaml
adapter: qlora
base_model: DewEfresh/neo_7b-slerp
bf16: auto
chat_template: chatml
dataset_prepared_path: null
datasets:
- conversation: chatml
  path: https://huggingface.co/datasets/cognitivecomputations/dolphin-2.9.3/resolve/main/openhermes200k_unfiltered.jsonl
  type: sharegpt
debug: null
deepspeed: null
early_stopping_patience: null
eval_sample_packing: false
evals_per_epoch: 3
flash_attention: true
fp16: null
fsdp: null
fsdp_config: null
gradient_accumulation_steps: 16
gradient_checkpointing: true
group_by_length: false
hub_model_id: DewEfresh/neo-7b-slerp-hermes2
learning_rate: 0.0002
load_in_4bit: true
load_in_8bit: false
local_rank: null
logging_steps: 1
lora_alpha: 16
lora_dropout: 0.05
lora_fan_in_fan_out: null
lora_model_dir: null
lora_r: 32
lora_target_linear: true
lora_target_modules: null
lr_scheduler: cosine
micro_batch_size: 8
num_epochs: 3
optimizer: adamw_bnb_8bit
output_dir: ./outputs/qlora-out
pad_to_sequence_len: true
resume_from_checkpoint: null
sample_packing: true
saves_per_epoch: 1
sequence_len: 4096
special_tokens: null
strict: false
tf32: false
train_on_inputs: false
trust_remote_code: true
val_set_size: 0.05
wandb_entity: null
wandb_log_model: null
wandb_name: null
wandb_project: neo-7b-slerp-hermes2
wandb_watch: null
warmup_steps: 10
weight_decay: 0.0
xformers_attention: null

```

</details><br>
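
Axolotl configs like the one above are normally launched through `accelerate`. A minimal sketch, assuming axolotl 0.4.1 and accelerate are installed and the YAML is saved as `qlora.yml` (the exact command used for this run is not recorded in the card):

```python
# Sketch: launch an axolotl QLoRA run from Python. The filename qlora.yml
# is an assumption; any path to the YAML config above works.
import subprocess

subprocess.run(
    ["accelerate", "launch", "-m", "axolotl.cli.train", "qlora.yml"],
    check=True,  # surface a non-zero exit from the training process
)
```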

# neo-7b-slerp-hermes2

This model is a QLoRA fine-tuned version of [DewEfresh/neo_7b-slerp](https://huggingface.co/DewEfresh/neo_7b-slerp) on the openhermes200k_unfiltered dataset from [cognitivecomputations/dolphin-2.9.3](https://huggingface.co/datasets/cognitivecomputations/dolphin-2.9.3) (per the config above).
It achieves the following results on the evaluation set:
- Loss: 5.5603
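
The card itself ships no usage snippet; the following is a minimal sketch of loading the 4-bit base model with this adapter and prompting it in ChatML, mirroring the `load_in_4bit`, `trust_remote_code`, and `chat_template` settings in the config. Generation settings are illustrative, and it assumes the tokenizer carries a ChatML chat template:

```python
# Sketch: 4-bit base model + LoRA adapter inference (assumes transformers,
# peft, bitsandbytes, and a CUDA GPU are available).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_id = "DewEfresh/neo_7b-slerp"
adapter_id = "DewEfresh/neo-7b-slerp-hermes2"

tokenizer = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(model, adapter_id)  # attach the QLoRA adapter

# Training data was ChatML-formatted, so prompt the same way.
messages = [{"role": "user", "content": "Hello!"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
print(tokenizer.decode(model.generate(inputs, max_new_tokens=64)[0]))
```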

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (the two total batch sizes are derived values; see the sketch after this list):
- learning_rate: 0.0002
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 1024
- total_eval_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- num_epochs: 3
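
A quick check of the batch-size arithmetic (micro batch size and accumulation steps come from the axolotl config; the GPU count is reported only by the autogenerated card):

```python
# Effective batch sizes implied by the settings above.
micro_batch_size = 8
gradient_accumulation_steps = 16
num_devices = 8

total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices
total_eval_batch_size = micro_batch_size * num_devices  # no accumulation at eval

assert total_train_batch_size == 1024
assert total_eval_batch_size == 64
```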

### Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 15.1348       | 0.0521 | 1    | 14.5242         |
| 9.6676        | 0.3648 | 7    | 8.0894          |
| 7.076         | 0.7296 | 14   | 6.8289          |
| 6.6836        | 1.0717 | 21   | 6.4673          |
| 6.3156        | 1.4365 | 28   | 6.1033          |
| 6.0471        | 1.8013 | 35   | 5.8471          |
| 5.834         | 2.1433 | 42   | 5.6670          |
| 5.7349        | 2.5081 | 49   | 5.5762          |
| 5.7014        | 2.8730 | 56   | 5.5603          |

### Framework versions

- PEFT 0.11.1
- Transformers 4.41.1
- Pytorch 2.1.2+cu118
- Datasets 2.19.1
- Tokenizers 0.19.1
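
For anyone trying to match this environment, a small sketch that compares locally installed versions against the pins above (exact pins usually only matter for reproducing training, not inference):

```python
# Print installed versions next to the ones this card reports.
import datasets, peft, tokenizers, torch, transformers

card_versions = {
    peft: "0.11.1",
    transformers: "4.41.1",
    torch: "2.1.2+cu118",
    datasets: "2.19.1",
    tokenizers: "0.19.1",
}
for mod, expected in card_versions.items():
    print(f"{mod.__name__}: installed {mod.__version__}, card: {expected}")
```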
adapter_model.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5e57659c48beb38adef5a3d117c4b6565617b3c1c76df9f44b06c1aad8c95697
size 96408730
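
The three lines above are a Git LFS pointer, not the weights themselves; the actual `adapter_model.bin` is fetched by LFS (or `huggingface_hub`) on download. A hedged way to verify a local copy against the pointer (the local path is an assumption):

```python
# Verify a downloaded adapter_model.bin against the LFS pointer above.
import hashlib
from pathlib import Path

path = Path("adapter_model.bin")  # assumed local download location
expected_oid = "5e57659c48beb38adef5a3d117c4b6565617b3c1c76df9f44b06c1aad8c95697"
expected_size = 96408730  # bytes, per the pointer

data = path.read_bytes()
assert len(data) == expected_size, "size mismatch"
assert hashlib.sha256(data).hexdigest() == expected_oid, "sha256 mismatch"
print("adapter_model.bin matches the LFS pointer")
```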