taozi555 committed on
Commit 8fe4ad3
Parent: 6c6f116

Upload folder using huggingface_hub

Files changed (2)
  1. README.md +168 -0
  2. pytorch_model.bin +3 -0
README.md ADDED
@@ -0,0 +1,168 @@
---
license: other
base_model: meta-llama/Meta-Llama-3-8B
tags:
- generated_from_trainer
model-index:
- name: out
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.4.0`
```yaml
base_model: meta-llama/Meta-Llama-3-8B
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: false
strict: false

datasets:
  - path: taozi555/bagel
    type: sharegpt
  # - path: jondurbin/cinematika-v0.1
  #   type: text
  - path: MinervaAI/Aesir-Preview
    type: sharegpt
  - path: Norquinal/claude_multiround_chat_30k
    type: sharegpt
dataset_prepared_path: last_run_prepared
val_set_size: 0.05
output_dir: ./out
chat_template: alpaca

sequence_len: 8192
sample_packing: true
pad_to_sequence_len: true

wandb_project: waifu-8b
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 2
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 2e-5

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
early_stopping_patience:
resume_from_checkpoint:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 100
eval_steps: 100
eval_table_size:
saves_per_epoch:
save_steps: 100
save_total_limit: 20
debug:
deepspeed: /workspace/deepspeed.json
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  pad_token: <|end_of_text|>

```

</details><br>

# out

This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B), trained with Axolotl on the datasets listed in the config above (taozi555/bagel, MinervaAI/Aesir-Preview, and Norquinal/claude_multiround_chat_30k).
It achieves the following results on the evaluation set:
- Loss: 0.7773

## Model description

More information needed

## Intended uses & limitations

More information needed
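
Since the card does not yet spell out usage, here is a minimal inference sketch. The checkpoint path is an assumption (the local `./out` directory from the config, or this repository's id once published), and the Alpaca-style prompt wording is only an illustration of the `chat_template: alpaca` setting above.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: point this at the trained checkpoint (the config's output_dir,
# or the published Hugging Face repository id).
model_path = "./out"

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype="auto", device_map="auto")

# Alpaca-style prompt, matching chat_template: alpaca in the axolotl config.
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nIntroduce yourself in one sentence.\n\n"
    "### Response:\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```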

## Training and evaluation data

The axolotl config above lists three ShareGPT-format conversation datasets (taozi555/bagel, MinervaAI/Aesir-Preview, and Norquinal/claude_multiround_chat_30k); 5% of the prepared data (`val_set_size: 0.05`) was held out as the evaluation set.
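
As a rough way to inspect what the trainer saw, the sketch below loads one of those datasets with the `datasets` library. The split name and the ShareGPT field layout (a `conversations` list of `from`/`value` turns) are assumptions about these particular datasets, not something stated in this card.

```python
from datasets import load_dataset

# One of the datasets named in the axolotl config; the "train" split is an assumption.
ds = load_dataset("taozi555/bagel", split="train")

example = ds[0]
# ShareGPT-style records usually carry a "conversations" list of
# {"from": ..., "value": ...} turns; adjust the keys if this dataset differs.
for turn in example.get("conversations", []):
    print(f"{turn['from']}: {turn['value'][:80]}")
```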

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (the batch-size arithmetic is sketched after the list):
- learning_rate: 2e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- total_eval_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- num_epochs: 2
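
The reported totals follow from the per-device settings: total_train_batch_size is micro_batch_size × gradient_accumulation_steps × num_devices, and total_eval_batch_size drops the accumulation factor. A small sanity-check sketch of that arithmetic (the names are just local variables, not an API):

```python
# Per-device settings reported above.
micro_batch_size = 2             # train_batch_size / eval_batch_size per device
gradient_accumulation_steps = 4
num_devices = 4

# Effective batch sizes.
total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices
total_eval_batch_size = micro_batch_size * num_devices  # no gradient accumulation at eval time

print(total_train_batch_size)  # 32
print(total_eval_batch_size)   # 8
```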

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 1.0419 | 0.0 | 1 | 1.1113 |
| 0.9179 | 0.07 | 100 | 0.8886 |
| 1.0123 | 0.14 | 200 | 0.8822 |
| 0.9106 | 0.21 | 300 | 0.8701 |
| 0.8992 | 0.28 | 400 | 0.8637 |
| 0.7915 | 0.35 | 500 | 0.8527 |
| 0.9123 | 0.42 | 600 | 0.8448 |
| 0.7849 | 0.49 | 700 | 0.8381 |
| 0.8381 | 0.56 | 800 | 0.8344 |
| 0.7652 | 0.63 | 900 | 0.8230 |
| 0.9006 | 0.7 | 1000 | 0.8167 |
| 0.8589 | 0.77 | 1100 | 0.8088 |
| 0.7635 | 0.84 | 1200 | 0.8016 |
| 0.7696 | 0.91 | 1300 | 0.7951 |
| 0.8476 | 0.98 | 1400 | 0.7879 |
| 0.6031 | 1.03 | 1500 | 0.8063 |
| 0.5386 | 1.09 | 1600 | 0.8065 |
| 0.5298 | 1.16 | 1700 | 0.8015 |
| 0.5736 | 1.23 | 1800 | 0.7979 |
| 0.5761 | 1.3 | 1900 | 0.7939 |
| 0.5576 | 1.37 | 2000 | 0.7917 |
| 0.4814 | 1.44 | 2100 | 0.7879 |
| 0.5146 | 1.51 | 2200 | 0.7842 |
| 0.4577 | 1.58 | 2300 | 0.7832 |
| 0.4821 | 1.65 | 2400 | 0.7806 |
| 0.6088 | 1.72 | 2500 | 0.7782 |
| 0.5113 | 1.79 | 2600 | 0.7785 |
| 0.5861 | 1.86 | 2700 | 0.7779 |
| 0.4885 | 1.93 | 2800 | 0.7773 |


### Framework versions

- Transformers 4.40.0.dev0
- Pytorch 2.2.0+cu121
- Datasets 2.15.0
- Tokenizers 0.15.0
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:612c7706e9b94e1e3ab3c67def837ae331899a6d5979d8302aaf3a9944bad4f3
size 16060563132
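
The weights file is stored through Git LFS, so the entry above is only a pointer recording the blob's SHA-256 and size (~16 GB). If you want to confirm a local copy matches that digest, something along these lines works; the repository id is a placeholder, since this commit view does not name it.

```python
import hashlib

from huggingface_hub import hf_hub_download

# Placeholder repo id: substitute the repository this commit belongs to.
path = hf_hub_download(repo_id="taozi555/waifu-8b", filename="pytorch_model.bin")

sha256 = hashlib.sha256()
with open(path, "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):
        sha256.update(chunk)

# Should match the oid recorded in the LFS pointer.
print(sha256.hexdigest())
```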