Commit 70778f8 by Weyaxi (parent: a704d6e)

Create README.md

Files changed (1): README.md added (+186 −0)
---
license: other
tags:
- axolotl
- generated_from_trainer
base_model: chargoddard/internlm2-20b-llama
model-index:
- name: Stellaris-internlm2-20b-r512
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.3.0`
```yaml
base_model: chargoddard/internlm2-20b-llama
model_type: LlamaForCausalLM
tokenizer_type: LlamaTokenizer
is_llama_derived_model: true

load_in_8bit: true
load_in_4bit: false
strict: false

datasets:
  - path: ARB/arb_law.json
    ds_type: json
    type: alpaca
    conversation: chatml

  - path: ARB/arb_math.json
    ds_type: json
    type: alpaca
    conversation: chatml

  - path: ARB/arb_mcat_reading.json
    ds_type: json
    type: alpaca
    conversation: chatml

  - path: ARB/arb_mcat_science.json
    ds_type: json
    type: alpaca
    conversation: chatml

  - path: ARB/arb_physics.json
    ds_type: json
    type: alpaca
    conversation: chatml


dataset_prepared_path: last_run_prepared
val_set_size: 0
output_dir: ./Weyaxi-test

sequence_len: 4096
sample_packing: true
pad_to_sequence_len: true

adapter: lora
lora_model_dir:

lora_r: 512
lora_alpha: 256
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
lora_target_modules:
  - gate_proj
  - down_proj
  - up_proj
  - q_proj
  - v_proj
  - k_proj
  - o_proj
lora_modules_to_save:
  - embed_tokens
  - lm_head

wandb_project: huggingface
wandb_entity:
wandb_watch:
wandb_run_id:
wandb_log_model:

hub_model_id: Weyaxi/Weyaxi-test

gradient_accumulation_steps: 4 # change
micro_batch_size: 2 # change
num_epochs: 3
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
bf16: true
fp16: false
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 10

save_steps: 20
save_total_limit: 5

debug:
#deepspeed: deepspeed/zero3_bf16.json
weight_decay: 0.1
fsdp:
fsdp_config:
special_tokens:
  eos_token: "<|im_end|>"
tokens:
  - "<|im_start|>"
```

</details><br>

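The LoRA section of the config uses an unusually large rank (`lora_r: 512` with `lora_alpha: 256`, i.e. a scaling factor of alpha/r = 0.5), targets every linear projection, and additionally trains `embed_tokens` and `lm_head` because a new `<|im_start|>` token is added. For orientation only, a hypothetical PEFT `LoraConfig` mirroring these settings might look like the sketch below; axolotl builds its adapter config internally, so this is not the exact object used in training.

```python
# Hypothetical sketch: a PEFT LoraConfig mirroring the axolotl settings above.
# Axolotl constructs this internally; shown here only to make the settings concrete.
from peft import LoraConfig

lora_config = LoraConfig(
    r=512,                       # lora_r
    lora_alpha=256,              # scaling factor alpha / r = 0.5
    lora_dropout=0.05,
    target_modules=[
        "gate_proj", "down_proj", "up_proj",
        "q_proj", "v_proj", "k_proj", "o_proj",
    ],
    modules_to_save=["embed_tokens", "lm_head"],  # fully trained because a token was added
    task_type="CAUSAL_LM",
)
```
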
# Weyaxi-test

This model is a fine-tuned version of [chargoddard/internlm2-20b-llama](https://huggingface.co/chargoddard/internlm2-20b-llama) on the ARB datasets listed in the axolotl config above (`arb_law`, `arb_math`, `arb_mcat_reading`, `arb_mcat_science`, and `arb_physics`).
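
The config pushes a LoRA adapter to `Weyaxi/Weyaxi-test` (`hub_model_id`), so this repository presumably holds adapter weights rather than merged weights (note the PEFT version under framework versions below). One plausible way to run it is to load the base model in 8-bit, attach the adapter with PEFT, and prompt it in the ChatML layout implied by the special tokens in the config. The repository id, prompt, and generation settings below are illustrative assumptions, and a 20B model needs a correspondingly large GPU.

```python
# Hedged sketch: load the 8-bit base model, attach the LoRA adapter, and prompt it
# with the ChatML layout implied by the config (<|im_start|> / <|im_end|>).
# "Weyaxi/Weyaxi-test" is the hub_model_id from the config; adjust if the adapter
# was published under a different name.
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

base_id = "chargoddard/internlm2-20b-llama"
adapter_id = "Weyaxi/Weyaxi-test"  # assumption: adapter repo id

tokenizer = AutoTokenizer.from_pretrained(adapter_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)
# May be needed so shapes match the saved embed_tokens / lm_head (a token was added);
# skip this line if the base vocabulary already matches the tokenizer.
model.resize_token_embeddings(len(tokenizer))
model = PeftModel.from_pretrained(model, adapter_id)

prompt = (
    "<|im_start|>user\nState the ideal gas law.<|im_end|>\n"
    "<|im_start|>assistant\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256, eos_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```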

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

Training used the five ARB JSON files listed in the axolotl config above; no held-out evaluation split was configured (`val_set_size: 0`).
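
The ARB files are local alpaca-format JSON files referenced by path in the config, so they are not bundled with this repository. If you have copies of them, the hypothetical snippet below shows one way to inspect a file; the path is taken from the config and is assumed to exist locally.

```python
# Hypothetical: inspect one of the local alpaca-format JSON files from the config.
# The file path comes from the axolotl config and is assumed to be present locally.
from datasets import load_dataset

ds = load_dataset("json", data_files="ARB/arb_law.json", split="train")
print(ds)     # number of rows and column names
print(ds[0])  # a single alpaca-style record
```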

## Training procedure

The following `bitsandbytes` quantization config was used during training:
- quant_method: bitsandbytes
- load_in_8bit: True
- load_in_4bit: False
- llm_int8_threshold: 6.0
- llm_int8_skip_modules: None
- llm_int8_enable_fp32_cpu_offload: False
- llm_int8_has_fp16_weight: False
- bnb_4bit_quant_type: fp4
- bnb_4bit_use_double_quant: False
- bnb_4bit_compute_dtype: float32

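For reference, the quantization settings listed above correspond roughly to the following `transformers` `BitsAndBytesConfig`; this is a reconstruction from the list, not a config file shipped with the model.

```python
# Rough reconstruction of the quantization settings listed above.
import torch
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_8bit=True,
    load_in_4bit=False,
    llm_int8_threshold=6.0,
    llm_int8_skip_modules=None,
    llm_int8_enable_fp32_cpu_offload=False,
    llm_int8_has_fp16_weight=False,
    bnb_4bit_quant_type="fp4",               # 4-bit fields are defaults; 8-bit was used
    bnb_4bit_use_double_quant=False,
    bnb_4bit_compute_dtype=torch.float32,
)
```
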
### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- num_epochs: 3

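The total train batch size of 8 follows from the per-device batch size and gradient accumulation; the sketch below assumes a single device, which the card does not state explicitly.

```python
# Effective batch size implied by the hyperparameters above.
micro_batch_size = 2             # train_batch_size per device
gradient_accumulation_steps = 4
num_devices = 1                  # assumption: not stated in the card
total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices
assert total_train_batch_size == 8
```
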
### Training results



### Framework versions

- PEFT 0.7.0
- Transformers 4.37.0.dev0
- Pytorch 2.0.1+cu118
- Datasets 2.16.1
- Tokenizers 0.15.0