Tijmen2 committed on
Commit ad6c9ce
1 Parent(s): bf73a56

Update README.md

Files changed (1)
  1. README.md +3 -136
README.md CHANGED
@@ -55,111 +55,11 @@ textbooks, rather than just on synthetically generated QA pairs. However, it con
  _reliability_. While many of its answers are factually accurate, some are not. The outputs of cosmosage
  (or any LLM) should not be trusted to be factual.
 
- [<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
- <details><summary>See axolotl config</summary>
-
- axolotl version: `0.4.0`
- ```yaml
- base_model: /workspace/output/cosmosage_base/
- model_type: MistralForCausalLM
- tokenizer_type: LlamaTokenizer
- is_mistral_derived_model: true
-
- load_in_8bit: false
- load_in_4bit: false
- strict: false
-
- datasets:
-   - path: /workspace/input/datasets/qa_tune/arxiv_metadata_qa3.jsonl
-     type: sharegpt
-   - path: /workspace/input/datasets/qa_tune/arxiv_refined_qa.jsonl
-     type: sharegpt
-   - path: /workspace/input/datasets/qa_tune/arxiv_summary3.jsonl
-     type: sharegpt
-   - path: /workspace/input/datasets/qa_tune/cosmology_qa.jsonl
-     type: alpaca_chat.load_qa
-   - path: /workspace/input/datasets/qa_tune/openhermes2_5.jsonl
-     type: sharegpt
-   - path: /workspace/input/datasets/qa_tune/cosmology_textbooks_qa.jsonl
-     type: alpaca_chat.load_qa
-   - path: /workspace/input/datasets/qa_tune/physics_astro_qa.jsonl
-     type: alpaca_chat.load_qa
-
- dataset_prepared_path: /workspace/output/qa_tune_prepared
- val_set_size: 0.001
- output_dir: /workspace/output/cosmosage_qa
-
- chat_template: inst
-
- adapter:
- lora_model_dir:
-
- sequence_len: 4096
- sample_packing: true
- pad_to_sequence_len: true
-
- lora_r:
- lora_alpha:
- lora_dropout:
- lora_target_modules:
- lora_target_linear:
- lora_fan_in_fan_out:
-
- seed: 702
-
- wandb_project:
- wandb_entity:
- wandb_watch:
- wandb_name:
- wandb_log_model:
-
- gradient_accumulation_steps: 1
- micro_batch_size: 4
- num_epochs: 2.0
- optimizer: adamw_torch
- lr_scheduler: linear
- learning_rate: 0.000002
- max_grad_norm: 3.0
-
- train_on_inputs: false
- group_by_length: false
- bf16: true
- fp16: false
- tf32: false
-
- gradient_checkpointing: true
- early_stopping_patience:
- resume_from_checkpoint:
- local_rank:
- logging_steps: 1
- xformers_attention:
- flash_attention: true
-
- warmup_steps: 100
- eval_steps: 0.05
- eval_table_size:
- eval_table_max_new_tokens: 128
- saves_per_epoch: 1
- save_total_limit: 2
- debug:
- deepspeed: /workspace/axolotl/deepspeed_configs/zero1.json
- weight_decay:
- fsdp:
- fsdp_config:
- special_tokens:
-   bos_token: "<s>"
-   eos_token: "</s>"
-   unk_token: "<unk>"
-
- ddp_timeout: 7200000
-
- ```
+ ### Training hyperparameters
 
- </details><br>
 
- ### Training hyperparameters
 
- The following hyperparameters were used during training:
+ The following hyperparameters were used during QA tuning:
  - learning_rate: 2e-06
  - train_batch_size: 4
  - eval_batch_size: 4
@@ -171,37 +71,4 @@ The following hyperparameters were used during training:
  - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  - lr_scheduler_type: linear
  - lr_scheduler_warmup_steps: 100
- - num_epochs: 2.0
-
- ### Training results
-
- | Training Loss | Epoch | Step | Validation Loss |
- |:-------------:|:-----:|:-----:|:---------------:|
- | 1.1004 | 0.0 | 1 | 1.1450 |
- | 0.7343 | 0.1 | 909 | 0.7093 |
- | 0.697 | 0.2 | 1818 | 0.6630 |
- | 0.6386 | 0.3 | 2727 | 0.6380 |
- | 0.5687 | 0.4 | 3636 | 0.6212 |
- | 0.5857 | 0.5 | 4545 | 0.6083 |
- | 0.6161 | 0.6 | 5454 | 0.5986 |
- | 0.522 | 0.7 | 6363 | 0.5894 |
- | 0.5563 | 0.8 | 7272 | 0.5825 |
- | 0.6176 | 0.9 | 8181 | 0.5766 |
- | 0.5948 | 1.0 | 9090 | 0.5719 |
- | 0.4269 | 1.08 | 9999 | 0.5817 |
- | 0.4858 | 1.18 | 10908 | 0.5796 |
- | 0.4909 | 1.28 | 11817 | 0.5765 |
- | 0.4325 | 1.38 | 12726 | 0.5746 |
- | 0.4037 | 1.48 | 13635 | 0.5720 |
- | 0.507 | 1.58 | 14544 | 0.5706 |
- | 0.4778 | 1.68 | 15453 | 0.5697 |
- | 0.4599 | 1.78 | 16362 | 0.5683 |
- | 0.4515 | 1.88 | 17271 | 0.5673 |
-
-
- ### Framework versions
-
- - Transformers 4.38.0.dev0
- - Pytorch 2.0.1+cu118
- - Datasets 2.17.0
- - Tokenizers 0.15.0
+ - num_epochs: 2.0
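
For anyone reproducing the QA-tuning stage outside axolotl, the sketch below maps the hyperparameters summarized in this README diff onto Hugging Face `TrainingArguments`. It is a rough single-node equivalent under stated assumptions (no DeepSpeed ZeRO-1, placeholder `output_dir`), not the configuration that was actually run, which is the axolotl setup shown in the removed YAML above.

```python
# Illustrative only: the actual run used axolotl 0.4.0 with DeepSpeed ZeRO-1
# (see the removed config in the diff). This sketch maps the README's
# hyperparameter summary onto equivalent transformers TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="cosmosage_qa",      # placeholder, not the original /workspace path
    learning_rate=2e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=1,
    num_train_epochs=2.0,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=100,
    max_grad_norm=3.0,
    bf16=True,
    gradient_checkpointing=True,
    logging_steps=1,
    save_total_limit=2,
    seed=702,
)
```

The multi-GPU and DeepSpeed settings of the original run are deliberately omitted here, since the diff does not show the device count or the total batch size.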
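
A minimal inference sketch follows, assuming the merged QA-tuned weights are published on the Hugging Face Hub; the repository id `Tijmen2/cosmosage_qa` and the `[INST]`-style prompt (suggested by `chat_template: inst` in the removed config) are assumptions, not details confirmed by this commit.

```python
# Minimal inference sketch; the repo id below is a placeholder assumption.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Tijmen2/cosmosage_qa"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Mistral-style [INST] prompt, implied by chat_template: inst
prompt = "[INST] What is the cosmic microwave background? [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```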