isaacchung committed on
Commit
a38f5af
1 Parent(s): d0c79e1

Update README.md

Files changed (1)
  1. README.md +27 -11
README.md CHANGED
@@ -99,7 +99,7 @@ https://huggingface.co/datasets/isaacchung/hotpotqa-dev-raft-subset
99
 
100
  #### Training Hyperparameters
101
 
102
- <!-- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision --> -->
103
 
104
  Model loaded:
105
  ```python
@@ -160,11 +160,27 @@ trainer = SFTTrainer(
160
  )
161
  ```
162
 
163
- <!-- #### Speeds, Sizes, Times [optional] -->
164
 
165
  <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
166
 
167
- <!-- [More Information Needed] -->
168
 
169
  <!-- ## Evaluation -->
170
 
@@ -207,29 +223,29 @@ trainer = SFTTrainer(
207
  <!-- ## Environmental Impact -->
208
 
209
  <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
210
- <!--
211
- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
212
 
213
  - **Hardware Type:** [More Information Needed]
214
  - **Hours used:** [More Information Needed]
215
  - **Cloud Provider:** [More Information Needed]
216
  - **Compute Region:** [More Information Needed]
217
- - **Carbon Emitted:** [More Information Needed]
218
 
219
  ## Technical Specifications [optional]
220
 
221
- ### Model Architecture and Objective
222
 
223
- [More Information Needed]
224
 
225
  ### Compute Infrastructure
226
 
227
- [More Information Needed]
228
 
229
  #### Hardware
230
 
231
- [More Information Needed]
232
- -->
233
  <!-- #### Software
234
 
235
  [More Information Needed]
 
99
 
100
  #### Training Hyperparameters
101
 
102
+ <!-- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
103
 
104
  Model loaded:
105
  ```python
 
160
  )
161
  ```
162
 
163
+ #### Speeds, Sizes, Times [optional]
164
 
165
  <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
166
 
167
+ - train_runtime: 1148.4436
168
+ - train_samples_per_second: 0.392
169
+ - train_steps_per_second: 0.065
170
+ - train_loss: 0.5639963404337565
171
+ - epoch: 3.0
172
+
173
+ #### Training Loss
174
+
175
+ ```
176
+ {'loss': 1.0092, 'grad_norm': 0.27965569496154785, 'learning_rate': 0.0002, 'epoch': 0.4}
177
+ {'loss': 0.695, 'grad_norm': 0.17789314687252045, 'learning_rate': 0.0002, 'epoch': 0.8}
178
+ {'loss': 0.6747, 'grad_norm': 0.13655725121498108, 'learning_rate': 0.0002, 'epoch': 1.2}
179
+ {'loss': 0.508, 'grad_norm': 0.14653471112251282, 'learning_rate': 0.0002, 'epoch': 1.6}
180
+ {'loss': 0.4961, 'grad_norm': 0.14873674511909485, 'learning_rate': 0.0002, 'epoch': 2.0}
181
+ {'loss': 0.3509, 'grad_norm': 0.1657964587211609, 'learning_rate': 0.0002, 'epoch': 2.4}
182
+ {'loss': 0.3321, 'grad_norm': 0.1634644716978073, 'learning_rate': 0.0002, 'epoch': 2.8}
183
+ ```
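
Reviewer note: the summary metrics and the loss log added above can be cross-checked against each other. The sketch below is only a sanity check under stated assumptions: it treats the reported throughput figures as averages over the whole run, and the helper names (`summarize`, `parse_losses`) are illustrative, not part of the training script.

```python
import ast

# Values copied from the metrics added to README.md in this commit.
TRAIN_RUNTIME_S = 1148.4436
SAMPLES_PER_S = 0.392
STEPS_PER_S = 0.065
EPOCHS = 3.0

# Loss log copied verbatim from the "Training Loss" block.
LOG_LINES = """
{'loss': 1.0092, 'grad_norm': 0.27965569496154785, 'learning_rate': 0.0002, 'epoch': 0.4}
{'loss': 0.695, 'grad_norm': 0.17789314687252045, 'learning_rate': 0.0002, 'epoch': 0.8}
{'loss': 0.6747, 'grad_norm': 0.13655725121498108, 'learning_rate': 0.0002, 'epoch': 1.2}
{'loss': 0.508, 'grad_norm': 0.14653471112251282, 'learning_rate': 0.0002, 'epoch': 1.6}
{'loss': 0.4961, 'grad_norm': 0.14873674511909485, 'learning_rate': 0.0002, 'epoch': 2.0}
{'loss': 0.3509, 'grad_norm': 0.1657964587211609, 'learning_rate': 0.0002, 'epoch': 2.4}
{'loss': 0.3321, 'grad_norm': 0.1634644716978073, 'learning_rate': 0.0002, 'epoch': 2.8}
""".strip().splitlines()


def parse_losses(lines):
    """Turn each logged dict literal into an (epoch, loss) pair."""
    return [(entry["epoch"], entry["loss"]) for entry in map(ast.literal_eval, lines)]


def summarize(runtime_s, samples_per_s, steps_per_s, epochs, losses):
    """Derive approximate run sizes from the rounded throughput figures."""
    total_steps = runtime_s * steps_per_s
    total_samples = runtime_s * samples_per_s
    epoch_gap = losses[1][0] - losses[0][0]  # epochs between two log entries
    return {
        "total_steps": round(total_steps),
        "samples_per_epoch": round(total_samples / epochs),
        "samples_per_step": round(samples_per_s / steps_per_s),
        "log_interval_steps": round(epoch_gap * total_steps / epochs),
    }


losses = parse_losses(LOG_LINES)
summary = summarize(TRAIN_RUNTIME_S, SAMPLES_PER_S, STEPS_PER_S, EPOCHS, losses)
print(summary)
```

With the numbers above this works out to roughly 75 optimizer steps, about 150 samples per epoch, an effective batch of about 6 samples per step, and one log entry every 10 steps or so. These are estimates derived from rounded metrics, not values stated in the commit.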
184
 
185
  <!-- ## Evaluation -->
186
 
 
223
  <!-- ## Environmental Impact -->
224
 
225
  <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
226
+
227
+ <!-- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
228
 
229
  - **Hardware Type:** [More Information Needed]
230
  - **Hours used:** [More Information Needed]
231
  - **Cloud Provider:** [More Information Needed]
232
  - **Compute Region:** [More Information Needed]
233
+ - **Carbon Emitted:** [More Information Needed] -->
234
 
235
  ## Technical Specifications [optional]
236
 
237
+ <!-- ### Model Architecture and Objective -->
238
 
239
+ <!-- [More Information Needed] -->
240
 
241
  ### Compute Infrastructure
242
 
243
+ <!-- [More Information Needed] -->
244
 
245
  #### Hardware
246
 
247
+ - 1x NVIDIA RTX 6000 Ada
248
+
249
  <!-- #### Software
250
 
251
  [More Information Needed]