PocketDoc committed
Commit 11a6744
1 parent: d4a1ed5

Update README.md

Files changed (1): README.md (+10 −10)
README.md CHANGED
@@ -61,8 +61,8 @@ wandb_watch:
 wandb_run_id:
 wandb_log_model:
 
-gradient_accumulation_steps: 8
-micro_batch_size: 1
+gradient_accumulation_steps: 2
+micro_batch_size: 4
 num_epochs: 1
 optimizer: paged_adamw_32bit
 lr_scheduler: constant
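
This hunk trades accumulation steps for a larger per-device batch without changing the effective batch size. A minimal sketch of the arithmetic (the single-GPU assumption is mine, since the config shows no distributed settings):

```python
def effective_batch(micro_batch_size: int, accumulation_steps: int, num_gpus: int = 1) -> int:
    """Effective train batch size: per-device batch x accumulation steps x device count."""
    return micro_batch_size * accumulation_steps * num_gpus

print(effective_batch(1, 8))  # old config -> 8
print(effective_batch(4, 2))  # new config -> 8, same effective batch, fewer accumulation steps
```
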
@@ -72,18 +72,18 @@ learning_rate: 0.00005
 
 train_on_inputs: true
 group_by_length: false
-bf16: false
+bf16: true
 fp16: false
-tf32: true
+tf32: false
 
 gradient_checkpointing: false
 early_stopping_patience:
 resume_from_checkpoint:
-auto_resume_from_checkpoints: false
+auto_resume_from_checkpoints: True
 local_rank:
 logging_steps: 1
 xformers_attention:
-flash_attention: false
+flash_attention: true
 flash_attn_cross_entropy: false
 flash_attn_rms_norm: true
 flash_attn_fuse_qkv: false
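
This hunk moves training from fp32-with-tf32 to bf16 and enables FlashAttention alongside it. A hedged sketch of a preflight check for that combination (the torch calls below are standard PyTorch; the Ampere-or-newer requirement as the reason these flags flip together is my assumption):

```python
import torch

if torch.cuda.is_available():
    major, _ = torch.cuda.get_device_capability()
    print("bf16 supported:", torch.cuda.is_bf16_supported())
    print("compute capability >= 8.0 (FlashAttention-capable):", major >= 8)
    # tf32 is a matmul mode, not a storage dtype; the new config turns it off
    # in favor of true bf16 storage and compute:
    torch.backends.cuda.matmul.allow_tf32 = False
    torch.backends.cudnn.allow_tf32 = False
```
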
@@ -111,7 +111,7 @@ special_tokens:
 
 # TinyMistral-StructureEvaluator
 
-This model was further trained on the epfl-llm/guidelines and JeanKaddour/minipile datasets.
+This model was further trained on the epfl-llm/guidelines and JeanKaddour/minipile datasets for 1 epoch.
 
 ## Model description
 
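
Both datasets named above live on the Hugging Face Hub; a hypothetical loading sketch (split names are my assumption, check each dataset card):

```python
from datasets import load_dataset

guidelines = load_dataset("epfl-llm/guidelines", split="train")  # assumed split name
minipile = load_dataset("JeanKaddour/minipile", split="train")   # assumed split name
```
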
@@ -131,10 +131,10 @@ More information needed
 
 The following hyperparameters were used during training:
 - learning_rate: 5e-05
-- train_batch_size: 1
-- eval_batch_size: 1
+- train_batch_size: 4
+- eval_batch_size: 4
 - seed: 42
-- gradient_accumulation_steps: 8
+- gradient_accumulation_steps: 2
 - total_train_batch_size: 8
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: constant
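
The listed totals stay internally consistent (4 × 2 = 8, matching the old 1 × 8 = 8). A sketch of the stated optimizer and constant schedule using stock PyTorch; the run itself used bitsandbytes' paged 32-bit AdamW per the config above, and the Linear layer is just a stand-in:

```python
import torch

model = torch.nn.Linear(8, 8)  # hypothetical stand-in for the actual model
optimizer = torch.optim.AdamW(
    model.parameters(), lr=5e-05, betas=(0.9, 0.999), eps=1e-08
)
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lambda _: 1.0)  # constant LR
```
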