Severian committed
Commit: 2402bf2
Parent: 6ae470a

Update README.md

Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -15,7 +15,6 @@ pipeline_tag: text-generation
 # Current version works but it is very particular about having the right ChatML format and settings. Jamba has been somewhat difficult and expensive to train but I wanted to see how it did on one of the best datasets we have access to. I believe in transparent development so all *best* working iterations, even if they are a bit wonky, will be pushed here.
 
 ---
-# New training underway! Thanks to the generous insights provided by **lightblue/Jamba-v0.1-chat-multilingual**, the new training is going much better. We should hopefully have a decently trained Jamaba-Open-Hermes model for general use and experimentation.
 
 ## Example Output:
 
@@ -96,6 +95,7 @@ print(tokenizer.batch_decode(outputs)[0])
 
 ### **Open-Hermes-2.0:
 
+**FIRST TEST:**
 *1000 Steps (5 hours x A100)*
 *Final Loss: 3.48*
 
@@ -121,8 +121,8 @@ trainer = SFTTrainer(
     tokenizer=tokenizer,
     args=TrainingArguments(
         num_train_epochs=1,
-        lr_scheduler_type='cosine',
-        learning_rate=0.0002,
+        lr_scheduler_type='linear',
+        learning_rate=0.001,
         per_device_train_batch_size=1,
         gradient_accumulation_steps=8,
         gradient_checkpointing=True,
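
For context, the last hunk only shows a fragment of the trainer setup. Below is a minimal, self-contained sketch of what the configuration looks like with the hyperparameters this commit switches to (linear scheduler, learning rate 0.001). The base model id, the tiny in-memory placeholder dataset, and the output directory are illustrative assumptions rather than values from the repo; the real run trains on the Open-Hermes data referenced in the README.

```python
# Sketch only: mirrors the TrainingArguments shown in the diff after this commit.
# The base model, placeholder dataset, and output_dir are assumptions for illustration.
from datasets import Dataset
from transformers import AutoTokenizer, TrainingArguments
from trl import SFTTrainer

base_model = "ai21labs/Jamba-v0.1"  # assumed base checkpoint
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Tiny in-memory stand-in so the sketch runs end to end; the real run uses Open-Hermes.
train_data = Dataset.from_dict({
    "text": [
        "<|im_start|>user\nWhat is Jamba?<|im_end|>\n"
        "<|im_start|>assistant\nA hybrid Mamba-Transformer language model.<|im_end|>\n"
    ]
})

trainer = SFTTrainer(
    model=base_model,
    train_dataset=train_data,
    tokenizer=tokenizer,
    dataset_text_field="text",
    args=TrainingArguments(
        output_dir="./jamba-open-hermes",   # placeholder path
        num_train_epochs=1,
        lr_scheduler_type="linear",         # changed from "cosine" in this commit
        learning_rate=0.001,                # changed from 0.0002 in this commit
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        gradient_checkpointing=True,
    ),
)
trainer.train()
```

The jump from 2e-4 to 1e-3 with a linear decay is the only substantive change here; everything else matches the snippet already in the README.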
 
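Since the README stresses that the current checkpoint is very particular about the ChatML format, here is a minimal sketch of building a ChatML prompt by hand. The system message and the choice of stop token are illustrative assumptions; check the repo's tokenizer config for the exact special tokens.

```python
# Sketch of a hand-built ChatML prompt; the system text below is an illustrative assumption.
def build_chatml_prompt(system: str, user: str) -> str:
    """Wrap system and user turns in ChatML tags and open the assistant turn."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = build_chatml_prompt(
    "You are a helpful assistant.",
    "Summarize what makes the Jamba architecture different from a plain Transformer.",
)
print(prompt)
```

When generating, `<|im_end|>` should also be treated as the end-of-turn stop; whether the tokenizer already registers it as a special token is worth verifying against this repo.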