Update README.md
README.md (CHANGED)
@@ -15,7 +15,6 @@ pipeline_tag: text-generation
 # Current version works but it is very particular about having the right ChatML format and settings. Jamba has been somewhat difficult and expensive to train, but I wanted to see how it did on one of the best datasets we have access to. I believe in transparent development, so all *best* working iterations, even if they are a bit wonky, will be pushed here.
 
 ---
-# New training underway! Thanks to the generous insights provided by **lightblue/Jamba-v0.1-chat-multilingual**, the new training is going much better. We should hopefully have a decently trained Jamba-Open-Hermes model for general use and experimentation.
 
 ## Example Output:
 
@@ -96,6 +95,7 @@ print(tokenizer.batch_decode(outputs)[0])
 
 ### **Open-Hermes-2.0:**
 
+**FIRST TEST:**
 *1000 Steps (5 hours x A100)*
 *Final Loss: 3.48*
 
@@ -121,8 +121,8 @@ trainer = SFTTrainer(
     tokenizer=tokenizer,
     args=TrainingArguments(
         num_train_epochs=1,
-        lr_scheduler_type='
-        learning_rate=0.
+        lr_scheduler_type='linear',
+        learning_rate=0.001,
         per_device_train_batch_size=1,
         gradient_accumulation_steps=8,
         gradient_checkpointing=True,
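Since the README stresses that the current checkpoint "is very particular about having the right ChatML format and settings," here is a minimal prompting sketch. The checkpoint path is a placeholder and the `<|im_start|>`/`<|im_end|>` wrapper is the standard ChatML layout, not something this commit confirms; check the tokenizer's actual chat template and special tokens before relying on it.

```python
# Minimal ChatML prompting sketch. The checkpoint path is a placeholder,
# and the <|im_start|>/<|im_end|> wrapper is the standard ChatML layout;
# verify both against the tokenizer's actual chat template.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "path/to/jamba-open-hermes-checkpoint"  # placeholder path
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto")

# Standard ChatML: each turn is wrapped in <|im_start|>{role}\n...<|im_end|>,
# and generation is primed with an open assistant turn.
prompt = (
    "<|im_start|>system\n"
    "You are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\n"
    "What is a Mamba/Transformer hybrid?<|im_end|>\n"
    "<|im_start|>assistant\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.batch_decode(outputs)[0])
```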
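For context on the last hunk, a self-contained sketch of what the updated trainer call might look like. Only the arguments shown in the diff (`tokenizer=`, `num_train_epochs`, `lr_scheduler_type`, `learning_rate`, `per_device_train_batch_size`, `gradient_accumulation_steps`, `gradient_checkpointing`) are taken from this commit; the base model id, the toy dataset, `dataset_text_field`, and `output_dir` are illustrative assumptions.

```python
# Sketch of the trainer setup after this commit. The TrainingArguments values
# come from the diff; the base model id, toy ChatML dataset, dataset_text_field,
# and output_dir are assumptions for the sake of a runnable example.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import SFTTrainer

base_model = "ai21labs/Jamba-v0.1"  # assumed base checkpoint
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model, device_map="auto")

# Tiny stand-in dataset; a real run would use the ChatML-formatted
# Open-Hermes data the README describes.
train_dataset = Dataset.from_dict({
    "text": [
        "<|im_start|>user\nHello!<|im_end|>\n"
        "<|im_start|>assistant\nHi there!<|im_end|>\n",
    ]
})

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,            # older trl API, as in the diff
    train_dataset=train_dataset,
    dataset_text_field="text",      # assumed column name
    args=TrainingArguments(
        output_dir="jamba-open-hermes",  # assumed
        num_train_epochs=1,
        lr_scheduler_type='linear',      # changed in this commit
        learning_rate=0.001,             # changed in this commit
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,   # effective batch size of 8
        gradient_checkpointing=True,     # trade compute for memory headroom
    ),
)
trainer.train()
```

Note that 1e-3 is on the high side for fine-tuning (typical SFT runs sit closer to 1e-5 to 2e-4); the linear schedule at least decays it toward zero over the run.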