Update README.md
README.md (CHANGED)
@@ -15,7 +15,6 @@ pipeline_tag: text-generation
 # Current version works but it is very particular about having the right ChatML format and settings. Jamba has been somewhat difficult and expensive to train, but I wanted to see how it did on one of the best datasets we have access to. I believe in transparent development, so all *best* working iterations, even if they are a bit wonky, will be pushed here.
 
 ---
-# New training underway! Thanks to the generous insights provided by **lightblue/Jamba-v0.1-chat-multilingual**, the new training is going much better. We should hopefully have a decently trained Jamba-Open-Hermes model for general use and experimentation.
 
 ## Example Output:
 
@@ -96,6 +95,7 @@ print(tokenizer.batch_decode(outputs)[0])
 
 ### **Open-Hermes-2.0:**
 
+**FIRST TEST:**
 *1000 Steps (5 hours x A100)*
 *Final Loss: 3.48*
 
@@ -121,8 +121,8 @@ trainer = SFTTrainer(
     tokenizer=tokenizer,
     args=TrainingArguments(
         num_train_epochs=1,
-        lr_scheduler_type='
-        learning_rate=0.
+        lr_scheduler_type='linear',
+        learning_rate=0.001,
         per_device_train_batch_size=1,
         gradient_accumulation_steps=8,
         gradient_checkpointing=True,
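Since the README stresses that the current checkpoint "is very particular about having the right ChatML format and settings," here is a minimal prompting sketch. The checkpoint path is a placeholder and the `<|im_start|>`/`<|im_end|>` wrapper is the standard ChatML layout, not something this commit confirms; check the tokenizer's actual chat template and special tokens before relying on it.

```python
# Minimal ChatML prompting sketch. The checkpoint path is a placeholder,
# and the <|im_start|>/<|im_end|> wrapper is the standard ChatML layout;
# verify both against the tokenizer's actual chat template.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "path/to/jamba-open-hermes-checkpoint"  # placeholder path
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto")

# Standard ChatML: each turn is wrapped in <|im_start|>{role}\n...<|im_end|>,
# and generation is primed with an open assistant turn.
prompt = (
    "<|im_start|>system\n"
    "You are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\n"
    "What is a Mamba/Transformer hybrid?<|im_end|>\n"
    "<|im_start|>assistant\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.batch_decode(outputs)[0])
```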
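For context on the last hunk, a self-contained sketch of what the updated trainer call might look like. Only the arguments shown in the diff (`tokenizer=`, `num_train_epochs`, `lr_scheduler_type`, `learning_rate`, `per_device_train_batch_size`, `gradient_accumulation_steps`, `gradient_checkpointing`) are taken from this commit; the base model id, the toy dataset, `dataset_text_field`, and `output_dir` are illustrative assumptions.

```python
# Sketch of the trainer setup after this commit. The TrainingArguments values
# come from the diff; the base model id, toy ChatML dataset, dataset_text_field,
# and output_dir are assumptions for the sake of a runnable example.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import SFTTrainer

base_model = "ai21labs/Jamba-v0.1"  # assumed base checkpoint
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model, device_map="auto")

# Tiny stand-in dataset; a real run would use the ChatML-formatted
# Open-Hermes data the README describes.
train_dataset = Dataset.from_dict({
    "text": [
        "<|im_start|>user\nHello!<|im_end|>\n"
        "<|im_start|>assistant\nHi there!<|im_end|>\n",
    ]
})

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,            # older trl API, as in the diff
    train_dataset=train_dataset,
    dataset_text_field="text",      # assumed column name
    args=TrainingArguments(
        output_dir="jamba-open-hermes",  # assumed
        num_train_epochs=1,
        lr_scheduler_type='linear',      # changed in this commit
        learning_rate=0.001,             # changed in this commit
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,   # effective batch size of 8
        gradient_checkpointing=True,     # trade compute for memory headroom
    ),
)
trainer.train()
```

Note that 1e-3 is on the high side for fine-tuning (typical SFT runs sit closer to 1e-5 to 2e-4); the linear schedule at least decays it toward zero over the run.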