Update README.md
README.md (CHANGED)
```diff
@@ -14,7 +14,8 @@ pipeline_tag: text-generation
 
 # This is highly experimental and should be viewed as purely testing right now. Jamba has been very hard to train but I wanted to see how it did on one of the best datasets we have access to. I believe in transparent development so all *best* working iterations, even if they are a bit wonky, will be pushed here.
 
-
+---
+# New training underway! Thanks to the generous insights provided by **lightblue/Jamba-v0.1-chat-multilingual**, the new training is going much better. We should hopefully have a decently trained Jamba-Open-Hermes model for general use and experimentation.
 
 *There's been limited testing so no example outputs yet*
 
```
```diff
@@ -41,7 +42,7 @@ lora_config = LoraConfig(
     r=8,
     lora_alpha=16,
     target_modules=["embed_tokens", "x_proj", "in_proj", "out_proj"],
-    lora_dropout=0.
+    lora_dropout=0.05,
     task_type="CAUSAL_LM",
     bias="none"
 )
```
```diff
@@ -54,19 +55,19 @@ trainer = SFTTrainer(
     tokenizer=tokenizer,
     args=TrainingArguments(
         num_train_epochs=1,
-        lr_scheduler_type='
-        learning_rate=
+        lr_scheduler_type='cosine',
+        learning_rate=0.0002,
         per_device_train_batch_size=1,
         gradient_accumulation_steps=8,
         gradient_checkpointing=True,
         warmup_steps=10,
-        weight_decay=0.
-        save_steps=
-        optim="
+        weight_decay=0.01,
         fp16=not torch.cuda.is_bf16_supported(),
         bf16=torch.cuda.is_bf16_supported(),
         logging_steps=1,
+        save_steps=200,
         output_dir="outputs",
+        optim="adamw_8bit",
         seed=42,
     ),
 )
```
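Likewise, a runnable sketch of the trainer block as it reads after this commit. The `model` and `train_dataset` arguments are assumptions: they sit above the diffed region and aren't shown here. Every hyperparameter comes from the diff itself.

```python
import torch
from transformers import TrainingArguments
from trl import SFTTrainer

# Trainer setup as it reads after this commit. `model`, `tokenizer`, and
# `train_dataset` are assumed to be defined earlier in the README; they
# are not part of this diff.
trainer = SFTTrainer(
    model=model,                        # assumption: the Jamba model loaded earlier
    train_dataset=train_dataset,        # assumption: the OpenHermes dataset
    tokenizer=tokenizer,
    args=TrainingArguments(
        num_train_epochs=1,
        lr_scheduler_type="cosine",     # new: cosine decay after warmup
        learning_rate=0.0002,
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,  # effective batch size of 8
        gradient_checkpointing=True,
        warmup_steps=10,
        weight_decay=0.01,              # new value
        fp16=not torch.cuda.is_bf16_supported(),
        bf16=torch.cuda.is_bf16_supported(),
        logging_steps=1,
        save_steps=200,                 # new: checkpoint every 200 steps
        output_dir="outputs",
        optim="adamw_8bit",             # new: 8-bit AdamW (bitsandbytes) to cut optimizer memory
        seed=42,
    ),
)
```

The new values look like a common LoRA fine-tuning recipe: cosine decay with a short warmup, a 2e-4 peak learning rate, and an 8-bit optimizer to keep memory down at batch size 1 with 8-step gradient accumulation.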