Severian committed
Commit 041c2f1
1 Parent(s): dd66d6a

Update README.md

Files changed (1): README.md +8 -7
README.md CHANGED
@@ -14,7 +14,8 @@ pipeline_tag: text-generation
 
 # This is highly experimental and should be viewed as purely testing right now. Jamba has been very hard to train, but I wanted to see how it did on one of the best datasets we have access to. I believe in transparent development, so all *best* working iterations, even if they are a bit wonky, will be pushed here.
 
-# I've unfortunately gone way over budget and spent a significant amount of money over the past few days trying to figure out the best way to fine-tune Jamba. New iterations may be sparse until Jamba is converted to MLX or I find buried treasure somewhere. If you've downloaded it, feel free to provide any feedback so I can improve on the next training cycle! Thanks for checking it out.
+---
+# New training underway! Thanks to the generous insights provided by **lightblue/Jamba-v0.1-chat-multilingual**, the new training is going much better. We should hopefully have a decently trained Jamba-Open-Hermes model for general use and experimentation.
 
 *There's been limited testing so no example outputs yet*
 
@@ -41,7 +42,7 @@ lora_config = LoraConfig(
     r=8,
     lora_alpha=16,
     target_modules=["embed_tokens", "x_proj", "in_proj", "out_proj"],
-    lora_dropout=0.2,
+    lora_dropout=0.05,
     task_type="CAUSAL_LM",
     bias="none"
 )
@@ -54,19 +55,19 @@ trainer = SFTTrainer(
     tokenizer=tokenizer,
     args=TrainingArguments(
         num_train_epochs=1,
-        lr_scheduler_type='linear',
-        learning_rate=2e-5,
+        lr_scheduler_type='cosine',
+        learning_rate=0.0002,
         per_device_train_batch_size=1,
         gradient_accumulation_steps=8,
         gradient_checkpointing=True,
         warmup_steps=10,
-        weight_decay=0.2,
+        weight_decay=0.01,
         fp16=not torch.cuda.is_bf16_supported(),
         bf16=torch.cuda.is_bf16_supported(),
         logging_steps=1,
-        save_steps=100,
+        save_steps=200,
         output_dir="outputs",
-        optim="paged_adamw_8bit",
+        optim="adamw_8bit",
         seed=42,
     ),
 )
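Read together, the `+` lines above describe the updated LoRA and trainer configuration. The sketch below stitches them into a single script purely for reference, assuming the hyperparameters from this diff; the base model ID (`ai21labs/Jamba-v0.1`), the dataset ID (`teknium/OpenHermes-2.5`), the `text` column name, and the `max_seq_length` value do not appear in the diff and are illustrative assumptions only. It follows the README's own `trl`/`TrainingArguments` style.

```python
# Minimal sketch of the updated fine-tuning setup implied by this diff.
# Assumptions (not in the diff): base model ID, dataset ID, text column name,
# max sequence length, and model-loading details. Hyperparameters mirror the
# "+" lines in the README diff above.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import SFTTrainer

BASE_MODEL = "ai21labs/Jamba-v0.1"      # assumed base checkpoint
DATASET_ID = "teknium/OpenHermes-2.5"   # assumed OpenHermes dataset

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
# Jamba is large; quantization / sharding choices are left out of this sketch.
model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    torch_dtype=torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16,
    device_map="auto",
)

dataset = load_dataset(DATASET_ID, split="train")

# Updated LoRA config: dropout lowered from 0.2 to 0.05.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["embed_tokens", "x_proj", "in_proj", "out_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
    bias="none",
)

# Updated training arguments: cosine schedule, 2e-4 learning rate, lighter
# weight decay, less frequent checkpoints, and plain (non-paged) 8-bit AdamW.
training_args = TrainingArguments(
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    learning_rate=2e-4,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    gradient_checkpointing=True,
    warmup_steps=10,
    weight_decay=0.01,
    fp16=not torch.cuda.is_bf16_supported(),
    bf16=torch.cuda.is_bf16_supported(),
    logging_steps=1,
    save_steps=200,
    output_dir="outputs",
    optim="adamw_8bit",
    seed=42,
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,           # matches the README's older trl-style API
    args=training_args,
    train_dataset=dataset,
    peft_config=lora_config,
    dataset_text_field="text",     # assumed column; OpenHermes may need reformatting
    max_seq_length=2048,           # assumed; not specified in the diff
)
trainer.train()
```

As a rough reading of the change, moving from 2e-5 with a linear schedule to 2e-4 with a cosine schedule is in line with common LoRA practice, where only the adapter weights are trained and higher learning rates are usually tolerated; the lower weight decay and dropout point the same way.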