Blackroot/Llama-3-8B-Abomination-LORA

Blackroot committed on May 28

Commit 7e17501 • 1 Parent(s): 7141392

Update README.md

Files changed (1)

README.md +33 -1

@@ -11,4 +11,36 @@ Base Model -- 1 Gig of semi-structured pretraining data:
 Merge LORA into instruct model -- 100 MB of structured story-instruct data:
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/637f3b03932a61b89aefbf5c/V1Jf07k8JdI0_OzIDc7FF.png)
 - Story-instruct tune phase 1 (Constant LR, ~1250 steps, 1 epoch)
-- Story-instruct tune phase 2 (Cosine LR, ~1250 steps, 1 epoch)
+- Story-instruct tune phase 2 (Cosine LR, ~1250 steps, 1 epoch)
+Trained using <https://github.com/unslothai/unsloth>
+Rough script:
+```python
+trainer = SFTTrainer(
+    model = model,
+    train_dataset = train_dataset,
+    dataset_text_field = "text",
+    max_seq_length = max_seq_length,
+    tokenizer = tokenizer,
+    args = TrainingArguments(
+        per_device_train_batch_size = 2,
+        warmup_steps = 45,
+        num_train_epochs=2,
+        fp16 = not torch.cuda.is_bf16_supported(),
+        bf16 = torch.cuda.is_bf16_supported(),
+        logging_steps = 15,
+        logging_dir="logs",
+        report_to="tensorboard",
+        output_dir = "outputs",
+        save_strategy=IntervalStrategy.STEPS,
+        save_steps=100,
+        save_total_limit=30,
+        optim = "adamw_torch_fused",
+        lr_scheduler_type="cosine", # <- Changed over time
+        learning_rate=5e-5,
+        weight_decay=0.10, # .15 for base pretraining
+        adam_beta1=0.88, # .9 for base pretraining
+        adam_beta2=0.99,  # .999 for base pretraining
+    ),
+)
+```
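
The script sets `lr_scheduler_type="cosine"` with `warmup_steps = 45` and `learning_rate=5e-5`. As a rough illustration of what that schedule does, here is a minimal standalone sketch of linear warmup followed by half-cosine decay. The function name `cosine_lr_with_warmup` is hypothetical, and `total_steps=1250` is an assumption taken from the "~1250 steps" figure in the README; the real step count depends on dataset size, batch size, and epochs. It is not the trainer's own implementation, only an approximation of the usual warmup-plus-cosine shape.

```python
import math


def cosine_lr_with_warmup(step, base_lr=5e-5, warmup_steps=45, total_steps=1250):
    """Approximate learning rate at a given optimizer step.

    Hypothetical helper: base_lr and warmup_steps come from the script above,
    total_steps is an assumed value based on the README's "~1250 steps".
    """
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr over the warmup steps.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup run completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    # Half-cosine decay from base_lr down to 0.
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Print the learning rate at a few points in the assumed 1250-step run.
    for step in (0, 15, 45, 600, 1250):
        print(step, f"{cosine_lr_with_warmup(step):.2e}")
```

Under these assumptions the rate peaks at `5e-5` once warmup ends at step 45 and decays toward zero by the final step. Phase 1 used a constant LR instead, which would correspond to simply returning `base_lr` at every step.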