Experimental model focused on RP and storytelling. This method attempts to bring some of the intrigue and style of the base model back into the instruct model.
This model was trained in four stages (use with Llama-8B-Instruct or Llama-8B-Instruct abliterations).
Base model -- 1 GB of semi-structured pretraining data (sample lengths drawn from a uniform distribution centered around 4096 tokens, between 512 and 8192; see the sketch after the phase list below)
![image/png](https://cdn-uploads.huggingface.co/production/uploads/637f3b03932a61b89aefbf5c/hpdbVRrM1yt65-gNtRIfT.png)
- Base pretraining phase 1 (constant LR, text completion -- 20,000 steps, 2/3 epoch)
- Base pretraining phase 2 (cosine LR, text completion -- 10,000 steps, 1/3 epoch)
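A minimal sketch of how the pretraining text could be chunked into samples matching that length distribution; the `chunk_for_pretraining` helper and its defaults are illustrative, not the actual data pipeline:

```python
import random

def chunk_for_pretraining(token_ids, min_len=512, max_len=8192, seed=3407):
    """Split a long token stream into variable-length text-completion samples.

    Lengths are drawn uniformly from [min_len, max_len], which averages out
    near the 4096-token center described above.
    """
    rng = random.Random(seed)
    samples, start = [], 0
    while start < len(token_ids):
        length = rng.randint(min_len, max_len)
        samples.append(token_ids[start:start + length])
        start += length
    return samples
```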
Merge the LoRA into the instruct model -- 100 MB of structured story-instruct data (all samples attempt to fill close to the full 8192-token context; a merge sketch follows the phase list below)
![image/png](https://cdn-uploads.huggingface.co/production/uploads/637f3b03932a61b89aefbf5c/V1Jf07k8JdI0_OzIDc7FF.png)
- Story-instruct tune phase 1 (Constant LR, ~1250 steps, 1 epoch)
- Story-instruct tune phase 2 (Cosine LR, ~1250 steps, 1 epoch)
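A minimal sketch of the merge step using PEFT's `merge_and_unload` (the checkpoint name and adapter path are placeholders; the actual merge may have been done with different tooling):

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the instruct model the LoRA will be folded into (placeholder name).
base = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

# Apply the adapter trained during base pretraining, then fold its weights in.
merged = PeftModel.from_pretrained(base, "path/to/base-pretrain-lora").merge_and_unload()
merged.save_pretrained("llama-8b-instruct-plus-lora")
```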
Trained using <https://github.com/unslothai/unsloth>
Rough script:
```python
import torch
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments, IntervalStrategy

# Attach LoRA adapters to the loaded model (model, tokenizer, train_dataset,
# and max_seq_length are set up beforehand -- see the sketch below).
model = FastLanguageModel.get_peft_model(
    model,
    r = 64,
    target_modules = ["q_proj", "v_proj", "k_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    lora_alpha = 32,
    lora_dropout = 0.05,  # 0 for base pretraining
    bias = "none",
    use_gradient_checkpointing = "unsloth",
    random_state = 3407,
    max_seq_length = max_seq_length,
    use_rslora = True,
    loftq_config = None,
)

trainer = SFTTrainer(
    model = model,
    train_dataset = train_dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    tokenizer = tokenizer,
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        warmup_steps = 45,
        num_train_epochs = 2,  # 1 for base pretraining
        fp16 = not torch.cuda.is_bf16_supported(),
        bf16 = torch.cuda.is_bf16_supported(),
        logging_steps = 15,
        logging_dir = "logs",
        report_to = "tensorboard",
        output_dir = "outputs",
        save_strategy = IntervalStrategy.STEPS,
        save_steps = 100,
        save_total_limit = 30,
        optim = "adamw_torch_fused",
        lr_scheduler_type = "cosine",  # constant for phase 1, cosine for phase 2
        learning_rate = 5e-5,
        weight_decay = 0.10,  # 0.15 for base pretraining
        adam_beta1 = 0.88,    # 0.9 for base pretraining
        adam_beta2 = 0.99,    # 0.999 for base pretraining
    ),
)
```
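The rough script assumes `model`, `tokenizer`, `train_dataset`, and `max_seq_length` are already defined; a minimal setup sketch under that assumption (the model name and dataset file are placeholders):

```python
from unsloth import FastLanguageModel
from datasets import load_dataset

max_seq_length = 8192

# Placeholder checkpoint: the base model for stage 1, the (merged) instruct
# model for the story-instruct stages.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "meta-llama/Meta-Llama-3-8B-Instruct",
    max_seq_length = max_seq_length,
    load_in_4bit = False,
)

# Placeholder dataset: one "text" field per sample, matching dataset_text_field above.
train_dataset = load_dataset("json", data_files = "train.jsonl", split = "train")
```

With those pieces in place, `trainer.train()` runs each phase.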