---
tags:
- text-generation
library_name: pytorch
---

## Model description
`Jellywibble/dalio-principles-pretrain-v2` is based on the `facebook/opt-30b` model, fine-tuned on chunked Dalio responses.

## Dataset used
`Jellywibble/dalio-pretrain-book-dataset-v2`

## Training parameters
- DeepSpeed on 4× A40 GPUs
- The EOS token `<s>` appears only at the beginning of each chunk
- Gradient accumulation steps = 1 (effective batch size of 4, i.e. a per-GPU batch size of 1 across the 4 GPUs)
- Learning rate of 3e-6 with the AdamW optimizer
- Block size of 800 tokens
- Trained for 1 epoch (additional epochs gave worse HellaSwag results)
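The chunking described in the training parameters (800-token blocks with `<s>` only at the start of each chunk) can be sketched in plain Python. This is a minimal sketch, not the author's training script: it assumes documents are already tokenized into integer ids and concatenated back to back, similar to the `group_texts` step in the Hugging Face `run_clm.py` example. `START_TOKEN_ID` is a hypothetical placeholder for the id of `<s>`; look up the real id with the tokenizer actually used.

```python
from typing import List

# Hypothetical placeholder id for the `<s>` start-of-chunk token; replace it
# with the id reported by the tokenizer used for training.
START_TOKEN_ID = 0
BLOCK_SIZE = 800  # block size from the training parameters


def chunk_token_stream(
    docs: List[List[int]],
    block_size: int = BLOCK_SIZE,
    start_id: int = START_TOKEN_ID,
) -> List[List[int]]:
    """Concatenate tokenized documents and cut them into fixed-size blocks.

    Every block is exactly `block_size` tokens long and begins with `start_id`,
    which appears nowhere else in the block. A trailing remainder too short to
    fill a whole block is dropped.
    """
    if block_size < 2:
        raise ValueError("block_size must leave room for at least one content token")
    body = block_size - 1  # one slot per block is reserved for the start token
    # Flatten the documents and strip any start tokens already present, so the
    # token can only ever occur at position 0 of a block.
    stream = [tok for doc in docs for tok in doc if tok != start_id]
    n_blocks = len(stream) // body
    return [[start_id] + stream[i * body:(i + 1) * body] for i in range(n_blocks)]


if __name__ == "__main__":
    # Toy "tokenized" documents: 999 + 700 = 1699 content tokens.
    docs = [list(range(1, 1000)), list(range(1000, 1700))]
    blocks = chunk_token_stream(docs)
    print(len(blocks), [len(b) for b in blocks])  # 2 [800, 800]
```

Reserving one slot per block for the start token keeps every block at exactly 800 tokens. Dropping the short remainder mirrors what `run_clm.py` does; padding it instead would be an equally valid choice.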

## Metrics
- HellaSwag perplexity: 30.2
- Eval accuracy: 49.8%
- Eval loss: 2.283
- wandb run: https://wandb.ai/jellywibble/huggingface/runs/2vtr39rk?workspace=user-jellywibble
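The reported metrics can be related with two small helpers. This sketch assumes, without confirmation from the card, that the eval loss is the mean per-token cross-entropy in nats (what PyTorch's cross-entropy loss reports) and that eval accuracy is next-token accuracy computed by shifting predictions against labels, as in the Hugging Face `run_clm.py` example. Under those assumptions an eval loss of 2.283 corresponds to an eval-set perplexity of about 9.81. The HellaSwag perplexity of 30.2 is measured on different text, so it is not expected to equal `exp(2.283)`.

```python
import math
from typing import Sequence


def perplexity_from_loss(mean_nll: float) -> float:
    """Perplexity is exp(mean per-token negative log-likelihood), loss in nats."""
    return math.exp(mean_nll)


def next_token_accuracy(input_ids: Sequence[int], pred_ids: Sequence[int]) -> float:
    """Share of positions where the argmax prediction matches the next token.

    pred_ids[i] is the model's prediction for the token at position i + 1, so
    the last prediction has no target and the first token is never a target.
    """
    if len(input_ids) != len(pred_ids) or len(input_ids) < 2:
        raise ValueError("need equal-length sequences of at least 2 tokens")
    targets = input_ids[1:]
    preds = pred_ids[:-1]
    correct = sum(p == t for p, t in zip(preds, targets))
    return correct / len(targets)


if __name__ == "__main__":
    print(round(perplexity_from_loss(2.283), 2))  # 9.81 (eval-set perplexity)
    print(round(math.log(30.2), 2))  # 3.41 (loss implied by a perplexity of 30.2)
    print(next_token_accuracy([5, 6, 7, 8], [6, 7, 9, 1]))  # 0.6666666666666666
```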