Model description

Based on the facebook/opt-30b model, fine-tuned on chunked Dalio responses.
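
A minimal inference sketch using transformers. The repo id below is a placeholder for this model's actual Hub id, and the prompt is only illustrative:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder: substitute this model's actual Hub repo id.
MODEL_ID = "path/to/this-model"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    device_map="auto",   # shard the 30B model across available GPUs (requires accelerate)
    torch_dtype="auto",
)

inputs = tokenizer("What is the economic machine?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```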

Dataset Used

Jellywibble/dalio-pretrain-book-dataset-v2
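
For reference, the dataset can be pulled straight from the Hugging Face Hub with the datasets library (the "train" split name below is an assumption about the dataset layout):

```python
from datasets import load_dataset

# Load the chunked Dalio book dataset from the Hugging Face Hub.
dataset = load_dataset("Jellywibble/dalio-pretrain-book-dataset-v2")
print(dataset)               # inspect the available splits and columns
print(dataset["train"][0])   # peek at one chunk ("train" split is an assumption)
```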

Training Parameters

  • DeepSpeed on 4x A40 GPUs
  • Ensuring the EOS token <s> appears only at the beginning of each chunk
  • Gradient accumulation steps = 1 (effective batch size of 4)
  • Learning rate of 3e-6 with the AdamW optimizer
  • Block size of 800
  • Trained for 1 epoch (additional epochs yielded worse HellaSwag results)
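
A minimal sketch of how these settings might map onto a Hugging Face Trainer run. The "text" column name, the "train" split, the DeepSpeed config path, and the bf16 flag are assumptions not stated in this card; the per-device batch size is inferred from the effective batch size of 4 across 4 GPUs:

```python
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

BLOCK_SIZE = 800  # block size from the list above

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-30b")
model = AutoModelForCausalLM.from_pretrained("facebook/opt-30b")

dataset = load_dataset("Jellywibble/dalio-pretrain-book-dataset-v2")

def tokenize(examples):
    # "text" column name is an assumption about the dataset schema.
    # OPT's tokenizer prepends its special token to each sequence by default,
    # so it appears only at the start of each chunk.
    return tokenizer(examples["text"], truncation=True, max_length=BLOCK_SIZE)

tokenized = dataset["train"].map(
    tokenize, batched=True, remove_columns=dataset["train"].column_names
)

args = TrainingArguments(
    output_dir="opt-30b-dalio",
    per_device_train_batch_size=1,   # 4 GPUs x 1 = effective batch size of 4
    gradient_accumulation_steps=1,
    learning_rate=3e-6,
    num_train_epochs=1,              # more epochs hurt the HellaSwag result
    optim="adamw_torch",
    deepspeed="ds_config.json",      # assumed path to a DeepSpeed config file
    bf16=True,                       # assumption: mixed precision on the A40s
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

A run like this would typically be launched with something along the lines of deepspeed --num_gpus=4 train.py so that all four A40s participate.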
