
## Model description

Based on the facebook/opt-30b model, fine-tuned on chunked Dalio responses.
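
A minimal usage sketch for loading and sampling from the checkpoint with the `transformers` library. The repository id below is a placeholder (this card does not state it), the prompt is illustrative, and `device_map="auto"` assumes the `accelerate` package plus enough GPU memory for a 30B-parameter model in half precision.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "<this-model's-hub-id>"  # placeholder; replace with this repository's id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision keeps the 30B weights manageable
    device_map="auto",          # shard weights across the available GPUs
)

prompt = "What is the best way to diversify a portfolio?"  # illustrative prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```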

## Dataset Used

Jellywibble/dalio-pretrain-book-dataset-v2
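
A short sketch of pulling the fine-tuning data from the Hub with the `datasets` library; the presence of a `train` split and the column layout are assumptions, so inspect the dataset before relying on them.

```python
from datasets import load_dataset

dataset = load_dataset("Jellywibble/dalio-pretrain-book-dataset-v2")
print(dataset)              # list the available splits and columns
print(dataset["train"][0])  # inspect one chunked example (split name assumed)
```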

## Training Parameters

- DeepSpeed on 4×A40 GPUs
- Ensured the EOS token `<s>` appears only at the beginning of each chunk
- Gradient accumulation steps = 1 (effective batch size of 4)
- Learning rate of 3e-6 with the AdamW optimizer
- Block size of 800
- Trained for 1 epoch (additional epochs yielded worse HellaSwag results)
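
A sketch of how the hyperparameters listed above might be expressed as `transformers.TrainingArguments`. Only the values stated in this card (learning rate 3e-6, AdamW, gradient accumulation 1, 1 epoch) come from the card; the output directory, per-device batch size, DeepSpeed config path, and mixed-precision flag are assumptions, and the block size of 800 would be applied during data chunking rather than here.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="opt-30b-dalio-chunked",   # assumed output directory name
    learning_rate=3e-6,                   # from this card
    num_train_epochs=1,                   # from this card
    gradient_accumulation_steps=1,        # from this card
    per_device_train_batch_size=1,        # assumption: 1 per GPU × 4 GPUs = effective batch of 4
    optim="adamw_torch",                  # AdamW optimizer, as stated above
    deepspeed="ds_config.json",           # assumed DeepSpeed config path
    fp16=True,                            # assumption: mixed precision on the A40s
)
```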

## Metrics
