
Model Description

Fine-tuning of the OPT-30B model on a cleaned version of Principles. The dataset was cleaned by (see the sketch after this list):

  • removing numeric references to footnotes
  • removing numeric list markers, e.g. 1) ... 2) ... 3) ...
  • correcting grammar, e.g. ensuring that full stops are followed by a space

Dataset location: Jellywibble/dalio-principles-cleaned-v3
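The following is a rough sketch of those cleaning steps, not the authors' actual script; the exact regex patterns (what counts as a footnote reference or a list marker) are assumptions:

```python
import re

def clean_text(text: str) -> str:
    """Sketch of the cleaning steps described above (not the authors' actual script)."""
    # Remove numeric footnote references, e.g. a trailing "12" glued to a word.
    text = re.sub(r"(?<=[a-z])\d+\b", "", text)
    # Remove numeric list markers such as "1) ", "2) ", "3) ".
    text = re.sub(r"\b\d+\)\s*", "", text)
    # Ensure full stops are followed by a space.
    text = re.sub(r"\.(?=[A-Za-z])", ". ", text)
    return text

print(clean_text("Radical truth1 means 1) honesty 2) transparency.It works."))
# -> "Radical truth means honesty transparency. It works."
```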

Metrics

  • Checkpoint 8 served
  • Hellaswag Perplexity: 30.65
  • Eval loss: 2.289

wandb link: https://wandb.ai/jellywibble/huggingface/runs/2jqc504o?workspace=user-jellywibble
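For reference, the eval loss can be converted to a language-model perplexity via exp(loss), assuming it is the usual mean per-token cross-entropy in nats; this is a different quantity from the Hellaswag perplexity above:

```python
import math

# Perplexity of a causal LM is exp(mean cross-entropy loss).
eval_loss = 2.289          # eval loss reported above
perplexity = math.exp(eval_loss)
print(f"eval perplexity ~ {perplexity:.2f}")  # ~ 9.87
```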

Model Parameters

Trained on 4xA40 GPUs; effective batch size = 8 (4 GPUs × per_device_train_batch_size 1 × gradient_accumulation_steps 2)

  • base_model_name: facebook/opt-30b
  • dataset_name: Jellywibble/dalio-principles-cleaned-v3
  • block_size: 1024
  • gradient_accumulation_steps: 2
  • per_device_train_batch_size: 1
  • seed: 2
  • num_train_epochs: 1
  • learning_rate: 3e-6
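A hedged sketch of how these hyperparameters might map onto a Hugging Face Trainer run. The actual training script is not part of this card; the dataset column name "text", the output directory, and the chunking helper are assumptions, and the script would be launched across the 4 GPUs (e.g. with torchrun) to realize the effective batch size of 8:

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments

# Hyperparameters from the list above; the script structure itself is an assumption.
base_model_name = "facebook/opt-30b"
dataset_name = "Jellywibble/dalio-principles-cleaned-v3"
block_size = 1024

tokenizer = AutoTokenizer.from_pretrained(base_model_name)
model = AutoModelForCausalLM.from_pretrained(base_model_name)
raw = load_dataset(dataset_name)

def tokenize_and_chunk(batch):
    # Tokenize, concatenate, and split into fixed-size blocks for causal LM training.
    # Assumes the dataset has a "text" column.
    ids = tokenizer(batch["text"])["input_ids"]
    concatenated = [tok for seq in ids for tok in seq]
    total = (len(concatenated) // block_size) * block_size
    chunks = [concatenated[i : i + block_size] for i in range(0, total, block_size)]
    return {"input_ids": chunks, "labels": [list(c) for c in chunks]}

train_ds = raw["train"].map(
    tokenize_and_chunk, batched=True, remove_columns=raw["train"].column_names
)

args = TrainingArguments(
    output_dir="opt30b-dalio-principles",  # hypothetical output path
    per_device_train_batch_size=1,         # 1 sample per GPU
    gradient_accumulation_steps=2,         # 4 GPUs x 1 x 2 = effective batch size 8
    num_train_epochs=1,
    learning_rate=3e-6,
    seed=2,
)

Trainer(model=model, args=args, train_dataset=train_ds).train()
```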

Notes

  • It is important for the effective batch size to be at least 8
  • A learning rate higher than 3e-6 results in severe overfitting, i.e. much worse Hellaswag metrics
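A minimal inference sketch using transformers; the model id below is a placeholder, so swap in this card's Hub repo id to load the fine-tuned checkpoint:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder: replace with this card's repo id to use the fine-tuned checkpoint.
model_id = "facebook/opt-30b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

prompt = "Principle:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```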