---
tags:
  - text-generation
library_name: transformers
---

## Model description

Based on Jellywibble/dalio-pretrained-book-bs4-seed1, which was pre-trained on the Dalio *Principles* book, and subsequently fine-tuned on the handwritten conversations in Jellywibble/dalio_handwritten-conversations.
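The card does not include a usage snippet; below is a minimal sketch assuming a standard `transformers` causal-LM checkpoint. The repository id is a hypothetical placeholder, since this card's own id is not stated in the text:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repository id -- substitute the actual id of this model.
model_id = "Jellywibble/<this-model>"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# The card notes that each example begins "This is a conversation where Ray ...".
prompt = "This is a conversation where Ray "
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```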

## Dataset Used

Jellywibble/dalio_handwritten-conversations
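A minimal sketch for pulling this dataset with the `datasets` library (the split name is an assumption; inspect the repository for the actual splits):

```python
from datasets import load_dataset

# Split name "train" is an assumption, not stated on the card.
ds = load_dataset("Jellywibble/dalio_handwritten-conversations", split="train")
print(ds[0])
```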

## Training Parameters

- DeepSpeed on 4x A40 GPUs
- Ensured the EOS token `<s>` appears only at the beginning of each 'This is a conversation where Ray ...' example
- Gradient accumulation steps = 1 (effective batch size of 4)
- Learning rate of 2e-6 with the AdamW optimizer
- Block size of 1000
- Trained for 1 epoch (additional epochs yielded worse HellaSwag results); a configuration sketch follows below
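A minimal sketch of how these settings might map onto a Hugging Face `Trainer` configuration. Only the numbers above come from the card; the output directory, DeepSpeed config path, and the `model`/`tokenized_ds` objects are assumptions, not the author's actual training script:

```python
from transformers import Trainer, TrainingArguments

# Sketch of the stated hyperparameters; paths and objects are placeholders.
args = TrainingArguments(
    output_dir="dalio-finetuned",       # hypothetical output directory
    per_device_train_batch_size=1,      # 4x A40 GPUs -> effective batch size 4
    gradient_accumulation_steps=1,
    learning_rate=2e-6,                 # with AdamW (the Trainer default)
    num_train_epochs=1,                 # additional epochs hurt HellaSwag
    deepspeed="ds_config.json",         # hypothetical DeepSpeed config file
)

trainer = Trainer(
    model=model,                 # a causal LM loaded as shown above
    args=args,
    train_dataset=tokenized_ds,  # conversations chunked to block size 1000
)
trainer.train()
```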

## Metrics