---
tags:
- text-generation
library_name: transformers
---

## Model description
Based on the facebook/opt-30b model, fine-tuned on chunked Dalio responses.
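
A minimal text-generation sketch with 🤗 Transformers is shown below. The repository id is a hypothetical placeholder (substitute the actual model repo), and the fp16 weights plus `device_map="auto"` are assumptions made so the 30B-parameter model can be sharded across available GPUs (requires `accelerate`).

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Jellywibble/dalio-finetuned-opt-30b"  # hypothetical placeholder repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # assumption: fp16 to reduce memory footprint
    device_map="auto",          # shard the 30B model across available devices
)

prompt = "What is the key to building a great culture?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```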

## Dataset Used
Jellywibble/dalio-pretrain-book-dataset-v2
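
The dataset can be inspected with 🤗 Datasets; this short sketch assumes the hub dataset's default split layout (e.g. a `train` split).

```python
from datasets import load_dataset

# Load the fine-tuning corpus from the Hugging Face Hub.
dataset = load_dataset("Jellywibble/dalio-pretrain-book-dataset-v2")
print(dataset)               # available splits and columns
print(dataset["train"][0])   # first training example (assumes a "train" split)
```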

## Training Parameters
- DeepSpeed on 4x A40 GPUs
- Ensured the EOS token `<s>` appears only at the beginning of each chunk
- Gradient accumulation steps = 1 (effective batch size of 4)
- Learning rate of 3e-6 with the AdamW optimizer
- Block size of 800
- Trained for 1 epoch (additional epochs yielded worse HellaSwag results); see the configuration sketch after this list
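
A hypothetical sketch of this configuration with the 🤗 Transformers `Trainer` follows. Only the hyperparameters listed above come from this card; the script structure, chunking logic, dataset column name (`text`), DeepSpeed config path, and fp16 setting are assumptions.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "facebook/opt-30b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

block_size = 800                          # tokens per training chunk
special_id = tokenizer.bos_token_id       # the card refers to this token as `<s>`

def tokenize_and_chunk(examples):
    # Tokenize without special tokens, then prepend the special token to each
    # fixed-size block so it appears exactly once, at the start of every chunk.
    ids = tokenizer("".join(examples["text"]), add_special_tokens=False)["input_ids"]
    body = block_size - 1
    chunks = [[special_id] + ids[i : i + body] for i in range(0, len(ids), body)]
    return {"input_ids": chunks}

dataset = load_dataset("Jellywibble/dalio-pretrain-book-dataset-v2")
train_ds = dataset["train"].map(
    tokenize_and_chunk, batched=True, remove_columns=dataset["train"].column_names
)

# Causal-LM collator: pads chunks and copies input_ids to labels.
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

args = TrainingArguments(
    output_dir="dalio-opt-30b",
    per_device_train_batch_size=1,   # 4 GPUs -> effective batch size of 4
    gradient_accumulation_steps=1,
    learning_rate=3e-6,
    num_train_epochs=1,
    optim="adamw_torch",
    fp16=True,                       # assumption
    deepspeed="ds_config.json",      # assumption: path to a DeepSpeed config
)

trainer = Trainer(model=model, args=args, train_dataset=train_ds, data_collator=collator)
trainer.train()
```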

## Metrics
- HellaSwag perplexity: 30.2
- Eval accuracy: 49.8%
- Eval loss: 2.283
- Checkpoint 16 uploaded
- wandb run: https://wandb.ai/jellywibble/huggingface/runs/2vtr39rk?workspace=user-jellywibble