# RotoBART

## Running the script

### Script arguments

Model config arguments available from the script:
```
encoder_layers
encoder_ffn_dim
decoder_layers
decoder_ffn_dim
d_model
vocab_size
max_position_embeddings
encoder_layerdrop
decoder_layerdrop
```
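
These flags mirror the fields of the standard Hugging Face `BartConfig`. As a rough illustration (placeholder values, not the repo's defaults; the script builds its own rotary-embedding config, these are just the matching fields):

```
from transformers import BartConfig

# Hypothetical values for illustration only.
config = BartConfig(
    encoder_layers=6,
    encoder_ffn_dim=3072,
    decoder_layers=6,
    decoder_ffn_dim=3072,
    d_model=768,
    vocab_size=50265,
    max_position_embeddings=1024,
    encoder_layerdrop=0.0,
    decoder_layerdrop=0.0,
)
```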

Training Arguments:

`testing`: use only a single batch, for quickly testing the script

`adafactor`: enable the Adafactor optimizer; omitting the flag reverts to Adam (see the sketch after this list)

`grad_accum`: number of gradient accumulation steps, default is 4

`use_bf16`: convert the model to bfloat16

`colab_tpu`: set this when running on a Colab TPU

`use_wandb`: log with Weights & Biases (via TensorBoard)

`save_strategy`: whether to save model checkpoints every `steps` or every `epoch`
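
For reference, a minimal sketch of how an `adafactor` toggle and `grad_accum` are commonly wired with `optax` in a Flax training script (an illustration of the pattern, with a hypothetical `build_optimizer` helper, not the script's actual code):

```
import optax

def build_optimizer(learning_rate: float, adafactor: bool, grad_accum: int):
    # Adafactor when the flag is set, otherwise Adam.
    opt = optax.adafactor(learning_rate=learning_rate) if adafactor else optax.adam(learning_rate)
    # Gradient accumulation: only apply an update every `grad_accum` steps.
    if grad_accum > 1:
        opt = optax.MultiSteps(opt, every_k_schedule=grad_accum)
    return opt

optimizer = build_optimizer(learning_rate=1e-4, adafactor=True, grad_accum=4)
```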


```
python rotobart/run_dnlm_flax.py \
  --output_dir rotobart_output \
  --overwrite_output_dir \
  --dataset_path rotobart/pile.py \
  --model_name_or_path rotobart \
  --tokenizer_name ./rotobart/vocab-2/the_pile.model \
  --shuffle_buffer_size 1000 \
  --do_train --do_eval \
  --max_seq_length 1024 \
  --encoder_layers 2 \
  --decoder_layers 2 \
  --per_device_train_batch_size 2 \
  --per_device_eval_batch_size 2 \
  --logging_steps 8 \
  --num_train_steps 1000 \
  --eval_steps 1000 \
  --save_steps 1000 \
  --save_strategy steps \
  --num_eval_samples 100 \
  --warmup_steps 30 \
  --learning_rate 1e-4 \
  --use_wandb \
  --testing \
  --use_bf16 \
  --adafactor
```

Alternatively, a longer run with 6 encoder/decoder layers and 2048-token sequences:

```
python3 run_dnlm_flax.py \
  --output_dir rotobart_output \
  --overwrite_output_dir \
  --dataset_path pile.py \
  --model_name_or_path rotobart \
  --tokenizer_name vocab-2/the_pile.model \
  --shuffle_buffer_size 1000 \
  --do_train --do_eval \
  --max_position_embeddings 2048 \
  --max_seq_length 2048 \
  --encoder_layers 6 \
  --decoder_layers 6 \
  --per_device_train_batch_size 1 \
  --per_device_eval_batch_size 1 \
  --logging_steps 100 \
  --num_train_steps 50000 \
  --eval_steps 2500 \
  --save_steps 2500 \
  --save_strategy steps \
  --num_eval_samples 5000 \
  --warmup_steps 5000 \
  --learning_rate 1e-4 \
  --use_wandb \
  --use_bf16 \
  --adafactor
```