# RotoBART
## Running the script
### Script arguments
Model configuration arguments accepted by the script:
```
encoder_layers
encoder_ffn_dim
decoder_layers
decoder_ffn_dim
d_model
vocab_size
max_position_embeddings
encoder_layerdrop
decoder_layerdrop
```
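These arguments correspond to the standard BART configuration fields. Below is a minimal sketch of how they might map onto a `transformers` `BartConfig`; this is an assumption for illustration only, since the actual RotoBART config class in the script may differ (for example to accommodate rotary position embeddings), and the values shown are placeholders.

```python
# Minimal sketch: building a config from the flags above.
# Assumes the standard transformers BartConfig; values are illustrative.
from transformers import BartConfig

config = BartConfig(
    vocab_size=50265,              # --vocab_size
    d_model=768,                   # --d_model
    encoder_layers=2,              # --encoder_layers
    encoder_ffn_dim=3072,          # --encoder_ffn_dim
    decoder_layers=2,              # --decoder_layers
    decoder_ffn_dim=3072,          # --decoder_ffn_dim
    max_position_embeddings=1024,  # --max_position_embeddings
    encoder_layerdrop=0.0,         # --encoder_layerdrop
    decoder_layerdrop=0.0,         # --decoder_layerdrop
)
```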
Training arguments:
- `testing`: use only a single batch, for quickly testing the script
- `adafactor`: enable the Adafactor optimizer; omitting the flag falls back to Adam (see the optimizer sketch below)
- `grad_accum`: number of gradient accumulation steps (default: 4)
- `use_bf16`: convert the model to bfloat16
- `colab_tpu`: set this flag when running on a Colab TPU
- `use_wandb`: log to Weights & Biases (via TensorBoard)
- `save_strategy`: whether to save model checkpoints every `save_steps` steps or once per epoch
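A minimal sketch of how `adafactor` and `grad_accum` are commonly wired up in a Flax training loop with `optax`. This is an assumption about the script's internals; the function and flag names here are illustrative only.

```python
import optax

def make_optimizer(learning_rate: float, use_adafactor: bool, grad_accum: int):
    if use_adafactor:
        # --adafactor: memory-efficient optimizer, often preferred on TPU
        tx = optax.adafactor(learning_rate=learning_rate)
    else:
        # default: Adam (AdamW here, as in most Flax example scripts)
        tx = optax.adamw(learning_rate=learning_rate)
    # --grad_accum: accumulate gradients and apply an update every `grad_accum` steps
    return optax.MultiSteps(tx, every_k_schedule=grad_accum)

tx = make_optimizer(learning_rate=1e-4, use_adafactor=True, grad_accum=4)
```

A full example invocation of the script: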
```
python rotobart/run_dnlm_flax.py \
--output_dir rotobart_output \
--overwrite_output_dir \
--dataset_path rotobart/pile.py \
--model_name_or_path rotobart \
--tokenizer_name ./rotobart/vocab-2/the_pile.model \
--shuffle_buffer_size 1000 \
--do_train --do_eval \
--max_seq_length 1024 \
--encoder_layers 2 \
--decoder_layers 2 \
--per_device_train_batch_size 2 \
--per_device_eval_batch_size 2 \
--logging_steps 8 \
--num_train_steps 1000 \
--eval_steps 1000 \
--save_steps 1000 \
--save_strategy steps \
--num_eval_samples 100 \
--warmup_steps 30 \
--learning_rate 1e-4 \
--use_wandb \
--testing \
--use_bf16 \
--adafactor
```
Alternatively, a larger configuration as a single-line command:
```
python3 run_dnlm_flax.py --output_dir rotobart_output --overwrite_output_dir --dataset_path pile.py --model_name_or_path rotobart --tokenizer_name vocab-2/the_pile.model --shuffle_buffer_size 1000 --do_train --do_eval --max_position_embeddings 2048 --max_seq_length 2048 --encoder_layers 6 --decoder_layers 6 --per_device_train_batch_size 1 --per_device_eval_batch_size 1 --logging_steps 100 --num_train_steps 50000 --eval_steps 2500 --save_steps 2500 --save_strategy steps --num_eval_samples 5000 --warmup_steps 5000 --learning_rate 1e-4 --use_wandb --use_bf16 --adafactor
```
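Both invocations pass `--use_bf16`. In Flax scripts this usually means casting the model parameters (or the model's compute dtype) to `bfloat16`; a minimal sketch of the parameter-cast variant, as an assumption about how the flag is implemented here:

```python
import jax
import jax.numpy as jnp

def cast_params_to_bf16(params):
    # Cast every leaf array in the parameter pytree to bfloat16.
    return jax.tree_util.tree_map(lambda x: x.astype(jnp.bfloat16), params)
```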