|
# RotoBART |
|
|
|
## Running the script |
|
|
|
### Script arguments
|
|
|
Available model config arguments from script: |
|
```
encoder_layers
encoder_ffn_dim
decoder_layers
decoder_ffn_dim
d_model
vocab_size
max_position_embeddings
encoder_layerdrop
decoder_layerdrop
```
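To illustrate how these flags fit together, the sketch below builds a model configuration and the matching command-line flags. The values are assumptions mirroring a BART-base-sized model, not the script's defaults:

```python
# Hypothetical values mirroring a BART-base-sized model; these are
# illustrative assumptions, not the script's defaults.
model_config = {
    "encoder_layers": 6,
    "encoder_ffn_dim": 3072,
    "decoder_layers": 6,
    "decoder_ffn_dim": 3072,
    "d_model": 768,
    "vocab_size": 50265,
    "max_position_embeddings": 1024,
    "encoder_layerdrop": 0.0,
    "decoder_layerdrop": 0.0,
}

# Each key is passed to the script as a CLI flag, e.g. --encoder_layers 6
flags = " ".join(f"--{k} {v}" for k, v in model_config.items())
print(flags)
```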
|
|
|
Training Arguments: |
|
|
|
`testing`: use only a single batch, for testing the script
|
|
|
`adafactor`: enable the Adafactor optimizer; omitting this flag reverts to Adam
|
|
|
`grad_accum`: number of gradient accumulation steps (default: 4)
|
|
|
`use_bf16`: convert the model to bfloat16 precision
|
|
|
`colab_tpu`: set this flag when running on a Colab TPU
|
|
|
`use_wandb`: log using Weights & Biases (via Tensorboard) |
|
|
|
`save_strategy`: whether to save model checkpoints based on steps or epochs
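Gradient accumulation multiplies the effective batch size without increasing per-device memory use. The sketch below shows the arithmetic; the device count is an assumption (e.g. one TPU v3-8 host), while the per-device batch size matches the example command below and `grad_accum` uses the script's stated default:

```python
# Effective global batch size with gradient accumulation.
per_device_train_batch_size = 2   # as in the example command below
num_devices = 8                   # assumption: one TPU v3-8 host
grad_accum = 4                    # script default

effective_batch_size = per_device_train_batch_size * num_devices * grad_accum
print(effective_batch_size)  # → 64
```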
|
|
|
|
|
```
python rotobart/run_dnlm_flax.py \
    --output_dir rotobart_output \
    --overwrite_output_dir \
    --dataset_path rotobart/pile.py \
    --model_name_or_path rotobart \
    --tokenizer_name ./rotobart/vocab-2/the_pile.model \
    --shuffle_buffer_size 1000 \
    --do_train --do_eval \
    --max_seq_length 1024 \
    --encoder_layers 2 \
    --decoder_layers 2 \
    --per_device_train_batch_size 2 \
    --per_device_eval_batch_size 2 \
    --logging_steps 8 \
    --num_train_steps 1000 \
    --eval_steps 1000 \
    --save_steps 1000 \
    --save_strategy steps \
    --num_eval_samples 100 \
    --warmup_steps 30 \
    --learning_rate 1e-4 \
    --use_wandb \
    --testing \
    --use_bf16 \
    --adafactor
```
|
|
|
Alternatively, as a single-line command:
|
|
|
```
python3 run_dnlm_flax.py --output_dir rotobart_output --overwrite_output_dir --dataset_path pile.py --model_name_or_path rotobart --tokenizer_name vocab-2/the_pile.model --shuffle_buffer_size 1000 --do_train --do_eval --max_position_embeddings 2048 --max_seq_length 2048 --encoder_layers 6 --decoder_layers 6 --per_device_train_batch_size 1 --per_device_eval_batch_size 1 --logging_steps 100 --num_train_steps 50000 --eval_steps 2500 --save_steps 2500 --save_strategy steps --num_eval_samples 5000 --warmup_steps 5000 --learning_rate 1e-4 --use_wandb --use_bf16 --adafactor
```
|
|