# RotoBART
## Running the script
### Script arguments
Model configuration arguments accepted by the script:
```
encoder_layers
encoder_ffn_dim
decoder_layers
decoder_ffn_dim
d_model
vocab_size
max_position_embeddings
encoder_layerdrop
decoder_layerdrop
```
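These arguments correspond to the standard BART configuration fields. Below is a minimal sketch of how they might map onto a `transformers` `BartConfig`; this is an assumption for illustration only, since the actual RotoBART config class in the script may differ (for example to accommodate rotary position embeddings), and the values shown are placeholders.

```python
# Minimal sketch: building a config from the flags above.
# Assumes the standard transformers BartConfig; values are illustrative.
from transformers import BartConfig

config = BartConfig(
    vocab_size=50265,              # --vocab_size
    d_model=768,                   # --d_model
    encoder_layers=2,              # --encoder_layers
    encoder_ffn_dim=3072,          # --encoder_ffn_dim
    decoder_layers=2,              # --decoder_layers
    decoder_ffn_dim=3072,          # --decoder_ffn_dim
    max_position_embeddings=1024,  # --max_position_embeddings
    encoder_layerdrop=0.0,         # --encoder_layerdrop
    decoder_layerdrop=0.0,         # --decoder_layerdrop
)
```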
Training arguments:
- `testing`: use only a single batch, for quickly testing the script
- `adafactor`: enable the Adafactor optimizer; omitting the flag falls back to Adam (see the optimizer sketch below)
- `grad_accum`: number of gradient accumulation steps (default: 4)
- `use_bf16`: convert the model to bfloat16
- `colab_tpu`: set this flag when running on a Colab TPU
- `use_wandb`: log to Weights & Biases (via TensorBoard)
- `save_strategy`: whether to save model checkpoints every `save_steps` steps or once per epoch
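A minimal sketch of how `adafactor` and `grad_accum` are commonly wired up in a Flax training loop with `optax`. This is an assumption about the script's internals; the function and flag names here are illustrative only.

```python
import optax

def make_optimizer(learning_rate: float, use_adafactor: bool, grad_accum: int):
    if use_adafactor:
        # --adafactor: memory-efficient optimizer, often preferred on TPU
        tx = optax.adafactor(learning_rate=learning_rate)
    else:
        # default: Adam (AdamW here, as in most Flax example scripts)
        tx = optax.adamw(learning_rate=learning_rate)
    # --grad_accum: accumulate gradients and apply an update every `grad_accum` steps
    return optax.MultiSteps(tx, every_k_schedule=grad_accum)

tx = make_optimizer(learning_rate=1e-4, use_adafactor=True, grad_accum=4)
```

A full example invocation of the script: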
```
python rotobart/run_dnlm_flax.py \
--output_dir rotobart_output \
--overwrite_output_dir \
--dataset_path rotobart/pile.py \
--model_name_or_path rotobart \
--tokenizer_name ./rotobart/vocab-2/the_pile.model \
--shuffle_buffer_size 1000 \
--do_train --do_eval \
--max_seq_length 1024 \
--encoder_layers 2 \
--decoder_layers 2 \
--per_device_train_batch_size 2 \
--per_device_eval_batch_size 2 \
--logging_steps 8 \
--num_train_steps 1000 \
--eval_steps 1000 \
--save_steps 1000 \
--save_strategy steps \
--num_eval_samples 100 \
--warmup_steps 30 \
--learning_rate 1e-4 \
--use_wandb \
--testing \
--use_bf16 \
--adafactor
```
Alternatively, a larger configuration as a single-line command:
```
python3 run_dnlm_flax.py --output_dir rotobart_output --overwrite_output_dir --dataset_path pile.py --model_name_or_path rotobart --tokenizer_name vocab-2/the_pile.model --shuffle_buffer_size 1000 --do_train --do_eval --max_position_embeddings 2048 --max_seq_length 2048 --encoder_layers 6 --decoder_layers 6 --per_device_train_batch_size 1 --per_device_eval_batch_size 1 --logging_steps 100 --num_train_steps 50000 --eval_steps 2500 --save_steps 2500 --save_strategy steps --num_eval_samples 5000 --warmup_steps 5000 --learning_rate 1e-4 --use_wandb --use_bf16 --adafactor
```
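Both invocations pass `--use_bf16`. In Flax scripts this usually means casting the model parameters (or the model's compute dtype) to `bfloat16`; a minimal sketch of the parameter-cast variant, as an assumption about how the flag is implemented here:

```python
import jax
import jax.numpy as jnp

def cast_params_to_bf16(params):
    # Cast every leaf array in the parameter pytree to bfloat16.
    return jax.tree_util.tree_map(lambda x: x.astype(jnp.bfloat16), params)
```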