---
license: mit
tags:
- generated_from_trainer
datasets:
- squad
model-index:
- name: run05-roberta-large-squadv1.1-sl384-ds128-e2-tbs16
  results: []
---

# run05-roberta-large-squadv1.1-sl384-ds128-e2-tbs16

This model is a fine-tuned version of [roberta-large](https://huggingface.co/roberta-large) on the SQuAD v1.1 dataset.

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 3e-05
- train_batch_size: 16
- eval_batch_size: 64
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 2.0
- mixed_precision_training: Native AMP

### Training results

### Framework versions

- Transformers 4.18.0
- PyTorch 1.11.0+cu113
- Datasets 2.1.0
- Tokenizers 0.12.1

## Train

```bash
# Run from transformers/examples/pytorch/question-answering;
# set RUNID and OUTDIR before launching.
python run_qa.py \
    --model_name_or_path roberta-large \
    --dataset_name squad \
    --do_eval \
    --do_train \
    --evaluation_strategy steps \
    --eval_steps 500 \
    --learning_rate 3e-5 \
    --fp16 \
    --num_train_epochs 2 \
    --per_device_eval_batch_size 64 \
    --per_device_train_batch_size 16 \
    --max_seq_length 384 \
    --doc_stride 128 \
    --save_steps 1000 \
    --logging_steps 1 \
    --overwrite_output_dir \
    --run_name $RUNID \
    --output_dir $OUTDIR
```

## Eval

```bash
export CUDA_VISIBLE_DEVICES=0

MODEL=vuiseng9/roberta-l-squadv1.1
OUTDIR=eval-$(basename $MODEL)
WORKDIR=transformers/examples/pytorch/question-answering
cd $WORKDIR

# Create the output directory up front so tee can open the log file
# before run_qa.py creates it.
mkdir -p $OUTDIR

nohup python run_qa.py \
    --model_name_or_path $MODEL \
    --dataset_name squad \
    --do_eval \
    --per_device_eval_batch_size 16 \
    --max_seq_length 384 \
    --doc_stride 128 \
    --overwrite_output_dir \
    --output_dir $OUTDIR 2>&1 | tee $OUTDIR/run.log &
```
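## Inference

To sanity-check the fine-tuned checkpoint outside of `run_qa.py`, a minimal sketch using the `transformers` question-answering pipeline is shown below. The model id `vuiseng9/roberta-l-squadv1.1` is taken from the Eval section above; the question/context pair is an illustrative stand-in, not an example from the SQuAD dataset.

```python
from transformers import pipeline

# Model id taken from the Eval section above; point this at a local
# output directory instead if you retrained the model yourself.
qa = pipeline("question-answering", model="vuiseng9/roberta-l-squadv1.1")

# Illustrative inputs (not drawn from the SQuAD dataset).
result = qa(
    question="What dataset was the model fine-tuned on?",
    context="This checkpoint is roberta-large fine-tuned for two epochs "
            "on SQuAD v1.1 for extractive question answering.",
)

# The pipeline returns the best answer span with its confidence score and
# character offsets, e.g. {'score': ..., 'start': ..., 'end': ..., 'answer': ...}.
print(result)
```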