JustinLin610's picture
first commit
ee21b96
|
raw
history blame
No virus
2.76 kB
# Finetuning RoBERTa on RACE tasks
### 1) Download the data from RACE website (http://www.cs.cmu.edu/~glai1/data/race/)
### 2) Preprocess RACE data:
```bash
python ./examples/roberta/preprocess_RACE.py --input-dir <input-dir> --output-dir <extracted-data-dir>
./examples/roberta/preprocess_RACE.sh <extracted-data-dir> <output-dir>
```
### 3) Fine-tuning on RACE:
```bash
MAX_EPOCH=5 # Number of training epochs.
LR=1e-05 # Peak LR for fixed LR scheduler.
NUM_CLASSES=4
MAX_SENTENCES=1 # Batch size per GPU.
UPDATE_FREQ=8 # Accumulate gradients to simulate training on 8 GPUs.
DATA_DIR=/path/to/race-output-dir
ROBERTA_PATH=/path/to/roberta/model.pt
CUDA_VISIBLE_DEVICES=0,1 fairseq-train $DATA_DIR --ddp-backend=legacy_ddp \
--restore-file $ROBERTA_PATH \
--reset-optimizer --reset-dataloader --reset-meters \
--best-checkpoint-metric accuracy --maximize-best-checkpoint-metric \
--task sentence_ranking \
--num-classes $NUM_CLASSES \
--init-token 0 --separator-token 2 \
--max-option-length 128 \
--max-positions 512 \
--shorten-method "truncate" \
--arch roberta_large \
--dropout 0.1 --attention-dropout 0.1 --weight-decay 0.01 \
--criterion sentence_ranking \
--optimizer adam --adam-betas '(0.9, 0.98)' --adam-eps 1e-06 \
--clip-norm 0.0 \
--lr-scheduler fixed --lr $LR \
--fp16 --fp16-init-scale 4 --threshold-loss-scale 1 --fp16-scale-window 128 \
--batch-size $MAX_SENTENCES \
--required-batch-size-multiple 1 \
--update-freq $UPDATE_FREQ \
--max-epoch $MAX_EPOCH
```
**Note:**
a) As contexts in RACE are relatively long, we are using smaller batch size per GPU while increasing update-freq to achieve larger effective batch size.
b) Above cmd-args and hyperparams are tested on one Nvidia `V100` GPU with `32gb` of memory for each task. Depending on the GPU memory resources available to you, you can use increase `--update-freq` and reduce `--batch-size`.
c) The setting in above command is based on our hyperparam search within a fixed search space (for careful comparison across models). You might be able to find better metrics with wider hyperparam search.
### 4) Evaluation:
```
DATA_DIR=/path/to/race-output-dir # data directory used during training
MODEL_PATH=/path/to/checkpoint_best.pt # path to the finetuned model checkpoint
PREDS_OUT=preds.tsv # output file path to save prediction
TEST_SPLIT=test # can be test (Middle) or test1 (High)
fairseq-validate \
$DATA_DIR \
--valid-subset $TEST_SPLIT \
--path $MODEL_PATH \
--batch-size 1 \
--task sentence_ranking \
--criterion sentence_ranking \
--save-predictions $PREDS_OUT
```