Replicating RoBERTa-base GLUE results

#1 by marklee - opened

Original issue here: https://github.com/huggingface/transformers/issues/17885

Hello! I originally posted this on the forums, but there doesn't seem to be much foot traffic there, so I'm hoping to get more visibility here.

I'm trying to replicate the RoBERTa-base GLUE results reported in the model card. The numbers in the model card look like they were copied directly from the paper. Has anyone actually tried to match them with run_glue.py? If so, what trainer configuration did you use?

If I follow the original configs from fairseq, I am unable to match the reported numbers for RTE, CoLA, STS-B, and MRPC.
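For reference, here is roughly what I mean by "the original configs from fairseq", ported to the Trainer API for RTE. The values (learning rate 2e-5, batch size 16, 10 epochs, ~6% linear warmup, weight decay 0.1) are my reading of the fairseq GLUE examples, so treat this as an approximation rather than the exact recipe behind the reported numbers:

```python
import numpy as np
import evaluate
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model_name = "roberta-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

raw = load_dataset("glue", "rte")

def preprocess(batch):
    # RTE is a sentence-pair task; dynamic padding is handled by the Trainer's
    # default collator when a tokenizer is passed in.
    return tokenizer(batch["sentence1"], batch["sentence2"], truncation=True)

encoded = raw.map(preprocess, batched=True)

metric = evaluate.load("glue", "rte")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    return metric.compute(predictions=np.argmax(logits, axis=-1), references=labels)

args = TrainingArguments(
    output_dir="roberta-base-rte",
    learning_rate=2e-5,              # fairseq RTE value (assumed)
    per_device_train_batch_size=16,  # fairseq RTE value (assumed)
    num_train_epochs=10,
    warmup_ratio=0.06,               # linear warmup over ~6% of updates
    weight_decay=0.1,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="accuracy",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=encoded["train"],
    eval_dataset=encoded["validation"],
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)
trainer.train()
```

The other three tasks mainly differ in the dataset/metric names, the learning rate, and (for STS-B) being a regression task with num_labels=1.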

Any pointers would be much appreciated, thanks!

Facebook AI community org

Maybe pinging @myleott?

Single card:

CUDA_VISIBLE_DEVICES=0

Hyperparameters:

--max_seq_length 128
--per_device_train_batch_size 64
--learning_rate 1e-4
--use_lora True
--r 8
--num_train_epochs 20
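As far as I can tell, the stock run_glue.py doesn't expose --use_lora or --r, so I'm assuming those flags come from a LoRA wrapper along the lines of the peft library. A minimal sketch of what I take that setup to mean (the alpha, dropout, and target-module choices below are my own placeholders, not taken from the command above):

```python
# Rough sketch of what --use_lora True / --r 8 presumably correspond to:
# wrapping the sequence-classification model in a LoRA adapter via peft
# before handing it to the Trainer.
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=2)

lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,         # keeps the classification head trainable
    r=8,                                # matches --r 8 above
    lora_alpha=16,                      # placeholder, not from the post
    lora_dropout=0.1,                   # placeholder, not from the post
    target_modules=["query", "value"],  # RoBERTa self-attention projections
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA matrices and the head train
```

Note that the model-card numbers come from full fine-tuning, so an r=8 LoRA run with this learning rate and epoch count won't necessarily land on the same results.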
