Update README.md
README.md CHANGED
@@ -11,8 +11,7 @@ license: mit
 
 Please check the [official repository](https://github.com/microsoft/DeBERTa) for more details and updates.
 
-This is the DeBERTa V2
-
+This is the DeBERTa V2 xlarge model with 24 layers and a hidden size of 1536. It has 900M parameters and was trained on 160GB of raw data.
 
 ### Fine-tuning on NLU tasks
 
@@ -36,8 +35,8 @@ We present the dev results on SQuAD 1.1/2.0 and several GLUE benchmark tasks.
 ```bash
 cd transformers/examples/text-classification/
 export TASK_NAME=mrpc
-python -m torch.distributed.launch --nproc_per_node=8 run_glue.py --model_name_or_path microsoft/deberta-v2-xxlarge
---task_name $TASK_NAME --do_train --do_eval --max_seq_length 128 --per_device_train_batch_size 4
+python -m torch.distributed.launch --nproc_per_node=8 run_glue.py --model_name_or_path microsoft/deberta-v2-xxlarge \
+--task_name $TASK_NAME --do_train --do_eval --max_seq_length 128 --per_device_train_batch_size 4 \
 --learning_rate 3e-6 --num_train_epochs 3 --output_dir /tmp/$TASK_NAME/ --overwrite_output_dir --sharded_ddp --fp16
 ```
 
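The substance of the second hunk is the trailing backslashes: without them the shell treats each line as a separate command, so `run_glue.py` never receives the `--task_name`, learning-rate, or output flags, and the orphaned flag lines then fail as unknown commands. As an aside, newer PyTorch releases deprecate `python -m torch.distributed.launch` in favor of the `torchrun` entry point; the `run_glue.py` arguments shown here should carry over unchanged.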
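For the description added in the first hunk, here is a minimal sketch of loading the model and checking the quoted architecture numbers. It assumes this card belongs to the `microsoft/deberta-v2-xlarge` checkpoint (the description says xlarge, even though the example command references the xxlarge model) and uses the standard `transformers` Auto classes, which are not part of the diff itself.

```python
# Hypothetical verification sketch: the model id below is an assumption
# based on the added description ("DeBERTa V2 xlarge"), not on the diff.
from transformers import AutoConfig, AutoModel, AutoTokenizer

model_id = "microsoft/deberta-v2-xlarge"

# The config alone is enough to check the figures in the new description:
# 24 transformer layers and a hidden size of 1536.
config = AutoConfig.from_pretrained(model_id)
print(config.num_hidden_layers, config.hidden_size)  # 24 1536

# Load tokenizer and model the usual way (the DeBERTa V2 tokenizer
# additionally requires the sentencepiece package).
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

inputs = tokenizer("DeBERTa improves BERT with disentangled attention.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # torch.Size([1, seq_len, 1536])
```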