Update README.md
README.md CHANGED
@@ -11,8 +11,7 @@ license: mit
 
 Please check the [official repository](https://github.com/microsoft/DeBERTa) for more details and updates.
 
-This is the DeBERTa V2
-
+This is the DeBERTa V2 xlarge model with 24 layers and a hidden size of 1536. It has 900M parameters and was trained on 160GB of raw data.
 
 ### Fine-tuning on NLU tasks
 
@@ -36,8 +35,8 @@ We present the dev results on SQuAD 1.1/2.0 and several GLUE benchmark tasks.
 ```bash
 cd transformers/examples/text-classification/
 export TASK_NAME=mrpc
-python -m torch.distributed.launch --nproc_per_node=8 run_glue.py --model_name_or_path microsoft/deberta-v2-xxlarge
---task_name $TASK_NAME --do_train --do_eval --max_seq_length 128 --per_device_train_batch_size 4
+python -m torch.distributed.launch --nproc_per_node=8 run_glue.py --model_name_or_path microsoft/deberta-v2-xxlarge \
+--task_name $TASK_NAME --do_train --do_eval --max_seq_length 128 --per_device_train_batch_size 4 \
 --learning_rate 3e-6 --num_train_epochs 3 --output_dir /tmp/$TASK_NAME/ --overwrite_output_dir --sharded_ddp --fp16
 ```
 
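The substance of the second hunk is the trailing backslashes: without them the shell treats each line as a separate command, so `run_glue.py` never receives the `--task_name`, learning-rate, or output flags, and the orphaned flag lines then fail as unknown commands. As an aside, newer PyTorch releases deprecate `python -m torch.distributed.launch` in favor of the `torchrun` entry point; the `run_glue.py` arguments shown here should carry over unchanged.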
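For the description added in the first hunk, here is a minimal sketch of loading the model and checking the quoted architecture numbers. It assumes this card belongs to the `microsoft/deberta-v2-xlarge` checkpoint (the description says xlarge, even though the example command references the xxlarge model) and uses the standard `transformers` Auto classes, which are not part of the diff itself.

```python
# Hypothetical verification sketch: the model id below is an assumption
# based on the added description ("DeBERTa V2 xlarge"), not on the diff.
from transformers import AutoConfig, AutoModel, AutoTokenizer

model_id = "microsoft/deberta-v2-xlarge"

# The config alone is enough to check the figures in the new description:
# 24 transformer layers and a hidden size of 1536.
config = AutoConfig.from_pretrained(model_id)
print(config.num_hidden_layers, config.hidden_size)  # 24 1536

# Load tokenizer and model the usual way (the DeBERTa V2 tokenizer
# additionally requires the sentencepiece package).
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

inputs = tokenizer("DeBERTa improves BERT with disentangled attention.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # torch.Size([1, seq_len, 1536])
```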