
DeBERTa: Decoding-enhanced BERT with Disentangled Attention

DeBERTa improves the BERT and RoBERTa models using disentangled attention and an enhanced mask decoder, and with these two improvements it outperforms RoBERTa on a majority of natural language understanding tasks.
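The checkpoint can be loaded with the Hugging Face transformers library. The snippet below is an illustrative sketch (not part of the original card); it assumes transformers, sentencepiece, and PyTorch are installed and uses the microsoft/deberta-v2-xxlarge checkpoint referenced in the commands further down.

from transformers import AutoModel, AutoTokenizer

# Load the pretrained DeBERTa V2 xxlarge encoder and its tokenizer
tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v2-xxlarge")
model = AutoModel.from_pretrained("microsoft/deberta-v2-xxlarge")

# Encode a sentence and produce contextual token embeddings
inputs = tokenizer("DeBERTa uses disentangled attention.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, sequence_length, hidden_size)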

Notes.

Run with DeepSpeed:

pip install datasets
pip install deepspeed
# Download the DeepSpeed config file
wget https://huggingface.co/microsoft/deberta-v2-xxlarge/resolve/main/ds_config.json -O ds_config.json
export TASK_NAME=mnli
output_dir="ds_results"
num_gpus=8
batch_size=8
# Fine-tune on MNLI across ${num_gpus} GPUs using the DeepSpeed config above
python -m torch.distributed.launch --nproc_per_node=${num_gpus} \
  run_glue.py \
  --model_name_or_path microsoft/deberta-v2-xxlarge \
  --task_name $TASK_NAME \
  --do_train \
  --do_eval \
  --max_seq_length 256 \
  --per_device_train_batch_size ${batch_size} \
  --learning_rate 3e-6 \
  --num_train_epochs 3 \
  --output_dir $output_dir \
  --overwrite_output_dir \
  --logging_steps 10 \
  --logging_dir $output_dir \
  --deepspeed ds_config.json
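For reference, a rough programmatic equivalent of the launcher command above, using the transformers Trainer with the same hyperparameters. This is a sketch, not part of the original card; it assumes ds_config.json has been downloaded as shown and that the script is started through a distributed launcher (e.g. deepspeed train_mnli.py, where train_mnli.py is just a placeholder file name).

from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "microsoft/deberta-v2-xxlarge"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)  # MNLI has 3 labels

# GLUE MNLI: premise/hypothesis pairs
raw = load_dataset("glue", "mnli")

def tokenize(batch):
    return tokenizer(batch["premise"], batch["hypothesis"],
                     truncation=True, padding="max_length", max_length=256)

encoded = raw.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="ds_results",
    per_device_train_batch_size=8,
    learning_rate=3e-6,
    num_train_epochs=3,
    logging_steps=10,
    deepspeed="ds_config.json",  # same config file downloaded above
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=encoded["train"],
    eval_dataset=encoded["validation_matched"],
)
trainer.train()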

You can also run with --sharded_ddp:

cd transformers/examples/text-classification/
export TASK_NAME=mnli
python -m torch.distributed.launch --nproc_per_node=8 run_glue.py \
  --model_name_or_path microsoft/deberta-v2-xxlarge \
  --task_name $TASK_NAME \
  --do_train \
  --do_eval \
  --max_seq_length 256 \
  --per_device_train_batch_size 8 \
  --learning_rate 3e-6 \
  --num_train_epochs 3 \
  --output_dir /tmp/$TASK_NAME/ \
  --overwrite_output_dir \
  --sharded_ddp \
  --fp16
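Once either fine-tuning run finishes, the saved checkpoint can be loaded back for inference. The sketch below is illustrative and not from the original card: the checkpoint path matches the --output_dir of the command above (with TASK_NAME=mnli), and the label ordering follows the GLUE MNLI convention used by the datasets library (0 = entailment, 1 = neutral, 2 = contradiction), which the saved config may or may not record explicitly.

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

checkpoint = "/tmp/mnli/"  # assumed to match --output_dir above
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)
model.eval()

premise = "A soccer game with multiple males playing."
hypothesis = "Some men are playing a sport."
inputs = tokenizer(premise, hypothesis, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

labels = ["entailment", "neutral", "contradiction"]  # GLUE MNLI label order
print(labels[logits.argmax(dim=-1).item()])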

Citation

If you find DeBERTa useful for your work, please cite the following paper:

@inproceedings{he2021deberta,
  title={DeBERTa: Decoding-enhanced BERT with Disentangled Attention},
  author={Pengcheng He and Xiaodong Liu and Jianfeng Gao and Weizhu Chen},
  booktitle={International Conference on Learning Representations},
  year={2021},
  url={https://openreview.net/forum?id=XPZIaotutsD}
}