julien-c (HF staff) committed on
Commit 14eea0d
1 Parent(s): 23a4f8f

Migrate model card from transformers-repo


Read announcement at https://discuss.huggingface.co/t/announcement-all-model-cards-will-be-migrated-to-hf-co-model-repos/2755
Original file history: https://github.com/huggingface/transformers/commits/master/model_cards/microsoft/deberta-base/README.md

Files changed (1)
  1. README.md +36 -0
README.md ADDED
@@ -0,0 +1,36 @@
---
thumbnail: https://huggingface.co/front/thumbnails/microsoft.png
license: mit
---

## DeBERTa: Decoding-enhanced BERT with Disentangled Attention

[DeBERTa](https://arxiv.org/abs/2006.03654) improves the BERT and RoBERTa models using disentangled attention and an enhanced mask decoder. With these two improvements, DeBERTa outperforms RoBERTa on a majority of NLU tasks with 80GB of training data.

Please check the [official repository](https://github.com/microsoft/DeBERTa) for more details and updates.
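
A minimal usage sketch with the 🤗 `transformers` AutoClasses (assuming a `transformers` version with DeBERTa support; the example sentence is only illustrative):

```python
# Minimal sketch: load the microsoft/deberta-base checkpoint and
# extract contextual token representations for one sentence.
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-base")
model = AutoModel.from_pretrained("microsoft/deberta-base")

inputs = tokenizer("DeBERTa improves BERT with disentangled attention.", return_tensors="pt")
outputs = model(**inputs)

# Last-layer hidden states, shape (batch_size, sequence_length, hidden_size).
print(outputs.last_hidden_state.shape)
```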

#### Fine-tuning on NLU tasks

We present the dev results on the SQuAD 1.1/2.0 and MNLI tasks.

| Model             | SQuAD 1.1 (F1/EM) | SQuAD 2.0 (F1/EM) | MNLI-m (Acc) |
|-------------------|-------------------|-------------------|--------------|
| RoBERTa-base      | 91.5/84.6         | 83.7/80.5         | 87.6         |
| XLNet-Large       | -/-               | -/80.2            | 86.8         |
| **DeBERTa-base**  | 93.1/87.2         | 86.2/83.1         | 88.8         |
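
For reference, a minimal MNLI fine-tuning sketch with the `transformers` Trainer and the `datasets` library; this is an illustrative setup, not the recipe behind the numbers above, and the hyperparameters are assumptions:

```python
# Illustrative MNLI fine-tuning sketch; hyperparameters are assumptions,
# not the configuration used to produce the reported dev results.
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/deberta-base",
    num_labels=3,  # MNLI: entailment / neutral / contradiction
)

dataset = load_dataset("glue", "mnli")

def preprocess(batch):
    # Encode premise/hypothesis pairs; padding is handled per batch by the Trainer's collator.
    return tokenizer(batch["premise"], batch["hypothesis"], truncation=True, max_length=256)

encoded = dataset.map(preprocess, batched=True)

args = TrainingArguments(
    output_dir="deberta-base-mnli",
    learning_rate=2e-5,              # assumed value
    per_device_train_batch_size=32,  # assumed value
    num_train_epochs=3,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=encoded["train"],
    eval_dataset=encoded["validation_matched"],
    tokenizer=tokenizer,
)
trainer.train()
```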

### Citation

If you find DeBERTa useful for your work, please cite the following paper:

```latex
@misc{he2020deberta,
    title={DeBERTa: Decoding-enhanced BERT with Disentangled Attention},
    author={Pengcheng He and Xiaodong Liu and Jianfeng Gao and Weizhu Chen},
    year={2020},
    eprint={2006.03654},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```