---
language: en
tags:
- deberta
- fill-mask
license: mit
pipeline_tag: text-generation
---

# DeBERTa (1.4B) fixed version

This is [**deberta-v2-xxlarge**](https://huggingface.co/microsoft/deberta-v2-xxlarge) updated to implement the `AutoModelForCausalLM` class, enabling it to generate text. This implementation is based on our paper [**"BERTs are Generative In-Context Learners"**](https://arxiv.org/abs/2406.04823).

This repository also fixes three bugs in [the original HF implementation of DeBERTa](https://huggingface.co/microsoft/deberta-v2-xxlarge):
1. We fixed the incorrect name of the output embedding weights in the checkpoint file;
2. We fixed the implementation of the enhanced mask decoder (EMD), based on [the original GitHub repository](https://github.com/microsoft/DeBERTa);
3. We clamp the positional embeddings so that they work with long sequence lengths (a minimal sketch of this fix is shown below).
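
To illustrate the third fix: relative distances beyond the range seen during training are clipped to the nearest in-range value, so long inputs reuse the learned embeddings instead of indexing out of bounds. This is a minimal sketch of the idea, not the exact code from this repository:

```python
import torch

def clamp_relative_positions(relative_pos: torch.Tensor, max_relative_position: int) -> torch.Tensor:
    # Hypothetical helper: map out-of-range relative distances onto the
    # nearest bucket the model was actually trained on.
    return torch.clamp(relative_pos, -max_relative_position, max_relative_position)
```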

## Example code

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("ltg/deberta-xxlarge-fixed", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("ltg/deberta-xxlarge-fixed", trust_remote_code=True).cuda().eval()

# The tokenizer has no newline token, so newlines are spelled out as a
# literal "\n " sequence before tokenization.
prompt = """German: Hallo, wie geht es Ihnen heute?
English:"""
prompt = prompt.replace('\n', '\\n ')
input_ids = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).input_ids.cuda()

prediction = model.generate(
    input_ids,
    num_beams=4,
    do_sample=False,
    use_cache=None,
    max_new_tokens=64,
    # Stop as soon as the model emits the escape character "\", i.e. the
    # start of an encoded newline; slicing with [1:] drops the leading "." token.
    eos_token_id=tokenizer(".\\", add_special_tokens=False).input_ids[1:]
)
# Keep only the newly generated tokens and drop the trailing escape character.
prediction = prediction[0, input_ids.size(1):]
prediction = tokenizer.decode(prediction).rstrip('\\')

# Expected output: "Hello, how are you doing today?"
print(prediction)
```
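
The same newline convention applies to longer few-shot prompts, which is the in-context-learning setting studied in the paper. Below is a hypothetical helper (the name `encode_prompt` is ours, not part of this repository) that wraps the replacement used above, reusing the `tokenizer` from the previous example:

```python
def encode_prompt(prompt: str) -> str:
    # Spell newlines as a literal "\n " sequence, matching the example above.
    return prompt.replace('\n', '\\n ')

few_shot = encode_prompt("""German: Guten Morgen!
English: Good morning!
German: Danke schön.
English:""")
input_ids = tokenizer(few_shot, return_tensors="pt", add_special_tokens=False).input_ids.cuda()
```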


## Citation

If you find DeBERTa useful for your work, please cite the following papers:

```bibtex
@misc{samuel2024berts,
  title={{BERTs} are Generative In-Context Learners}, 
  author={David Samuel},
  year={2024},
  eprint={2406.04823},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2406.04823}
}
```

```bibtex
@inproceedings{he2021deberta,
  title={{DeBERTa}: Decoding-enhanced {BERT} with disentangled attention},
  author={Pengcheng He and Xiaodong Liu and Jianfeng Gao and Weizhu Chen},
  booktitle={International Conference on Learning Representations},
  year={2021},
  url={https://openreview.net/forum?id=XPZIaotutsD}
}
```