---
language: en
tags:
- deberta
- fill-mask
license: mit
pipeline_tag: text-generation
---
# DeBERTa (1.4B) fixed version
This is [**deberta-v2-xxlarge**](https://huggingface.co/microsoft/deberta-v2-xxlarge) updated to implement the `AutoModelForCausalLM` class, enabling it to generate text. This implementation is based on our paper [**"BERTs are Generative In-Context Learners"**](https://arxiv.org/abs/2406.04823).
This repository also fixes three bugs in [the original HF implementation of DeBERTa](https://huggingface.co/microsoft/deberta-v2-xxlarge):
1. We fixed the incorrect name of the output embedding weights in the checkpoint file;
2. We fixed the implementation of the enhanced mask decoder (EMD), based on [the original GitHub repository](https://github.com/microsoft/DeBERTa);
3. We clamp the positional embeddings so that they work with long sequence lengths.
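
The third fix can be pictured roughly as follows. The snippet below is only a minimal, hypothetical sketch (the function and argument names are illustrative, not the repository's actual code): the relative-position indices are clamped into the range covered by the learned relative-position embedding table, so long inputs no longer index out of bounds.

```python
import torch

def clamp_relative_positions(relative_pos: torch.Tensor, max_relative_positions: int) -> torch.Tensor:
    # Illustrative sketch only: restrict relative-position indices to the range
    # covered by the learned embedding table, so that sequences longer than the
    # pre-training length do not index out of bounds.
    return relative_pos.clamp(-max_relative_positions, max_relative_positions - 1)
```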
## Example code
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("ltg/deberta-xxlarge-fixed", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("ltg/deberta-xxlarge-fixed", trust_remote_code=True).cuda().eval()

prompt = """German: Hallo, wie geht es Ihnen heute?
English:"""

# the tokenizer has no newline token, so newlines are encoded as the literal string "\n "
prompt = prompt.replace('\n', '\\n ')
input_ids = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).input_ids.cuda()

prediction = model.generate(
    input_ids,
    num_beams=4,
    do_sample=False,
    use_cache=None,
    max_new_tokens=64,
    eos_token_id=tokenizer(".\\", add_special_tokens=False).input_ids[1:]  # stop on the "\" token that starts an encoded newline
)

# keep only the newly generated tokens and strip the trailing newline marker
prediction = prediction[0, input_ids.size(1):]
prediction = tokenizer.decode(prediction).rstrip('\\')

# Expected output: "Hello, how are you doing today?"
print(prediction)
```
## Citation
If you find DeBERTa useful for your work, please cite the following papers:
```bibtex
@misc{samuel2024berts,
    title={{BERTs} are Generative In-Context Learners},
    author={David Samuel},
    year={2024},
    eprint={2406.04823},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    url={https://arxiv.org/abs/2406.04823}
}
```
```bibtex
@inproceedings{he2021deberta,
    title={{DeBERTa}: Decoding-enhanced {BERT} with disentangled attention},
    author={Pengcheng He and Xiaodong Liu and Jianfeng Gao and Weizhu Chen},
    booktitle={International Conference on Learning Representations},
    year={2021},
    url={https://openreview.net/forum?id=XPZIaotutsD}
}
```