---
license: mit
datasets:
- the_pile_openwebtext2
language:
- en
pipeline_tag: text-generation
---
|
|
|
### Model Sources |
|
|
|
|
|
|
- **Repository:** https://github.com/NathanGodey/headless-lm |
|
- **Paper:** https://arxiv.org/abs/2309.08351 |
|
|
|
|
|
### Model Architecture and Objective |
|
|
|
This model follows the Pythia-70m architecture. It was pretrained on OpenWebText2 using the Contrastive Weight Tying (CWT) objective, then briefly fine-tuned for standard language generation on the same dataset.
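
As described in the paper, CWT removes the usual next-token prediction head: each output hidden state is instead trained contrastively to match the input embedding of the following token, with other targets in the batch serving as negatives. Below is a minimal sketch of such a loss; the cosine normalization, temperature, and in-batch negative sampling here are illustrative assumptions, not the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def cwt_loss(hidden_states, embedding_matrix, next_token_ids, temperature=0.1):
    """Contrastive Weight Tying sketch: align each hidden state with the
    input embedding of its next token, using in-batch negatives (InfoNCE).
    Temperature and cosine normalization are illustrative assumptions."""
    # hidden_states:    (N, d) final hidden states, one per position
    # embedding_matrix: (V, d) the model's (tied) input embedding table
    # next_token_ids:   (N,)   gold next token at each position
    targets = embedding_matrix[next_token_ids]          # (N, d) positives
    h = F.normalize(hidden_states, dim=-1)
    t = F.normalize(targets, dim=-1)
    logits = h @ t.T / temperature                      # (N, N) similarity matrix
    labels = torch.arange(h.size(0), device=h.device)   # positives on the diagonal
    return F.cross_entropy(logits, labels)
```

Since the checkpoint was fine-tuned for generation, it should work with the standard `transformers` causal-LM interface. A minimal usage sketch follows; the Hub model ID is a placeholder, so substitute this repository's actual name.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder Hub ID -- replace with this card's actual repository name.
model_id = "NathanGodey/headless-pythia-owt2-70m"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Language models are", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```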
|
|
|
## Citation |
|
|
|
**BibTeX:** |
|
|
|
```bibtex
@misc{godey2023headless,
  title={Headless Language Models: Learning without Predicting with Contrastive Weight Tying},
  author={Nathan Godey and Éric de la Clergerie and Benoît Sagot},
  year={2023},
  eprint={2309.08351},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```
|
|
|
|
|
## Contact |
|
|
|
nathan.godey@inria.fr |