---
license: mit
datasets:
- the_pile_openwebtext2
language:
- en
pipeline_tag: text-generation
---
|
|
|
### Model Sources |
|
|
|
|
|
|
- **Repository:** https://github.com/NathanGodey/headless-lm |
|
- **Paper:** https://arxiv.org/abs/2309.08351 |
|
|
|
|
|
### Model Architecture and Objective |
|
|
|
This model follows the Pythia-70m architecture. It was pretrained on OpenWebText2 using the Contrastive Weight Tying (CWT) objective, then briefly fine-tuned for standard language generation on the same dataset.
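
As described in the paper, CWT removes the usual next-token prediction head: each output hidden state is instead trained contrastively to match the input embedding of the following token, with other targets in the batch serving as negatives. Below is a minimal sketch of such a loss; the cosine normalization, temperature, and in-batch negative sampling here are illustrative assumptions, not the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def cwt_loss(hidden_states, embedding_matrix, next_token_ids, temperature=0.1):
    """Contrastive Weight Tying sketch: align each hidden state with the
    input embedding of its next token, using in-batch negatives (InfoNCE).
    Temperature and cosine normalization are illustrative assumptions."""
    # hidden_states:    (N, d) final hidden states, one per position
    # embedding_matrix: (V, d) the model's (tied) input embedding table
    # next_token_ids:   (N,)   gold next token at each position
    targets = embedding_matrix[next_token_ids]          # (N, d) positives
    h = F.normalize(hidden_states, dim=-1)
    t = F.normalize(targets, dim=-1)
    logits = h @ t.T / temperature                      # (N, N) similarity matrix
    labels = torch.arange(h.size(0), device=h.device)   # positives on the diagonal
    return F.cross_entropy(logits, labels)
```

Since the checkpoint was fine-tuned for generation, it should work with the standard `transformers` causal-LM interface. A minimal usage sketch follows; the Hub model ID is a placeholder, so substitute this repository's actual name.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder Hub ID -- replace with this card's actual repository name.
model_id = "NathanGodey/headless-pythia-owt2-70m"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Language models are", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```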
|
|
|
## Citation |
|
|
|
**BibTeX:** |
|
|
|
```bibtex
@misc{godey2023headless,
  title={Headless Language Models: Learning without Predicting with Contrastive Weight Tying},
  author={Nathan Godey and Éric de la Clergerie and Benoît Sagot},
  year={2023},
  eprint={2309.08351},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```
|
|
|
|
|
## Contact |
|
|
|
nathan.godey@inria.fr |