Model Sources

Paper: https://arxiv.org/abs/2309.08351

Model Architecture and Objective

This model uses the Pythia-70m architecture. It was trained on OpenWebText-2 with the Contrastive Weight Tying objective and then briefly fine-tuned for language generation on the same dataset.
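
As a rough illustration of the training signal (not the authors' exact implementation), Contrastive Weight Tying replaces the usual softmax prediction head with a contrastive objective: each contextual output representation is pulled toward the input embedding of the token it predicts, with the other target embeddings in the batch acting as negatives. The sketch below assumes an InfoNCE-style formulation; the temperature value and the normalization are illustrative assumptions, not values taken from the paper.

import torch
import torch.nn.functional as F

def cwt_loss(hidden_states, target_embeddings, temperature=0.1):
    # hidden_states: (batch, seq, dim) contextual outputs at positions t
    # target_embeddings: (batch, seq, dim) input embeddings of the tokens at t+1
    h = F.normalize(hidden_states.reshape(-1, hidden_states.size(-1)), dim=-1)
    e = F.normalize(target_embeddings.reshape(-1, target_embeddings.size(-1)), dim=-1)
    logits = h @ e.T / temperature  # similarity of every output to every target
    labels = torch.arange(logits.size(0), device=logits.device)  # diagonal entries are the positives
    return F.cross_entropy(logits, labels)

How to Get Started with the Model

Since the checkpoint was fine-tuned for generation, it should load through the standard causal LM classes; the snippet below is a minimal sketch using the repository name from this card.

from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "nthngdy/headless-pythia-owt2-70m-ft"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("The history of language modeling", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))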

Citation

BibTeX:

@misc{godey2023headless,
      title={Headless Language Models: Learning without Predicting with Contrastive Weight Tying}, 
      author={Nathan Godey and Éric de la Clergerie and Benoît Sagot},
      year={2023},
      eprint={2309.08351},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Contact

nathan.godey@inria.fr
