Model Sources

Model Architecture and Objective

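This model uses the Pythia-70m architecture and was trained on OpenWebText-2 with the Contrastive Weight Tying (CWT) objective. Instead of predicting a probability distribution over the token vocabulary, a CWT-trained ("headless") model learns to reconstruct the input embedding of the target token in a contrastive fashion.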
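As a rough illustration of this kind of objective, here is a minimal NumPy sketch of a contrastive weight tying loss. It assumes that each output representation is scored by dot product against the tied input embeddings of the target tokens present in the batch, and that the loss is a softmax cross-entropy over those in-batch candidates. The function name `cwt_loss`, the `temperature` parameter and the use of the batch's distinct target tokens as the candidate set are illustrative assumptions. This is not the authors' training code.

```python
import numpy as np


def cwt_loss(hidden, target_ids, embedding, temperature=1.0):
    """Contrastive weight tying loss (illustrative sketch, not the reference code).

    hidden:      (N, d) output representations, one per predicted position
    target_ids:  (N,) integer ids of the tokens to be predicted
    embedding:   (V, d) input embedding matrix, reused (tied) on the output side
    temperature: hypothetical scaling factor for the similarity scores
    """
    target_ids = np.asarray(target_ids).ravel()
    # Assumed candidate set: the distinct target tokens that appear in the batch.
    # target_index[i] is the position of target_ids[i] inside unique_ids.
    unique_ids, target_index = np.unique(target_ids, return_inverse=True)
    candidates = embedding[unique_ids]  # (U, d)

    # Dot-product similarity between every output and every candidate embedding.
    logits = hidden @ candidates.T / temperature  # (N, U)

    # Numerically stable log-softmax over the candidates.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Cross-entropy: each output should score its own target embedding highest.
    return -log_probs[np.arange(len(target_ids)), target_index].mean()
```

With all-zero outputs every candidate scores equally, so the loss equals the logarithm of the number of distinct targets in the batch; outputs aligned with the correct target embeddings drive it toward zero.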

Software

[More Information Needed]

Citation

BibTeX:

@misc{godey2023headless,
      title={Headless Language Models: Learning without Predicting with Contrastive Weight Tying}, 
      author={Nathan Godey and Éric de la Clergerie and Benoît Sagot},
      year={2023},
      eprint={2309.08351},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Model Card Authors

Nathan Godey, Éric de la Clergerie, Benoît Sagot

Model Card Contact

nathan.godey@inria.fr

Dataset used to train nthngdy/headless-pythia-owt2-70m-raw: OpenWebText-2