--- license: mit inference: parameters: max_length: 1024 widget: - text: "a b : ℕ\n⊢ a + b = b + a" example_title: "Example" --- [LeanDojo: Theorem Proving with Retrieval-Augmented Language Models](https://arxiv.org/abs/xxxx.xxxxx) Under review, NeurIPS (Datasets and Benchmarks Track), 2023 [Kaiyu Yang](https://yangky11.github.io/), [Aidan Swope](https://aidanswope.com/about), [Alex Gu](https://minimario.github.io/), [Rahul Chalamala](https://www.linkedin.com/in/rchalamala), [Peiyang Song](https://www.linkedin.com/in/peiyang-song-3279b3251/), [Shixing Yu](https://billysx.github.io/), [Saad Godil](https://www.linkedin.com/in/saad-godil-9728353/), [Ryan Prenger](https://www.linkedin.com/in/ryan-prenger-18797ba1/), [Anima Anandkumar](http://tensorlab.cms.caltech.edu/users/anima/) ```bibtex @inproceedings{yang2023leandojo, title={{LeanDojo}: Theorem Proving with Retrieval-Augmented Language Models}, author={Yang, Kaiyu and Swope, Aidan and Gu, Alex and Chalamala, Rahul and Song, Peiyang and Yu, Shixing and Godil, Saad and Prenger, Ryan and Anandkumar, Anima}, booktitle={Neural Information Processing Systems (NeurIPS)}, year={2023} } ``` Please visit [LeanDojo Website](https://leandojo.org/) for details. ## Input Format The model's input consists of retrieved premises concatenated with the current proof state and truncated to 2300 UTF-8 bytes. The proof state is formatted by Lean's pretty printer, the same as the input format of our [model w/o retrieval](https://huggingface.co/kaiyuy/leandojo-lean3-tacgen-byt5-small). Retrieved premises are in the form of Lean code, except that proofs are removed and fully qualified names (marked by `...`) are used. Please see the example on the right.