InCoder 6B
A 6B-parameter decoder-only Transformer model trained on code with a causal-masked objective, which enables code insertion/infilling as well as standard left-to-right generation.
The model was trained on public open-source repositories with a permissive, non-copyleft license (Apache 2.0, MIT, BSD-2, or BSD-3) from GitHub and GitLab, as well as on StackOverflow. The repositories are primarily Python and JavaScript, but the training data also includes code from 28 languages.
For more information, see: Project site, Paper, Examples, Demo.
A smaller 1B-parameter model is also available at facebook/incoder-1B.
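As an illustration of the causal-masked infilling described above, the sketch below shows how an infilling prompt is typically assembled: the missing span is replaced by a sentinel token and the model generates the infill at the end of the sequence. The sentinel strings used here follow the project's example code, but treat them as assumptions and verify them against the Usage link below.

```python
# Illustrative only: assembling an infilling prompt in the causal-masked format.
# The sentinel strings below should be checked against
# https://github.com/dpfried/incoder before use.
prefix = "def count_words(filename):\n    counts = Counter()\n"
suffix = "\n    return counts"

# The missing span is replaced by a sentinel token; the model then generates
# the infill after a repeated sentinel at the end of the prompt.
infill_prompt = prefix + "<|mask:0|>" + suffix + "<|mask:0|>"
# Running model.generate(...) on infill_prompt is expected to produce the
# missing code, terminating with "<|endofmask|>".
```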
Requirements
pytorch, tokenizers, and transformers are required. The model needs HF's tokenizers >= 0.12.1, due to changes in the pretokenizer. That version is close to release; in the meantime, tokenizers and transformers can be installed directly from source via pip.
pip install torch
pip install git+https://github.com/huggingface/tokenizers
pip install git+https://github.com/huggingface/transformers
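A quick way to confirm that the source installs picked up a new enough tokenizers release is to print the installed versions (a minimal sketch):

```python
import tokenizers
import transformers

# The model's pretokenizer requires tokenizers >= 0.12.1.
print("tokenizers:", tokenizers.__version__)
print("transformers:", transformers.__version__)
```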
Usage
See https://github.com/dpfried/incoder for example code.
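The repository above contains the reference examples. As a minimal sketch of standard left-to-right generation with the Hugging Face API (the device placement, dtype, and sampling settings here are illustrative assumptions, not the project's recommended configuration):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the 6B checkpoint; half precision and a CUDA device are assumed
# here so that the model fits in GPU memory.
tokenizer = AutoTokenizer.from_pretrained("facebook/incoder-6B")
model = AutoModelForCausalLM.from_pretrained(
    "facebook/incoder-6B", torch_dtype=torch.float16
).to("cuda")

# Standard left-to-right completion of a Python function signature.
prompt = "def count_words(filename):\n    "
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
with torch.no_grad():
    output = model.generate(
        **inputs, max_new_tokens=64, do_sample=True, temperature=0.2
    )
print(tokenizer.decode(output[0]))
```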
Credits
The model was developed by Daniel Fried, Armen Aghajanyan, Jessy Lin, Sida Wang, Eric Wallace, Freda Shi, Ruiqi Zhong, Wen-tau Yih, Luke Zettlemoyer and Mike Lewis.
Thanks to Lucile Saulnier, Leandro von Werra, Nicolas Patry, Suraj Patil, Omar Sanseviero, and others at HuggingFace for help with the model release, and to Naman Goyal and Stephen Roller for the code our demo was based on!