A 1B-parameter decoder-only Transformer model trained on code with a causal masking objective, which allows infilling (inserting code into the middle of an existing file) as well as standard left-to-right generation.
For more information, see our paper: InCoder: A Generative Model for Code Infilling and Synthesis.
A larger 6B-parameter model is also available at facebook/incoder-6B.
To use the model, you'll need pytorch, tokenizers, and transformers. Our model requires HF's tokenizers >= 0.12.1, due to changes in the pretokenizer.
```
pip install torch
pip install "tokenizers>=0.12.1"
pip install transformers
```
See https://github.com/dpfried/incoder for example code.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("facebook/incoder-1B")
tokenizer = AutoTokenizer.from_pretrained("facebook/incoder-1B")
```
(Note: the incoder-1B and incoder-6B tokenizers are identical, so 'facebook/incoder-6B' could also be used.)
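As a minimal sketch of standard left-to-right generation (the prompt and sampling settings here are illustrative assumptions, not values prescribed by this card):

```python
import torch

# An illustrative prompt; any partial program works.
prompt = "def fib(n):"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    output_ids = model.generate(
        input_ids,
        do_sample=True,
        top_p=0.95,
        temperature=0.2,
        max_new_tokens=48,
    )

# Decode with clean_up_tokenization_spaces=False to preserve spacing
# in the generated code (see the decoding note below).
print(tokenizer.decode(output_ids[0], clean_up_tokenization_spaces=False))
```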
When calling tokenizer.decode, it's important to pass clean_up_tokenization_spaces=False to avoid removing spaces before punctuation. For example:

```python
tokenizer.decode(tokenizer.encode("from ."), clean_up_tokenization_spaces=False)
```
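Without this flag, the default clean-up would strip the space before the period, turning "from ." into "from." and silently changing the code.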
(Note: encoding prepends the <|endoftext|> token, as this marks the start of a document to our model. This token can be removed from the decoded output by passing skip_special_tokens=True to tokenizer.decode.)
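The repository linked above is the canonical reference for infilling; the following is only a minimal sketch of the causal-masking format it uses. The sentinel strings (<|mask:0|>, <|mask:1|>, <|endofmask|>) are taken from that repo, while the prefix/suffix and generation settings are illustrative assumptions:

```python
# Infill a function body between a signature (prefix) and a return
# statement (suffix). The masked region is marked with <|mask:0|>; the
# model generates the missing span after a trailing <|mask:0|> sentinel
# and terminates it with <|endofmask|>.
prefix = "def count_lines(filename):\n"
suffix = "\n    return count"
prompt = prefix + "<|mask:0|>" + suffix + "<|mask:1|>" + "<|mask:0|>"

input_ids = tokenizer(prompt, return_tensors="pt").input_ids
output_ids = model.generate(
    input_ids,
    do_sample=True,
    top_p=0.95,
    max_new_tokens=48,
)

# Decode only the newly generated tokens, keeping special tokens so the
# <|endofmask|> terminator is visible, then cut the infill at it.
completion = tokenizer.decode(
    output_ids[0, input_ids.shape[1]:],
    clean_up_tokenization_spaces=False,
)
infill = completion.split("<|endofmask|>")[0]
print(prefix + infill + suffix)
```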
The model was developed by Daniel Fried, Armen Aghajanyan, Jessy Lin, Sida Wang, Eric Wallace, Freda Shi, Ruiqi Zhong, Wen-tau Yih, Luke Zettlemoyer and Mike Lewis.
Thanks to Lucile Saulnier, Leandro von Werra, Nicolas Patry, Suraj Patil, Omar Sanseviero, and others at HuggingFace for help with the model release, and to Naman Goyal and Stephen Roller for the code our demo was based on!