[InCoder](https://huggingface.co/facebook/incoder-6B) is a decoder-only Transformer trained with a Causal Masking objective. The result is a left-to-right language model that can also fill in masked token segments. Its context length is 2048 tokens.

| Model | Architecture | # parameters |
| --- | --- | --- |
| InCoder-1B | Decoder-only | 1.3B |
| InCoder-6B | Decoder-only | 6.7B |


The [Causal Masking objective](https://arxiv.org/abs/2201.07520) is a hybrid of causal and masked language modeling: "it combines the benefit of per-token generation with optional bi-directionality specifically tailored to prompting".
During InCoder's training, spans of code were randomly masked and moved to the end of each file. When the model generates a masked span, it can therefore condition on the code both before and after it. Figure 1 from the InCoder [paper](https://arxiv.org/pdf/2204.05999.pdf) illustrates the training process.
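
The data transform can be illustrated with a minimal, simplified sketch in plain Python. It makes three assumptions. Spans are given as character offsets. Sentinels are written as the strings `<|mask:k|>` and `<|endofmask|>`. There is a fixed list of spans. The real pipeline works on tokens and samples the number and position of spans randomly, so `causal_mask_example` is a hypothetical illustration rather than InCoder's actual training code.

```python
# Hypothetical, simplified sketch of a causal-masking training example.
# Assumption: sentinel strings follow the "<|mask:k|>" / "<|endofmask|>" convention.
MASK = "<|mask:{}|>"
EOM = "<|endofmask|>"


def causal_mask_example(code: str, spans: list[tuple[int, int]]) -> str:
    """Replace each (start, end) character span with a numbered sentinel and
    append "sentinel + original span + end-of-mask" at the end of the document.

    Spans must be sorted by start offset and must not overlap.
    """
    kept = []    # document text with holes replaced by sentinels
    moved = []   # masked spans, relocated to the end of the document
    cursor = 0
    for k, (start, end) in enumerate(spans):
        if start < cursor or end < start:
            raise ValueError("spans must be sorted, non-overlapping, with start <= end")
        kept.append(code[cursor:start])
        kept.append(MASK.format(k))
        moved.append(MASK.format(k) + code[start:end] + EOM)
        cursor = end
    kept.append(code[cursor:])
    # A left-to-right model trained on this string sees the right-hand context
    # before it has to generate the masked span.
    return "".join(kept) + "".join(moved)


code = "def add(a, b):\n    return a + b\n"
start = code.index("a + b")
print(causal_mask_example(code, [(start, start + len("a + b"))]))
```

Because the masked span is moved to the end, ordinary next-token prediction on the transformed string trains the model to infill using context from both sides.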

So, in addition to program synthesis (via left-to-right generation), InCoder can also perform editing (via infilling). The model shows promising results on several zero-shot code infilling tasks, such as type prediction, variable renaming, and comment generation.
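
At inference time, infilling can reuse the training format. The hole is replaced by a sentinel, and the sentinel is appended again after the right-hand context. The model then generates the missing span until it emits an end-of-mask marker. The sketch below assumes the sentinel strings `<|mask:0|>` and `<|endofmask|>`. The helpers `build_infill_prompt` and `extract_infill` are hypothetical names, not part of any library. The example frames type prediction as filling in a parameter annotation.

```python
# Hypothetical helpers for single-hole infilling prompts.
# Assumption: sentinel strings "<|mask:0|>" and "<|endofmask|>".
MASK0 = "<|mask:0|>"
EOM = "<|endofmask|>"


def build_infill_prompt(prefix: str, suffix: str) -> str:
    # Left context, a sentinel marking the hole, the right context, and the
    # sentinel again so the model's next tokens are the missing span.
    return prefix + MASK0 + suffix + MASK0


def extract_infill(generated: str, prompt: str) -> str:
    # Drop the echoed prompt (if present) and keep text up to the end-of-mask marker.
    completion = generated[len(prompt):] if generated.startswith(prompt) else generated
    return completion.split(EOM, 1)[0]


# Type prediction as infilling: the hole is the annotation of `xs`.
prompt = build_infill_prompt("def count(xs: ", ") -> int:\n    return len(xs)\n")
print(prompt)
```

In practice you would tokenize `prompt`, call `model.generate`, decode the result, and pass the decoded text to `extract_infill`. Ideally, generation stops as soon as `<|endofmask|>` appears.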

The code generation demo at the end of this blog uses InCoder 1.3B.

You can load the model and tokenizer directly from [`transformers`](https://huggingface.co/docs/transformers/index):

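```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("facebook/incoder-6B")
# AutoModelForCausalLM replaces the deprecated AutoModelWithLMHead
model = AutoModelForCausalLM.from_pretrained("facebook/incoder-6B")

inputs = tokenizer("def hello_world():", return_tensors="pt")
# Generate a left-to-right continuation of the prompt
outputs = model.generate(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    max_new_tokens=64,
)
# Disabling tokenization-space cleanup keeps the code's whitespace intact
print(tokenizer.decode(outputs[0], skip_special_tokens=True, clean_up_tokenization_spaces=False))
```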

Or you can use a `pipeline`:

```python
from transformers import pipeline

pipe = pipeline("text-generation", model="facebook/incoder-6B")
outputs = pipe("def hello_world():")
```