---
pipeline_tag: text-generation
inference: true
widget:
- text: 'def print_hello_world():'
  example_title: Hello world
  group: Python
license: bigscience-openrail-m
pretrain-datasets:
- books
- arxiv
- c4
- falcon-refinedweb
- wiki
- github-issues
- stack_markdown
- self-made dataset of permissive github code
datasets:
- bigcode/the-stack-dedup
- rombodawg/2XUNCENSORED_MegaCodeTraining188k
- bigcode/commitpackft
metrics:
- code_eval
library_name: transformers
tags:
- code
model-index:
- name: Refact-1.6B
  results:
  - task:
      type: text-generation
    dataset:
      type: openai_humaneval
      name: HumanEval
    metrics:
    - name: pass@1 (T=0.01)
      type: pass@1
      value: 32.0
      verified: false
    - name: pass@1 (T=0.2)
      type: pass@1
      value: 31.5
      verified: false
    - name: pass@10 (T=0.8)
      type: pass@10
      value: 53.0
      verified: false
    - name: pass@100 (T=0.8)
      type: pass@100
      value: 76.9
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalSynthesize Python
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: 35.8
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalSynthesize JavaScript
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: 31.6
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalSynthesize Java
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: 29.1
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalSynthesize Go
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: -1
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalSynthesize C++
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: 26.3
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalSynthesize Rust
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: -1
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalSynthesize Average
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: -1
      verified: false


  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalFixTests Python
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: 18.38
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalFixTests JavaScript
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: 12.28
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalFixTests Java
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: 15.12
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalFixTests Go
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: -1
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalFixTests C++
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: 13.17
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalFixTests Rust
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: 2.8
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalFixTests Average
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: -1
      verified: false


  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalFixDocs Python
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: 26.92
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalFixDocs JavaScript
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: 26.85
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalFixDocs Java
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: 30.76
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalFixDocs Go
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: -1
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalFixDocs C++
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: 25.94
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalFixDocs Rust
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: 8.44
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalFixDocs Average
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: -1
      verified: false


  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalExplain Python
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: 26.46
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalExplain JavaScript
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: 17.86
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalExplain Java
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: 20.94
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalExplain Go
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: -1
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalExplain C++
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: 18.78
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalExplain Rust
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: -1
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalExplain Average
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: -1
      verified: false


  - task:
      type: text-generation
    dataset:
      type: mbpp
      name: MBPP
    metrics:
    - name: pass@1 (T=0.01)
      type: pass@1
      value: 31.15
      verified: false
  - task:
      type: text-generation
    dataset:
      type: ds1000
      name: DS-1000 (Overall Completion)
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: 10.1
      verified: false
  - task:
      type: text-generation
    dataset:
      type: nuprl/MultiPL-E
      name: MultiPL-HumanEval (C++)
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: 21.61
      verified: false
  - task:
      type: text-generation
    dataset:
      type: nuprl/MultiPL-E
      name: MultiPL-HumanEval (C#)
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: 13.91
      verified: false
  - task:
      type: text-generation
    dataset:
      type: nuprl/MultiPL-E
      name: MultiPL-HumanEval (D)
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: 9.5
      verified: false
  - task:
      type: text-generation
    dataset:
      type: nuprl/MultiPL-E
      name: MultiPL-HumanEval (Go)
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: 53.57
      verified: false
  - task:
      type: text-generation
    dataset:
      type: nuprl/MultiPL-E
      name: MultiPL-HumanEval (Java)
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: 21.58
      verified: false
  - task:
      type: text-generation
    dataset:
      type: nuprl/MultiPL-E
      name: MultiPL-HumanEval (Julia)
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: 13.75
      verified: false
  - task:
      type: text-generation
    dataset:
      type: nuprl/MultiPL-E
      name: MultiPL-HumanEval (JavaScript)
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: 26.88
      verified: false
  - task:
      type: text-generation
    dataset:
      type: nuprl/MultiPL-E
      name: MultiPL-HumanEval (Lua)
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: 15.26
      verified: false
  - task:
      type: text-generation
    dataset:
      type: nuprl/MultiPL-E
      name: MultiPL-HumanEval (PHP)
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: 23.04
      verified: false
  - task:
      type: text-generation
    dataset:
      type: nuprl/MultiPL-E
      name: MultiPL-HumanEval (Perl)
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: 12.1
      verified: false
  - task:
      type: text-generation
    dataset:
      type: nuprl/MultiPL-E
      name: MultiPL-HumanEval (Python)
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: 29.6
      verified: false
  - task:
      type: text-generation
    dataset:
      type: nuprl/MultiPL-E
      name: MultiPL-HumanEval (R)
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: 13.77
      verified: false
  - task:
      type: text-generation
    dataset:
      type: nuprl/MultiPL-E
      name: MultiPL-HumanEval (Ruby)
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: 12.68
      verified: false
  - task:
      type: text-generation
    dataset:
      type: nuprl/MultiPL-E
      name: MultiPL-HumanEval (Racket)
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: 4.29
      verified: false
  - task:
      type: text-generation
    dataset:
      type: nuprl/MultiPL-E
      name: MultiPL-HumanEval (Rust)
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: 19.54
      verified: false
  - task:
      type: text-generation
    dataset:
      type: nuprl/MultiPL-E
      name: MultiPL-HumanEval (Scala)
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: 18.33
      verified: false
  - task:
      type: text-generation
    dataset:
      type: nuprl/MultiPL-E
      name: MultiPL-HumanEval (Bash)
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: 5.7
      verified: false
  - task:
      type: text-generation
    dataset:
      type: nuprl/MultiPL-E
      name: MultiPL-HumanEval (Swift)
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: 17.68
      verified: false
  - task:
      type: text-generation
    dataset:
      type: nuprl/MultiPL-E
      name: MultiPL-HumanEval (TypeScript)
    metrics:
    - name: pass@1 (T=0.2)
      type: pass@1
      value: 25
      verified: false

language:
- en
---

![image/png](https://cdn-uploads.huggingface.co/production/uploads/643a9dd0c5f633a7fa7e804a/HkB0QYV0BbmB3ktMugbZy.png)


# Refact-1.6B

Finally, the model we started training with our [blog post](https://refact.ai/blog/2023/applying-recent-innovations-to-train-model/) is ready 🎉

After fine-tuning on generated data, it beats Replit 3b, Stability Code 3b and many other models. It almost beats
StarCoder ten times the size!


Model                 | Size          | HumanEval pass@1   | HumanEval pass@10  |
----------------------|---------------|--------------------|--------------------|
DeciCoder-1b          |   1b          |  19.1%             |                    |
<b>Refact-1.6-fim</b> | <b>1.6b</b>   |  <b>32.0%</b>      | <b>53.0%</b>       |
StableCode            |   3b          |  20.2%             | 33.8%              |
ReplitCode v1         |   3b          |  21.9%             |                    |
CodeGen2.5-multi      |   7b          |  28.4%             | 47.5%              |
CodeLlama             |   7b          |  33.5%             | 59.6%              |
StarCoder             |  15b          |  33.6%             |                    |

Likely, it's the best model for practical use in your IDE for code completion because it's smart and fast!
You can start using it right now by downloading the
[Refact plugin](https://refact.ai/). You can host the model yourself, too, using the
[open source docker container](https://github.com/smallcloudai/refact).

And it's multi-language (see MultiPL-HumanEval and other metrics below) and it works as a chat (see the section below).

# It Works As a Chat

The primary application of this model is code completion (infill) in multiple programming languages.
But it works as a chat quite well.

HumanEval results using instruction following (chat) format, against models specialized for chat only:

Model                  | Size   | pass@1   | pass@10  |
-----------------------|--------|----------|----------|
<b>Refact-1.6-fim</b>  | 1.6b   |  38.4%   | 55.6%    |
StableCode-instruct    |   3b   |  26.9%   | 36.2%    |
OctoGeeX               |   6b   |  44.7%   |          |
CodeLlama-instruct     |   7b   |  34.8%   | 64.3%    |
CodeGen2.5-instruct    |   7b   |  36.2%   | 60.87    |
CodeLlama-instruct     |  13b   |  42.7%   | 71.6%    |
StarChat-β             |  15b   |  33.5%   |          |
OctoCoder              |  15b   |  46.2%   |          |


# Example

Fill-in-the-middle uses special tokens to identify the prefix/middle/suffix part of the input and output:

```python
# pip install -q transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "smallcloudai/Refact-1_6B-fim"
device = "cuda" # for GPU usage or "cpu" for CPU usage

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, trust_remote_code=True).to(device)

prompt = '<fim_prefix>def print_hello_world():\n    """<fim_suffix>\n    print("Hello world!")<fim_middle>'

inputs = tokenizer.encode(prompt, return_tensors="pt").to(device)
outputs = model.generate(inputs, max_length=100, temperature=0.2)
print("-"*80)
print(tokenizer.decode(outputs[0]))
```

# Chat Format

The same model works as chat (experimental).

```python
prompt_template = "<empty_output>SYSTEM {system}\n" \
                  "<empty_output>USER {query}\n" \
                  "<empty_output>ASSISTANT"
prompt = prompt_template.format(system="You are a programming assistant",
                                query="How do I sort a list in Python?")
```

# Architecture

As described in more detail in the blog post, we used:

- [ALiBi](https://arxiv.org/abs/2108.12409) based attention
- [LayerNorm](https://arxiv.org/abs/1607.06450v1) instead of [RMSNorm](https://arxiv.org/pdf/1910.07467.pdf)
- [Multi Query Attention](https://arxiv.org/abs/1911.02150)

We also used LiON, flash attention, early dropout. It's not that innovative that you can't run it, in fact you can -- see an example below.


# Pretraining

For the base model, we used our own dataset that contains code with permissive licenses only, and open text datasets.
Filtering is the key to success of this model:

- We only used text in English
- Only topics related to computer science
- Applied heavy deduplication

The text to code proportion was 50:50, model trained for 1.2T tokens. 

We don't release the base model, because its Fill-in-the-Middle (FIM) capability likes to repeat itself too much, so
its practical use is limited. But if you still want it, write us a message on Discord.


# Finetuning

We tested our hypothesis that chat data should boost base model performance in FIM and
regular left-to-right code completion. We found that just 15% of open
[code](https://huggingface.co/datasets/bigcode/commitpackft)
[instruction-following](https://huggingface.co/datasets/rombodawg/2XUNCENSORED_MegaCodeTraining188k) datasets,
that we filtered for quality, improves almost all metrics.

Additionally, to improve FIM, we observed common failure modes, and prepared a synthetic dataset based on
[The Stack dedup v1.1](https://huggingface.co/datasets/bigcode/the-stack-dedup) to address them.

There is a distribution shift between typical code on the internet, and the code you write in your IDE.
The former is likely finished, so the model tries to come up with a suggestion that makes the code complete.
You are likely to have half-written code as you work on it, there is no single addition that can repair it
fully.

In practice, model needs to have a tendency to stop after a couple of lines are added, and sometimes don't write
anything at all. We found that just giving it empty completions, single line completions, multiline
completions that end with a smaller text indent or at least a newline -- makes it much more usable. This data
was used as the rest 85% of the finetune dataset.

The final model is the result of several attempts to make it work as good as possible for code completion,
and to perform well on a wide range of metrics. The best attempt took 40B tokens.


# Limitations and Bias

The Refact-1.6B model was trained on text in English. But it has seen a lot more languages in
code comments. Its performance on non-English languages is lower, for sure.


# Model Stats

- **Architecture:** LLAMA-like model with multi-query attention
- **Objectives** Fill-in-the-Middle, Chat
- **Tokens context:** 4096
- **Pretraining tokens:** 1.2T
- **Finetuning tokens:** 40B
- **Precision:** bfloat16
- **GPUs** 64 NVidia A5000
- **Training time** 28 days


# License

The model is licensed under the BigScience OpenRAIL-M v1 license agreement


# Citation

If you are using this model, please give a link to this page.