---
pipeline_tag: text-generation
inference: true
widget:
- text: 'def print_hello_world():'
  example_title: Hello world
  group: Python
license: bigcode-openrail-m
datasets:
- bigcode/commitpackft
- bigcode/oasst-octopack
metrics:
- code_eval
library_name: transformers
tags:
- code
model-index:
- name: OctoCoder
  results:
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalSynthesize Python
    metrics:
    - name: pass@1
      type: pass@1
      value: 46.2
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalSynthesize JavaScript
    metrics:
    - name: pass@1
      type: pass@1
      value: 39.2
      verified: false
---

![Octopack](https://github.com/bigcode-project/octopack/blob/31f3320f098703c7910e43492c39366eeea68d83/banner.png?raw=true)

# OctoCoder

Play with the model on the [TODO Playground](https://huggingface.co/spaces/bigcode/bigcode-playground).
| Model (↓) | Python | JavaScript | Java | Go | C++ | Rust | Avg. |
|---|---|---|---|---|---|---|---|
| **HumanEvalFix** | | | | | | | |
| *Non-permissive models* | | | | | | | |
| WizardCoder | 31.8 | 29.5 | 12.7 | 30.4 | 18.7 | 13.0 | 22.7 |
| GPT-4 | 47.0 | 48.2 | 50.0 | 50.6 | 47.6 | 43.3 | 47.8 |
| *Permissive models* | | | | | | | |
| InstructCodeT5+ | 2.7 | 1.2 | 4.3 | 2.1 | 0.2 | 0.5 | 1.8 |
| BLOOMZ+ | 16.6 | 15.5 | 15.2 | 16.4 | 6.7 | 5.7 | 12.5 |
| StarChat-β | 18.1 | 18.1 | 24.1 | 18.1 | 8.2 | 3.6 | 11.2 |
| CodeGeeX2* | 15.9 | 14.7 | 18.0 | 13.6 | 4.3 | 6.1 | 12.1 |
| StarCoder | 8.7 | 15.7 | 13.3 | 20.1 | 15.6 | 6.7 | 13.4 |
| OctoGeeX* | 28.1 | 27.7 | 30.4 | 27.6 | 22.9 | 9.6 | 24.4 |
| OctoCoder | 30.2 | 28.4 | 30.6 | 30.2 | 26.1 | 16.5 | 27.0 |
| **HumanEvalExplain** | | | | | | | |
| *Non-permissive models* | | | | | | | |
| WizardCoder | 32.5 | 33.0 | 27.4 | 26.7 | 28.2 | 16.9 | 27.5 |
| GPT-4 | 64.6 | 57.3 | 51.2 | 58.5 | 38.4 | 42.7 | 52.1 |
| *Permissive models* | | | | | | | |
| InstructCodeT5+ | 20.8 | 0.0 | 0.0 | 0.0 | 0.1 | 0.0 | 3.5 |
| BLOOMZ+ | 14.7 | 8.8 | 12.1 | 8.5 | 0.6 | 0.0 | 7.5 |
| StarChat-β | 25.4 | 21.5 | 24.5 | 18.4 | 17.6 | 13.2 | 20.1 |
| CodeGeeX2* | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| StarCoder | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| OctoGeeX* | 30.4 | 24.0 | 24.7 | 21.7 | 21.0 | 15.9 | 22.9 |
| OctoCoder | 35.1 | 24.5 | 27.3 | 21.1 | 24.1 | 14.8 | 24.5 |
| **HumanEvalSynthesize** | | | | | | | |
| *Non-permissive models* | | | | | | | |
| WizardCoder | 57.3 | 49.5 | 36.1 | 36.4 | 40.9 | 20.2 | 40.1 |
| GPT-4 | 86.6 | 82.9 | 81.7 | 72.6 | 78.7 | 67.1 | 78.3 |
| *Permissive models* | | | | | | | |
| InstructCodeT5+ | 37.0 | 18.9 | 17.4 | 9.5 | 19.8 | 0.3 | 17.1 |
| BLOOMZ+ | 15.6 | 14.8 | 18.4 | 8.4 | 6.5 | 5.5 | 11.5 |
| StarChat-β | 33.5 | 31.4 | 26.7 | 25.5 | 26.6 | 14.0 | 26.3 |
| CodeGeeX2* | 35.9 | 32.2 | 30.8 | 22.5 | 29.3 | 18.1 | 28.1 |
| StarCoder | 33.6 | 30.8 | 30.2 | 17.6 | 31.6 | 21.8 | 27.6 |
| OctoGeeX* | 44.7 | 33.8 | 36.9 | 21.9 | 32.3 | 15.7 | 30.9 |
| OctoCoder | 46.2 | 39.2 | 38.2 | 30.4 | 35.6 | 23.4 | 35.5 |
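The scores above are pass@1 percentages (as reflected in the model-index metadata). For readers unfamiliar with the metric, below is a minimal sketch of the standard unbiased pass@k estimator used in HumanEval-style evaluation; the function name and the toy numbers are illustrative, not taken from the OctoPack evaluation code.

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k samples
    drawn from n generations (of which c passed the unit tests) is correct.

    pass@k = 1 - C(n - c, k) / C(n, k)
    """
    if n - c < k:
        return 1.0
    # Product form of 1 - C(n-c, k) / C(n, k), numerically stable for large n.
    return 1.0 - float(np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

# Toy example: 20 generations per problem, 6 of which passed the tests.
print(f"pass@1 ≈ {pass_at_k(n=20, c=6, k=1):.3f}")  # ≈ 0.300
```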
## Table of Contents

1. [Model Summary](#model-summary)
2. [Use](#use)
3. [Limitations](#limitations)
4. [Training](#training)
5. [License](#license)
6. [Citation](#citation)

## Model Summary

OctoCoder is an instruction-tuned model with 15.5B parameters, created by finetuning StarCoder on CommitPackFT & OASST as described in the OctoPack paper.

- **Repository:** [bigcode/octopack](https://github.com/bigcode-project/octopack)
- **Paper:** [TODO]()
- **Languages:** 80+ programming languages

## Use

### Intended use

The model follows instructions provided in the input. We recommend prefacing your input with "Question: " and finishing it with "Answer:", for example: "Question: Please write a function in Python that performs bubble sort.\n\nAnswer:"

**Feel free to share your generations in the Community tab!**

### Generation

```python
# pip install -q transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/octocoder"
device = "cuda"  # for GPU usage or "cpu" for CPU usage

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

inputs = tokenizer.encode("Question: Please write a function in Python that performs bubble sort.\n\nAnswer:", return_tensors="pt").to(device)
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
```

(A supplementary sketch with explicit decoding settings is given at the end of this card.)

## Training

### Model

- **Architecture:** GPT-2 model with multi-query attention and Fill-in-the-Middle objective
- **Steps:** 250k pretraining & 30 instruction tuning
- **Tokens:** 1 trillion pretraining & 2M instruction tuning
- **Precision:** bfloat16

### Hardware

- **Pretraining:**
  - **GPUs:** 512 Tesla A100
  - **Training time:** 24 days
- **Instruction tuning:**
  - **GPUs:** 8 Tesla A100
  - **Training time:** 4 hours

### Software

- **Orchestration:** [Megatron-LM/Transformers](https://github.com/bigcode-project/octopack#training)
- **Neural networks:** [PyTorch](https://github.com/pytorch/pytorch)

## License

The model is licensed under the BigCode OpenRAIL-M license (`bigcode-openrail-m`, as stated in the metadata above).

## Citation

TODO
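**Supplementary sketch: explicit decoding settings.** The generation snippet in the Use section calls `model.generate(inputs)` without arguments, so it falls back to the library's default token budget and may cut the answer short. The variant below passes explicit settings; the `max_new_tokens` value and the choice of greedy decoding are illustrative assumptions, not recommendations from the OctoPack authors.

```python
# pip install -q transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/octocoder"
device = "cuda"  # or "cpu"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

# Prompt format recommended in the Intended use section: "Question: ... Answer:"
prompt = "Question: Please write a function in Python that performs bubble sort.\n\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(device)

# max_new_tokens is an illustrative budget; without it the default limit may truncate the answer.
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=False,  # greedy decoding; switch to sampling if preferred
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```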