---
pipeline_tag: text-generation
inference: true
widget:
- text: 'def print_hello_world():'
  example_title: Hello world
  group: Python
license: bigcode-openrail-m
datasets:
- bigcode/commitpackft
- bigcode/oasst-octopack
metrics:
- code_eval
library_name: transformers
tags:
- code
model-index:
- name: OctoCoder
  results:
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalSynthesize Python
    metrics:
    - name: pass@1
      type: pass@1
      value: 46.2
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalSynthesize JavaScript
    metrics:
    - name: pass@1
      type: pass@1
      value: 39.2
      verified: false
---

![Octopack](https://github.com/bigcode-project/octopack/blob/31f3320f098703c7910e43492c39366eeea68d83/banner.png?raw=true)

# OctoCoder

Play with the model on the [TODO Playground](https://huggingface.co/spaces/bigcode/bigcode-playground).
| Model (↓) | Python | JavaScript | Java | Go | C++ | Rust | Avg. |
|---|---|---|---|---|---|---|---|
| **HumanEvalFix** | | | | | | | |
| *Non-permissive models* | | | | | | | |
| WizardCoder | 31.8 | 29.5 | 12.7 | 30.4 | 18.7 | 13.0 | 22.7 |
| GPT-4 | 47.0 | 48.2 | 50.0 | 50.6 | 47.6 | 43.3 | 47.8 |
| *Permissive models* | | | | | | | |
| InstructCodeT5+ | 2.7 | 1.2 | 4.3 | 2.1 | 0.2 | 0.5 | 1.8 |
| BLOOMZ+ | 16.6 | 15.5 | 15.2 | 16.4 | 6.7 | 5.7 | 12.5 |
| StarChat-β | 18.1 | 18.1 | 24.1 | 18.1 | 8.2 | 3.6 | 11.2 |
| CodeGeeX2* | 15.9 | 14.7 | 18.0 | 13.6 | 4.3 | 6.1 | 12.1 |
| StarCoder | 8.7 | 15.7 | 13.3 | 20.1 | 15.6 | 6.7 | 13.4 |
| OctoGeeX* | 28.1 | 27.7 | 30.4 | 27.6 | 22.9 | 9.6 | 24.4 |
| OctoCoder | 30.2 | 28.4 | 30.6 | 30.2 | 26.1 | 16.5 | 27.0 |
| **HumanEvalExplain** | | | | | | | |
| *Non-permissive models* | | | | | | | |
| WizardCoder | 32.5 | 33.0 | 27.4 | 26.7 | 28.2 | 16.9 | 27.5 |
| GPT-4 | 64.6 | 57.3 | 51.2 | 58.5 | 38.4 | 42.7 | 52.1 |
| *Permissive models* | | | | | | | |
| InstructCodeT5+ | 20.8 | 0.0 | 0.0 | 0.0 | 0.1 | 0.0 | 3.5 |
| BLOOMZ+ | 14.7 | 8.8 | 12.1 | 8.5 | 0.6 | 0.0 | 7.5 |
| StarChat-β | 25.4 | 21.5 | 24.5 | 18.4 | 17.6 | 13.2 | 20.1 |
| CodeGeeX2* | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| StarCoder | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| OctoGeeX* | 30.4 | 24.0 | 24.7 | 21.7 | 21.0 | 15.9 | 22.9 |
| OctoCoder | 35.1 | 24.5 | 27.3 | 21.1 | 24.1 | 14.8 | 24.5 |
| **HumanEvalSynthesize** | | | | | | | |
| *Non-permissive models* | | | | | | | |
| WizardCoder | 57.3 | 49.5 | 36.1 | 36.4 | 40.9 | 20.2 | 40.1 |
| GPT-4 | 86.6 | 82.9 | 81.7 | 72.6 | 78.7 | 67.1 | 78.3 |
| *Permissive models* | | | | | | | |
| InstructCodeT5+ | 37.0 | 18.9 | 17.4 | 9.5 | 19.8 | 0.3 | 17.1 |
| BLOOMZ+ | 15.6 | 14.8 | 18.4 | 8.4 | 6.5 | 5.5 | 11.5 |
| StarChat-β | 33.5 | 31.4 | 26.7 | 25.5 | 26.6 | 14.0 | 26.3 |
| CodeGeeX2* | 35.9 | 32.2 | 30.8 | 22.5 | 29.3 | 18.1 | 28.1 |
| StarCoder | 33.6 | 30.8 | 30.2 | 17.6 | 31.6 | 21.8 | 27.6 |
| OctoGeeX* | 44.7 | 33.8 | 36.9 | 21.9 | 32.3 | 15.7 | 30.9 |
| OctoCoder | 46.2 | 39.2 | 38.2 | 30.4 | 35.6 | 23.4 | 35.5 |
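The scores above are pass@1 percentages (as reflected in the model-index metadata). For readers unfamiliar with the metric, below is a minimal sketch of the standard unbiased pass@k estimator used in HumanEval-style evaluation; the function name and the toy numbers are illustrative, not taken from the OctoPack evaluation code.

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k samples
    drawn from n generations (of which c passed the unit tests) is correct.

    pass@k = 1 - C(n - c, k) / C(n, k)
    """
    if n - c < k:
        return 1.0
    # Product form of 1 - C(n-c, k) / C(n, k), numerically stable for large n.
    return 1.0 - float(np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

# Toy example: 20 generations per problem, 6 of which passed the tests.
print(f"pass@1 ≈ {pass_at_k(n=20, c=6, k=1):.3f}")  # ≈ 0.300
```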
## Table of Contents

1. [Model Summary](#model-summary)
2. [Use](#use)
3. [Limitations](#limitations)
4. [Training](#training)
5. [License](#license)
6. [Citation](#citation)

## Model Summary

OctoCoder is an instruction-tuned model with 15.5B parameters, created by finetuning StarCoder on CommitPackFT & OASST as described in the OctoPack paper.

- **Repository:** [bigcode/octopack](https://github.com/bigcode-project/octopack)
- **Paper:** [TODO]()
- **Languages:** 80+ programming languages

## Use

### Intended use

The model follows instructions provided in the input. We recommend prefacing your input with "Question: " and finishing it with "Answer:", for example: "Question: Please write a function in Python that performs bubble sort.\n\nAnswer:"

**Feel free to share your generations in the Community tab!**

### Generation

```python
# pip install -q transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/octocoder"
device = "cuda"  # for GPU usage or "cpu" for CPU usage

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

inputs = tokenizer.encode("Question: Please write a function in Python that performs bubble sort.\n\nAnswer:", return_tensors="pt").to(device)
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
```

(A supplementary sketch with explicit decoding settings is given at the end of this card.)

## Training

### Model

- **Architecture:** GPT-2 model with multi-query attention and Fill-in-the-Middle objective
- **Steps:** 250k pretraining & 30 instruction tuning
- **Tokens:** 1 trillion pretraining & 2M instruction tuning
- **Precision:** bfloat16

### Hardware

- **Pretraining:**
  - **GPUs:** 512 Tesla A100
  - **Training time:** 24 days
- **Instruction tuning:**
  - **GPUs:** 8 Tesla A100
  - **Training time:** 4 hours

### Software

- **Orchestration:** [Megatron-LM/Transformers](https://github.com/bigcode-project/octopack#training)
- **Neural networks:** [PyTorch](https://github.com/pytorch/pytorch)

## License

The model is licensed under the BigCode OpenRAIL-M license (`bigcode-openrail-m`, as stated in the metadata above).

## Citation

TODO
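**Supplementary sketch: explicit decoding settings.** The generation snippet in the Use section calls `model.generate(inputs)` without arguments, so it falls back to the library's default token budget and may cut the answer short. The variant below passes explicit settings; the `max_new_tokens` value and the choice of greedy decoding are illustrative assumptions, not recommendations from the OctoPack authors.

```python
# pip install -q transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/octocoder"
device = "cuda"  # or "cpu"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

# Prompt format recommended in the Intended use section: "Question: ... Answer:"
prompt = "Question: Please write a function in Python that performs bubble sort.\n\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(device)

# max_new_tokens is an illustrative budget; without it the default limit may truncate the answer.
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=False,  # greedy decoding; switch to sampling if preferred
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```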