---
pipeline_tag: text-generation
inference: true
widget:
- text: 'def print_hello_world():'
  example_title: Hello world
  group: Python
license: bigcode-openrail-m
datasets:
- bigcode/commitpackft
- bigcode/oasst-octopack
metrics:
- code_eval
library_name: transformers
tags:
- code
model-index:
- name: OctoCoder
  results:
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalSynthesize Python
    metrics:
    - name: pass@1
      type: pass@1
      value: 46.2
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalSynthesize JavaScript
    metrics:
    - name: pass@1
      type: pass@1
      value: 39.2
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalSynthesize Java
    metrics:
    - name: pass@1
      type: pass@1
      value: 38.2
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalSynthesize Go
    metrics:
    - name: pass@1
      type: pass@1
      value: 30.4
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalSynthesize C++
    metrics:
    - name: pass@1
      type: pass@1
      value: 35.6
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalSynthesize Rust
    metrics:
    - name: pass@1
      type: pass@1
      value: 23.4
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalSynthesize Avg.
    metrics:
    - name: pass@1
      type: pass@1
      value: 35.5
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalExplain Python
    metrics:
    - name: pass@1
      type: pass@1
      value: 35.1
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalExplain JavaScript
    metrics:
    - name: pass@1
      type: pass@1
      value: 24.5
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalExplain Java
    metrics:
    - name: pass@1
      type: pass@1
      value: 27.3
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalExplain Go
    metrics:
    - name: pass@1
      type: pass@1
      value: 21.1
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalExplain C++
    metrics:
    - name: pass@1
      type: pass@1
      value: 24.1
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalExplain Rust
    metrics:
    - name: pass@1
      type: pass@1
      value: 14.8
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalExplain Avg.
    metrics:
    - name: pass@1
      type: pass@1
      value: 24.5
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalFix Python
    metrics:
    - name: pass@1
      type: pass@1
      value: 30.2
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalFix JavaScript
    metrics:
    - name: pass@1
      type: pass@1
      value: 28.4
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalFix Java
    metrics:
    - name: pass@1
      type: pass@1
      value: 30.6
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalFix Go
    metrics:
    - name: pass@1
      type: pass@1
      value: 30.2
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalFix C++
    metrics:
    - name: pass@1
      type: pass@1
      value: 26.1
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalFix Rust
    metrics:
    - name: pass@1
      type: pass@1
      value: 16.5
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalFix Avg.
    metrics:
    - name: pass@1
      type: pass@1
      value: 27.0
      verified: false
---

![Octopack](https://github.com/bigcode-project/octopack/blob/31f3320f098703c7910e43492c39366eeea68d83/banner.png?raw=true)

# OctoCoder

Play with the model on the [TODO Playground](https://huggingface.co/spaces/bigcode/bigcode-playground).

## Table of Contents

1. [Model Summary](#model-summary)
2. [Use](#use)
3. [Limitations](#limitations)
4. [Training](#training)
5. [License](#license)
6. [Citation](#citation)

## Model Summary

OctoCoder is an instruction-tuned model with 15.5B parameters, created by finetuning StarCoder on CommitPackFT & OASST as described in the OctoPack paper.

- **Repository:** [bigcode/octopack](https://github.com/bigcode-project/octopack)
- **Paper:** [TODO]()
- **Languages:** 80+ programming languages

## Use

### Intended use

The model follows instructions provided in the input.
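
As a minimal sketch of how an instruction can be packaged for the model, the helper below wraps a request in the `Question: ... Answer:` template that this section recommends. The name `build_prompt` is hypothetical and only illustrative; it is not part of the model or of `transformers`.

```python
def build_prompt(instruction: str) -> str:
    """Wrap an instruction in the "Question: ... Answer:" template."""
    # Hypothetical helper: strips surrounding whitespace, then adds the
    # "Question: " prefix and the "\n\nAnswer:" suffix around the request.
    return f"Question: {instruction.strip()}\n\nAnswer:"


prompt = build_prompt("Please write a function in Python that performs bubble sort.")
print(repr(prompt))
```

The resulting string can be passed to the tokenizer in place of a hand-written prompt.
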
We recommend prefacing your input with "Question: " and finishing with "Answer:", for example: `"Question: Please write a function in Python that performs bubble sort.\n\nAnswer:"`

**Feel free to share your generations in the Community tab!**

### Generation

```python
# pip install -q transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/octocoder"
device = "cuda"  # for GPU usage or "cpu" for CPU usage

# Load the tokenizer and model weights, then move the model to the device
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

# Encode a prompt in the recommended "Question: ... Answer:" format
inputs = tokenizer.encode("Question: Please write a function in Python that performs bubble sort.\n\nAnswer:", return_tensors="pt").to(device)
# Default generation settings; pass max_new_tokens to generate() for longer answers
outputs = model.generate(inputs)
# Decode the generated token IDs (prompt included) back to text
print(tokenizer.decode(outputs[0]))
```

## Training

### Model

- **Architecture:** GPT-2 model with multi-query attention and Fill-in-the-Middle objective
- **Steps:** 250k pretraining & 30 instruction tuning
- **Tokens:** 1 trillion pretraining & 2M instruction tuning
- **Precision:** bfloat16

### Hardware

- **Pretraining:**
  - **GPUs:** 512 Tesla A100
  - **Training time:** 24 days
- **Instruction tuning:**
  - **GPUs:** 8 Tesla A100
  - **Training time:** 4 hours

### Software

- **Orchestration:** [Megatron-LM/Transformers](https://github.com/bigcode-project/octopack#training)
- **Neural networks:** [PyTorch](https://github.com/pytorch/pytorch)

## Citation

TODO