+---
+pipeline_tag: text-generation
+inference: true
+widget:
+- text: 'def print_hello_world():'
+  example_title: Hello world
+  group: Python
+license: bigcode-openrail-m
+datasets:
+- bigcode/commitpackft
+- Muennighoff/oasst-octopack
+metrics:
+- code_eval
+library_name: transformers
+tags:
+- code
+model-index:
+- name: OctoCoder
+  results:
+  - task:
+      type: text-generation
+    dataset:
+      type: bigcode/humanevalpack
+      name: HumanEvalSynthesize Python
+    metrics:
+    - name: pass@1
+      type: pass@1
+      value: 46.2
+      verified: false
+  - task:
+      type: text-generation
+    dataset:
+      type: bigcode/humanevalpack
+      name: HumanEvalSynthesize JavaScript
+    metrics:
+    - name: pass@1
+      type: pass@1
+      value: 39.2
+      verified: false
+---
+![Octopack](https://github.com/bigcode-project/octopack/blob/31f3320f098703c7910e43492c39366eeea68d83/banner.png?raw=true)
+# OctoCoder
+Play with the model on the [TODO Playground](https://huggingface.co/spaces/bigcode/bigcode-playground).
+## Table of Contents
+1. [Model Summary](#model-summary)
+2. [Use](#use)
+3. [Limitations](#limitations)
+4. [Training](#training)
+5. [License](#license)
+6. [Citation](#citation)
+## Model Summary
+OctoCoder is ...
+- **Repository:** [bigcode-project/octopack](https://github.com/bigcode-project/octopack)
+- **Paper:** [TODO]()
+- **Languages:** 80+ programming languages
+## Use
+### Intended use
+The model follows instructions provided in the input. We recommend prefacing your input with "Question: " and finishing with "Answer:", for example: "Question: Please write a function in Python that performs bubble sort.\n\nAnswer:"
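The recommended prompt format above can be sketched as a small helper. This is only an illustration: `build_prompt` is a hypothetical name, not part of the model or of any library, and it simply reproduces the "Question: ... Answer:" wrapping described in this section.

```python
def build_prompt(instruction: str) -> str:
    """Wrap an instruction in the recommended "Question: ... Answer:" format.

    Hypothetical helper for illustration; not part of any library.
    """
    # Strip stray whitespace so the prompt always ends exactly with "Answer:".
    return f"Question: {instruction.strip()}\n\nAnswer:"


prompt = build_prompt("Please write a function in Python that performs bubble sort.")
print(prompt)
```

Keeping the wrapping in one place makes it harder to forget the blank line before "Answer:" when building prompts programmatically.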
+**Feel free to share your generations in the Community tab!**
+### Generation
+```python
+# pip install -q transformers
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+checkpoint = "bigcode/octocoder"
+device = "cuda"  # use "cpu" to run without a GPU
+
+# Load the tokenizer and model weights from the Hugging Face Hub.
+tokenizer = AutoTokenizer.from_pretrained(checkpoint)
+model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)
+
+# Wrap the instruction in the recommended "Question: ... Answer:" format.
+prompt = "Question: Please write a function in Python that performs bubble sort.\n\nAnswer:"
+inputs = tokenizer.encode(prompt, return_tensors="pt").to(device)
+
+# generate() defaults to a very short output; 128 new tokens is an arbitrary choice.
+outputs = model.generate(inputs, max_new_tokens=128)
+print(tokenizer.decode(outputs[0]))
+```
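The decoded string from a generation call like the one above normally contains the prompt followed by the model's continuation. A minimal sketch of separating the two is shown below. `extract_answer` is a hypothetical helper, the default `eos_token` value of `"<|endoftext|>"` is an assumption about the tokenizer's end-of-sequence string (check `tokenizer.eos_token` for the real value), and the sample `decoded` string is made up for illustration rather than actual model output.

```python
def extract_answer(decoded: str, prompt: str, eos_token: str = "<|endoftext|>") -> str:
    """Return only the generated answer from a decoded prompt + continuation.

    Hypothetical helper; eos_token is an assumed default, not a confirmed value.
    """
    # Drop the echoed prompt if the decoded text starts with it.
    text = decoded[len(prompt):] if decoded.startswith(prompt) else decoded
    # Keep everything before the first end-of-sequence marker, if any.
    text = text.split(eos_token, 1)[0]
    return text.strip()


prompt = "Question: Say hi.\n\nAnswer:"
# Made-up decoded string for illustration only, not real model output.
decoded = prompt + " print('hi')<|endoftext|>"
print(extract_answer(decoded, prompt))
```

The `startswith` guard matters because decoding does not always reproduce the prompt character for character; in that case the helper returns the full text up to the end-of-sequence marker instead of slicing at the wrong offset.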
+## Training
+### Model
+- **Architecture:** GPT-2 model with multi-query attention and Fill-in-the-Middle objective
+- **Steps:** 250k pretraining & TODO instruction tuning
+- **Tokens:** 1 trillion pretraining & TODO instruction tuning
+- **Precision:** bfloat16
+### Hardware
+- **Pretraining:**
+  - **GPUs:** 512 Tesla A100
+  - **Training time:** 24 days
+- **Instruction tuning:**
+  - **GPUs:** TODO Tesla A100
+  - **Training time:** TODO days
+### Software
+- **Orchestration:** [Megatron-LM](https://github.com/bigcode-project/Megatron-LM) & TODO
+- **Neural networks:** [PyTorch](https://github.com/pytorch/pytorch)
+## Citation
+TODO