	---
	pipeline_tag: text-generation
	inference: true
	widget:
	- text: 'def print_hello_world():'
	  example_title: Hello world
	  group: Python
	license: bigcode-openrail-m
	datasets:
	- bigcode/commitpackft
	- bigcode/oasst-octopack
	metrics:
	- code_eval
	library_name: transformers
	tags:
	- code
	model-index:
	- name: OctoCoder
	  results:
	  - task:
	      type: text-generation
	    dataset:
	      type: bigcode/humanevalpack
	      name: HumanEvalSynthesize Python
	    metrics:
	    - name: pass@1
	      type: pass@1
	      value: 46.2
	      verified: false
	  - task:
	      type: text-generation
	    dataset:
	      type: bigcode/humanevalpack
	      name: HumanEvalSynthesize JavaScript
	    metrics:
	    - name: pass@1
	      type: pass@1
	      value: 39.2
	      verified: false
	---

	![Octopack](https://github.com/bigcode-project/octopack/blob/31f3320f098703c7910e43492c39366eeea68d83/banner.png?raw=true)

	# OctoCoder

	Play with the model on the [TODO Playground](https://huggingface.co/spaces/bigcode/bigcode-playground).
	<style>
	table{
	border-collapse: collapse;
	}
	</style>
	<table>
	<tr>
	<th>Model (↓)</th>
	<th>Python</th>
	<th>JavaScript</th>
	<th>Java</th>
	<th>Go</th>
	<th>C++</th>
	<th>Rust</th>
	<th>Avg.</th>
	</tr>
	</table>
	<hr style="background-color: black;">
	<center><h4>HumanEvalFix</h4></center>
	<hr style="background-color: black;">
	<center>Non-permissive models</center>
	<hr style="background-color: black;">
	<table>
	<tr>
	<td>WizardCoder</td>
	<td>31.8</td>
	<td>29.5</td>
	<td>12.7</td>
	<td>30.4</td>
	<td>18.7</td>
	<td>13.0</td>
	<td>22.7</td>
	</tr>
	<tr>
	<td>GPT-4</td>
	<td>47.0</td>
	<td>48.2</td>
	<td>50.0</td>
	<td>50.6</td>
	<td>47.6</td>
	<td>43.3</td>
	<td><u>47.8</u></td>
	</tr>
	</table>
	<hr style="background-color: black;">
	<center>Permissive models</center>
	<hr style="background-color: black;">
	<table>
	<tr>
	<td>InstructCodeT5+<sup>‡</sup></td>
	<td>2.7</td>
	<td>1.2</td>
	<td>4.3</td>
	<td>2.1</td>
	<td>0.2</td>
	<td>0.5</td>
	<td>1.8</td>
	</tr>
	<tr>
	<td>BLOOMZ<sup>+</sup></td>
	<td>16.6</td>
	<td>15.5</td>
	<td>15.2</td>
	<td>16.4</td>
	<td>6.7</td>
	<td>5.7</td>
	<td>12.5</td>
	</tr>
	<tr>
	<td>StarChat-β</td>
	<td>18.1</td>
	<td>18.1</td>
	<td>24.1</td>
	<td>18.1</td>
	<td>8.2</td>
	<td>3.6</td>
	<td>11.2</td>
	</tr>
	<tr>
	<td>CodeGeeX2<sup>*</sup></td>
	<td>15.9</td>
	<td>14.7</td>
	<td>18.0</td>
	<td>13.6</td>
	<td>4.3</td>
	<td>6.1</td>
	<td>12.1</td>
	</tr>
	<tr>
	<td>StarCoder</td>
	<td>8.7</td>
	<td>15.7</td>
	<td>13.3</td>
	<td>20.1</td>
	<td>15.6</td>
	<td>6.7</td>
	<td>13.4</td>
	</tr>
	<tr>
	<td>OctoGeeX<sup>*</sup></td>
	<td>28.1</td>
	<td>27.7</td>
	<td>30.4</td>
	<td>27.6</td>
	<td>22.9</td>
	<td>9.6</td>
	<td>24.4</td>
	</tr>
	<tr>
	<td>OctoCoder</td>
	<td><strong>30.2</strong></td>
	<td><strong>28.4</strong></td>
	<td><strong>30.6</strong></td>
	<td><strong>30.2</strong></td>
	<td><strong>26.1</strong></td>
	<td><strong>16.5</strong></td>
	<td><strong>27.0</strong></td>
	</tr>
	</table>
	<hr style="background-color: black;">
	<center><h4>HumanEvalExplain</h4></center>
	<hr style="background-color: black;">
	<center>Non-permissive models</center>
	<hr style="background-color: black;">
	<table>
	<tr>
	<td>WizardCoder</td>
	<td>32.5</td>
	<td>33.0</td>
	<td>27.4</td>
	<td>26.7</td>
	<td>28.2</td>
	<td>16.9</td>
	<td>27.5</td>
	</tr>
	<tr>
	<td>GPT-4</td>
	<td>64.6</td>
	<td>57.3</td>
	<td>51.2</td>
	<td>58.5</td>
	<td>38.4</td>
	<td>42.7</td>
	<td><u>52.1</u></td>
	</tr>
	</table>
	<hr style="background-color: black;">
	<center>Permissive models</center>
	<hr style="background-color: black;">
	<table>
	<tr>
	<td>InstructCodeT5+<sup>‡</sup></td>
	<td>20.8</td>
	<td>0.0</td>
	<td>0.0</td>
	<td>0.0</td>
	<td>0.1</td>
	<td>0.0</td>
	<td>3.5</td>
	</tr>
	<tr>
	<td>BLOOMZ<sup>+</sup></td>
	<td>14.7</td>
	<td>8.8</td>
	<td>12.1</td>
	<td>8.5</td>
	<td>0.6</td>
	<td>0.0</td>
	<td>7.5</td>
	</tr>
	<tr>
	<td>StarChat-β</td>
	<td>25.4</td>
	<td>21.5</td>
	<td>24.5</td>
	<td>18.4</td>
	<td>17.6</td>
	<td>13.2</td>
	<td>20.1</td>
	</tr>
	<tr>
	<td>CodeGeeX2<sup>*</sup></td>
	<td>0.0</td>
	<td>0.0</td>
	<td>0.0</td>
	<td>0.0</td>
	<td>0.0</td>
	<td>0.0</td>
	<td>0.0</td>
	</tr>
	<tr>
	<td>StarCoder</td>
	<td>0.0</td>
	<td>0.0</td>
	<td>0.0</td>
	<td>0.0</td>
	<td>0.0</td>
	<td>0.0</td>
	<td>0.0</td>
	</tr>
	<tr>
	<td>OctoGeeX<sup>*</sup></td>
	<td>30.4</td>
	<td>24.0</td>
	<td>24.7</td>
	<td><strong>21.7</strong></td>
	<td>21.0</td>
	<td><strong>15.9</strong></td>
	<td>22.9</td>
	</tr>
	<tr>
	<td>OctoCoder</td>
	<td><strong>35.1</strong></td>
	<td><strong>24.5</strong></td>
	<td><strong>27.3</strong></td>
	<td>21.1</td>
	<td><strong>24.1</strong></td>
	<td>14.8</td>
	<td><strong>24.5</strong></td>
	</tr>
	</table>
	<hr style="background-color: black;">
	<center><h4>HumanEvalSynthesize</h4></center>
	<hr style="background-color: black;">
	<center>Non-permissive models</center>
	<hr style="background-color: black;">
	<table>
	<tr>
	<td>WizardCoder</td>
	<td>57.3</td>
	<td>49.5</td>
	<td>36.1</td>
	<td>36.4</td>
	<td>40.9</td>
	<td>20.2</td>
	<td>40.1</td>
	</tr>
	<tr>
	<td>GPT-4</td>
	<td>86.6</td>
	<td>82.9</td>
	<td>81.7</td>
	<td>72.6</td>
	<td>78.7</td>
	<td>67.1</td>
	<td><u>78.3</u></td>
	</tr>
	</table>
	<hr style="background-color: black;">
	<center>Permissive models</center>
	<hr style="background-color: black;">
	<table>
	<tr>
	<td>InstructCodeT5+<sup>‡</sup></td>
	<td>37.0</td>
	<td>18.9</td>
	<td>17.4</td>
	<td>9.5</td>
	<td>19.8</td>
	<td>0.3</td>
	<td>17.1</td>
	</tr>
	<tr>
	<td>BLOOMZ<sup>+</sup></td>
	<td>15.6</td>
	<td>14.8</td>
	<td>18.4</td>
	<td>8.4</td>
	<td>6.5</td>
	<td>5.5</td>
	<td>11.5</td>
	</tr>
	<tr>
	<td>StarChat-β</td>
	<td>33.5</td>
	<td>31.4</td>
	<td>26.7</td>
	<td>25.5</td>
	<td>26.6</td>
	<td>14.0</td>
	<td>26.3</td>
	</tr>
	<tr>
	<td>CodeGeeX2<sup>*</sup></td>
	<td>35.9</td>
	<td>32.2</td>
	<td>30.8</td>
	<td>22.5</td>
	<td>29.3</td>
	<td>18.1</td>
	<td>28.1</td>
	</tr>
	<tr>
	<td>StarCoder</td>
	<td>33.6</td>
	<td>30.8</td>
	<td>30.2</td>
	<td>17.6</td>
	<td>31.6</td>
	<td>21.8</td>
	<td>27.6</td>
	</tr>
	<tr>
	<td>OctoGeeX<sup>*</sup></td>
	<td>44.7</td>
	<td>33.8</td>
	<td>36.9</td>
	<td>21.9</td>
	<td>32.3</td>
	<td>15.7</td>
	<td>30.9</td>
	</tr>
	<tr>
	<td>OctoCoder</td>
	<td><strong>46.2</strong></td>
	<td><strong>39.2</strong></td>
	<td><strong>38.2</strong></td>
	<td><strong>30.4</strong></td>
	<td><strong>35.6</strong></td>
	<td><strong>23.4</strong></td>
	<td><strong>35.5</strong></td>
	</tr>
	</table>
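
The metadata for this card reports pass@1. As a rough illustration of how a pass@k score can be computed for a single problem, here is a minimal sketch of the widely used unbiased estimator, 1 - C(n-c, k) / C(n, k). The function name `pass_at_k` and the sample counts are hypothetical, and whether the numbers above were produced with exactly this procedure is an assumption.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem from n samples, c of which pass the tests."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # With fewer than k failing samples, every draw of k contains a passing one.
    if n - c < k:
        return 1.0
    # 1 minus the probability that all k drawn samples fail.
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical counts: 20 samples generated, 5 of them pass.
score = pass_at_k(n=20, c=5, k=1)
print(f"pass@1 = {score:.2f}")
```

A benchmark-level score would then be the mean of this value over all problems, expressed as a percentage.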

	## Table of Contents

	1. [Model Summary](#model-summary)
	2. [Use](#use)
	3. [Limitations](#limitations)
	4. [Training](#training)
	5. [License](#license)
	6. [Citation](#citation)

	## Model Summary

	OctoCoder is an instruction-tuned model with 15.5B parameters, created by fine-tuning StarCoder on CommitPackFT and OASST as described in the OctoPack paper.

	- Repository: [bigcode-project/octopack](https://github.com/bigcode-project/octopack)
	- Paper: [TODO]()
	- Languages: 80+ programming languages

	## Use

	### Intended use

	The model follows instructions provided in the input. We recommend starting your input with "Question: " and ending it with a blank line followed by "Answer:", for example: "Question: Please write a function in Python that performs bubble sort.\n\nAnswer:"
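
The recommended format can be wrapped in a small helper. This is only a minimal sketch: `build_prompt` is a hypothetical name and not part of the model or of any library.

```python
def build_prompt(instruction: str) -> str:
    """Wrap an instruction in the recommended "Question: ... Answer:" format."""
    # Strip stray whitespace so the blank line before "Answer:" stays exact.
    return f"Question: {instruction.strip()}\n\nAnswer:"


prompt = build_prompt("Please write a function in Python that performs bubble sort.")
print(prompt)
```

The resulting string can be passed to the tokenizer in place of a hand-written prompt.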

	Feel free to share your generations in the Community tab!

	### Generation
	```python
	# pip install -q transformers
	from transformers import AutoModelForCausalLM, AutoTokenizer

	checkpoint = "bigcode/octocoder"
	device = "cuda"  # use "cpu" to run without a GPU

	tokenizer = AutoTokenizer.from_pretrained(checkpoint)
	model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

	# Wrap the instruction in the recommended "Question: ... Answer:" format.
	prompt = "Question: Please write a function in Python that performs bubble sort.\n\nAnswer:"
	inputs = tokenizer.encode(prompt, return_tensors="pt").to(device)
	# max_new_tokens=128 is an illustrative value; without an explicit limit,
	# the default generation length may cut the answer off.
	outputs = model.generate(inputs, max_new_tokens=128)
	print(tokenizer.decode(outputs[0]))
	```
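
The decoded output contains the prompt followed by the completion, so it can be convenient to keep only the text after "Answer:". The sketch below is one way to do that. `extract_answer` is a hypothetical helper, and the end-of-text marker `<|endoftext|>` is an assumption about the tokenizer; check `tokenizer.eos_token` for the actual value.

```python
def extract_answer(decoded: str, eos_token: str = "<|endoftext|>") -> str:
    """Return the text after the first "Answer:" marker, without the end-of-text token."""
    # partition() leaves `found` empty when the marker is missing.
    _, found, tail = decoded.partition("Answer:")
    answer = tail if found else decoded
    # Drop the end-of-text token and anything generated after it.
    answer = answer.split(eos_token, 1)[0]
    return answer.strip()


# Hypothetical decoded output, for illustration only.
sample = "Question: Say hi.\n\nAnswer: print('hi')<|endoftext|>"
print(extract_answer(sample))
```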

	# Training

	## Model

	- Architecture: GPT-2 model with multi-query attention and Fill-in-the-Middle objective
	- Steps: 250k pretraining & 30 instruction tuning
	- Tokens: 1 trillion pretraining & 2M instruction tuning
	- Precision: bfloat16
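
Dividing the token counts by the step counts listed above gives a rough average number of tokens processed per optimizer step. This is a back-of-the-envelope sketch that assumes the token figures are totals over the listed steps; `tokens_per_step` is a hypothetical helper name.

```python
def tokens_per_step(total_tokens: float, steps: int) -> float:
    """Average number of tokens consumed per training step."""
    return total_tokens / steps


# Figures taken from the list above: 1 trillion tokens over 250k steps,
# and 2M tokens over 30 steps.
pretraining = tokens_per_step(1e12, 250_000)
instruction_tuning = tokens_per_step(2e6, 30)

print(f"Pretraining: {pretraining:,.0f} tokens per step")
print(f"Instruction tuning: {instruction_tuning:,.0f} tokens per step")
```

Under these assumptions, pretraining averages about 4M tokens per step and instruction tuning about 67k tokens per step.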

	## Hardware

	- Pretraining:
	  - GPUs: 512 Tesla A100
	  - Training time: 24 days
	- Instruction tuning:
	  - GPUs: 8 Tesla A100
	  - Training time: 4 hours
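
To compare the two stages, the figures above can be converted to GPU-hours. This is a rough sketch that assumes every GPU was in use for the full wall-clock time; `gpu_hours` is a hypothetical helper name.

```python
def gpu_hours(num_gpus: int, wall_clock_hours: float) -> float:
    """Total GPU-hours, assuming all GPUs run for the whole duration."""
    return num_gpus * wall_clock_hours


pretraining = gpu_hours(512, 24 * 24)  # 512 GPUs for 24 days
instruction_tuning = gpu_hours(8, 4)   # 8 GPUs for 4 hours

print(f"Pretraining: {pretraining:,.0f} GPU-hours")
print(f"Instruction tuning: {instruction_tuning:,.0f} GPU-hours")
```

Under that assumption, pretraining comes to 294,912 GPU-hours and instruction tuning to 32 GPU-hours.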

	## Software

	- Orchestration: [Megatron-LM/Transformers](https://github.com/bigcode-project/octopack#training)
	- Neural networks: [PyTorch](https://github.com/pytorch/pytorch)

	# Citation

	TODO