---
pipeline_tag: text-generation
inference: true
widget:
- text: 'def print_hello_world():'
  example_title: Hello world
  group: Python
license: bigcode-openrail-m
datasets:
- bigcode/commitpackft
- bigcode/oasst-octopack
metrics:
- code_eval
library_name: transformers
tags:
- code
model-index:
- name: OctoCoder
  results:
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalSynthesize Python
    metrics:
    - name: pass@1
      type: pass@1
      value: 46.2
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalSynthesize JavaScript
    metrics:
    - name: pass@1
      type: pass@1
      value: 39.2
      verified: false
---

![Octopack](https://github.com/bigcode-project/octopack/blob/31f3320f098703c7910e43492c39366eeea68d83/banner.png?raw=true)

# OctoCoder

Play with the model on the [TODO Playground](https://huggingface.co/spaces/bigcode/bigcode-playground).

## Table of Contents

1. [Model Summary](#model-summary)
2. [Use](#use)
3. [Training](#training)
4. [Citation](#citation)

## Model Summary

OctoCoder is an instruction-tuned model with 15.5B parameters, created by fine-tuning StarCoder on CommitPackFT and OASST as described in the OctoPack paper.

- Repository: [bigcode/octopack](https://github.com/bigcode-project/octopack)
- Paper: [TODO]()
- Languages: 80+ programming languages

## Use

### Intended use

The model follows instructions provided in the input. We recommend prefacing your input with `Question: ` and ending it with `Answer:`, for example: `Question: Please write a function in Python that performs bubble sort.\n\nAnswer:`
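The recommended format can be captured in a small helper so every request is wrapped the same way. This is a minimal sketch: `build_prompt` is a hypothetical name, and the only thing it encodes is the `Question: ...\n\nAnswer:` layout from the example above.

```python
def build_prompt(instruction: str) -> str:
    """Hypothetical helper: wrap an instruction in the "Question: ... Answer:" format.

    Surrounding whitespace is stripped so the result matches the example prompt
    exactly; there is no trailing space after "Answer:".
    """
    return f"Question: {instruction.strip()}\n\nAnswer:"


prompt = build_prompt("Please write a function in Python that performs bubble sort.")
print(repr(prompt))
```

The resulting string can be passed directly to `tokenizer.encode` in the generation snippet.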

Feel free to share your generations in the Community tab!

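### Generation

```python
# pip install -q transformers torch
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/octocoder"
device = "cuda"  # for GPU usage or "cpu" for CPU usage

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# Load the weights in bfloat16 (the training precision) to halve memory use compared with float32
model = AutoModelForCausalLM.from_pretrained(checkpoint, torch_dtype=torch.bfloat16).to(device)

prompt = "Question: Please write a function in Python that performs bubble sort.\n\nAnswer:"
inputs = tokenizer.encode(prompt, return_tensors="pt").to(device)
# Set an explicit budget: the library default max_length (20 tokens, prompt included) would cut the answer off
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```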
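The decoded output contains the prompt followed by the model's answer. A minimal post-processing sketch is shown below; `extract_answer` is a hypothetical helper that assumes the instruction itself does not contain the string `Answer:` and that `<|endoftext|>` is the tokenizer's end-of-text string (StarCoder's tokenizer uses it), so it works whether or not special tokens were skipped when decoding.

```python
def extract_answer(decoded: str, eos_token: str = "<|endoftext|>") -> str:
    """Hypothetical helper: return the text after the first "Answer:" marker.

    Assumes the instruction does not itself contain "Answer:"; anything from
    the end-of-text token onward is dropped.
    """
    _, marker, answer = decoded.partition("Answer:")
    if not marker:
        # No prompt marker found: return the decoded text stripped of surrounding whitespace.
        return decoded.strip()
    # Cut at the end-of-text token (a no-op if special tokens were already skipped).
    return answer.split(eos_token, 1)[0].strip()


sample = "Question: Sort a list.\n\nAnswer: def f(xs):\n    return sorted(xs)<|endoftext|>"
print(extract_answer(sample))
```

With the snippet above, `extract_answer(tokenizer.decode(outputs[0]))` returns only the answer. An alternative that avoids string matching is to decode just the newly generated tokens with `tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True)`, since `generate` returns the prompt tokens followed by the continuation.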

# Training

## Model

- Architecture: GPT-2 model with multi-query attention, trained with a Fill-in-the-Middle objective
- Steps: 250k pretraining & 30 instruction tuning
- Tokens: 1 trillion pretraining & 2M instruction tuning
- Precision: bfloat16
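The Fill-in-the-Middle objective trains the model to generate a missing span given the code before and after it. The sketch below shows one common layout for this, the prefix-suffix-middle (PSM) transformation of a training document. The sentinel strings match the StarCoder tokenizer's `<fim_prefix>`, `<fim_suffix>` and `<fim_middle>` special tokens; the helper names `fim_transform` and `split_fim`, the character-level cut points and the uniform random splitting are illustrative assumptions, not the exact BigCode training code.

```python
import random

# Sentinel strings from the StarCoder tokenizer (an assumption for other tokenizers).
FIM_PREFIX = "<fim_prefix>"
FIM_SUFFIX = "<fim_suffix>"
FIM_MIDDLE = "<fim_middle>"


def fim_transform(document: str, rng: random.Random) -> str:
    """Hypothetical helper: rearrange a document into prefix-suffix-middle order.

    Two random cut points split the document into prefix, middle and suffix.
    The model sees the prefix and suffix first and learns to generate the
    middle, which lets it fill in code between existing context.
    """
    lo, hi = sorted(rng.randint(0, len(document)) for _ in range(2))
    prefix, middle, suffix = document[:lo], document[lo:hi], document[hi:]
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}{middle}"


def split_fim(sample: str) -> tuple[str, str, str]:
    """Hypothetical inverse of fim_transform, returning (prefix, middle, suffix).

    Assumes the original document contains none of the sentinel strings.
    """
    head, middle = sample.split(FIM_MIDDLE, 1)
    prefix, suffix = head[len(FIM_PREFIX):].split(FIM_SUFFIX, 1)
    return prefix, middle, suffix


doc = "def add(a, b):\n    return a + b\n"
sample = fim_transform(doc, random.Random(0))
print(sample)
```

Because the middle is moved to the end, ordinary left-to-right next-token prediction on the transformed sample is what teaches infilling; no change to the causal architecture is required.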

## Hardware

- Pretraining:
  - GPUs: 512 Tesla A100
  - Training time: 24 days
- Instruction tuning:
  - GPUs: 8 Tesla A100
  - Training time: 4 hours
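For a rough sense of scale, the Model and Hardware figures can be combined into total GPU-hours and average tokens per training step. This is back-of-the-envelope arithmetic on the rounded numbers listed in this card, assuming the token counts are the totals processed in each stage; it is not an official compute report.

```python
# Figures copied from the Model and Hardware lists (all rounded).
PRETRAIN_TOKENS = 1_000_000_000_000  # 1 trillion tokens
PRETRAIN_STEPS = 250_000
PRETRAIN_GPUS, PRETRAIN_DAYS = 512, 24

TUNE_TOKENS = 2_000_000  # 2M tokens
TUNE_STEPS = 30
TUNE_GPUS, TUNE_HOURS = 8, 4


def gpu_hours(gpus: int, hours: float) -> float:
    """Total accelerator time: number of GPUs times wall-clock hours."""
    return gpus * hours


pretrain_gpu_hours = gpu_hours(PRETRAIN_GPUS, PRETRAIN_DAYS * 24)
tune_gpu_hours = gpu_hours(TUNE_GPUS, TUNE_HOURS)

# Average tokens processed per training step in each stage.
pretrain_tokens_per_step = PRETRAIN_TOKENS / PRETRAIN_STEPS
tune_tokens_per_step = TUNE_TOKENS / TUNE_STEPS

print(f"Pretraining: {pretrain_gpu_hours:,.0f} GPU-hours, {pretrain_tokens_per_step:,.0f} tokens/step")
print(f"Instruction tuning: {tune_gpu_hours:,.0f} GPU-hours, {tune_tokens_per_step:,.0f} tokens/step")
```

This gives 294,912 GPU-hours and 4,000,000 tokens per step for pretraining versus 32 GPU-hours and about 66,667 tokens per step for instruction tuning, which shows how small the instruction-tuning stage is relative to pretraining.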

## Software

- Orchestration: [Megatron-LM/Transformers](https://github.com/bigcode-project/octopack#training)
- Neural networks: [PyTorch](https://github.com/pytorch/pytorch)

# Citation

TODO