---
pipeline_tag: text-generation
inference: true
widget:
- text: 'def print_hello_world():'
  example_title: Hello world
  group: Python
license: bigcode-openrail-m
datasets:
- bigcode/commitpackft
- Muennighoff/oasst-octopack
metrics:
- code_eval
library_name: transformers
tags:
- code
model-index:
- name: OctoCoder
  results:
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalSynthesize Python
    metrics:
    - name: pass@1
      type: pass@1
      value: 46.2
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalSynthesize JavaScript
    metrics:
    - name: pass@1
      type: pass@1
      value: 39.2
      verified: false
---

![Octopack](https://github.com/bigcode-project/octopack/blob/31f3320f098703c7910e43492c39366eeea68d83/banner.png?raw=true)

# OctoCoder

Play with the model on the [BigCode Playground](https://huggingface.co/spaces/bigcode/bigcode-playground).

## Table of Contents

1. [Model Summary](#model-summary)
2. [Use](#use)
3. [Limitations](#limitations)
4. [Training](#training)
5. [License](#license)
6. [Citation](#citation)

## Model Summary

OctoCoder is ...

- Repository: [bigcode/octopack](https://github.com/bigcode-project/octopack)
- Paper: TODO
- Languages: 80+ programming languages

## Use

### Intended use

The model follows instructions given in the input. We recommend prefacing your input with `Question: ` and ending it with `Answer:`, for example: `Question: Please write a function in Python that performs bubble sort.\n\nAnswer:`

Feel free to share your generations in the Community tab!
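Applying the recommended template through one small helper keeps the spacing identical across prompts. The sketch below assumes the exact layout of the example above (`Question: `, the instruction, a blank line, then `Answer:` with no trailing space); `build_prompt` is a hypothetical name, not part of `transformers` or any other library.

```python
def build_prompt(instruction: str) -> str:
    """Wrap an instruction in the Question/Answer template recommended above.

    Hypothetical helper: the blank line before "Answer:" mirrors the card's example.
    """
    return f"Question: {instruction.strip()}\n\nAnswer:"


prompt = build_prompt("Please write a function in Python that performs bubble sort.")
print(repr(prompt))
# 'Question: Please write a function in Python that performs bubble sort.\n\nAnswer:'
```

Stripping surrounding whitespace from the instruction is a design choice of this sketch, so that stray spaces or newlines in user input do not change the template.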

### Generation

```python
# pip install -q transformers torch
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/octocoder"
device = "cuda"  # "cuda" for GPU usage or "cpu" for CPU usage

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# Load the weights in bfloat16 (the training precision) to halve memory use compared to float32
model = AutoModelForCausalLM.from_pretrained(checkpoint, torch_dtype=torch.bfloat16).to(device)

prompt = "Question: Please write a function in Python that performs bubble sort.\n\nAnswer:"
inputs = tokenizer.encode(prompt, return_tensors="pt").to(device)
# Without an explicit limit, generate() may stop after a short default length
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0]))
```
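`tokenizer.decode(outputs[0])` returns the prompt followed by the completion, possibly with an end-of-text marker. The sketch below keeps only the answer. It assumes the end-of-text string is `<|endoftext|>` and treats a new `\nQuestion:` line as the model starting another turn; both are heuristics of this example, and `extract_answer` is a hypothetical helper, not a library function.

```python
def extract_answer(decoded: str, eos_token: str = "<|endoftext|>") -> str:
    """Return only the model's answer from a decoded Question/Answer sequence.

    Assumptions: the decoded text still contains the prompt, the end-of-text
    string is "<|endoftext|>", and a line starting with "Question:" marks a new turn.
    """
    _, marker, answer = decoded.partition("Answer:")
    if not marker:
        answer = decoded  # no "Answer:" marker: fall back to the full text
    answer = answer.split(eos_token, 1)[0]  # cut at the assumed end-of-text token
    answer = answer.split("\nQuestion:", 1)[0]  # cut if the model starts a new question
    return answer.strip()


sample = "Question: Add two numbers.\n\nAnswer: def add(a, b):\n    return a + b<|endoftext|>"
print(extract_answer(sample))
```

Alternatively, decoding only the newly generated tokens with `tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True)` avoids matching on the prompt text altogether.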

## Training

### Model

- Architecture: GPT-2 model with multi-query attention and Fill-in-the-Middle objective
- Steps: 250k pretraining & TODO instruction tuning
- Tokens: 1 trillion pretraining & TODO instruction tuning
- Precision: bfloat16
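The Fill-in-the-Middle objective trains on documents whose text has been cut into a prefix, middle and suffix and reordered, so that the model learns to generate the middle given both sides. The sketch below shows the prefix-suffix-middle (PSM) ordering under stated assumptions: the sentinel strings follow the StarCoder tokenizer convention, the cut points are uniform at character level, and every document is transformed. The actual pretraining pipeline, including the fraction of documents transformed, is not described in this card.

```python
import random

# Assumed sentinel strings (StarCoder tokenizer convention), not confirmed by this card.
FIM_PREFIX = "<fim_prefix>"
FIM_SUFFIX = "<fim_suffix>"
FIM_MIDDLE = "<fim_middle>"


def to_psm(document: str, start: int, end: int) -> str:
    """Cut `document` at [start, end) and emit it in prefix-suffix-middle order."""
    if not 0 <= start <= end <= len(document):
        raise ValueError("cut points must satisfy 0 <= start <= end <= len(document)")
    prefix, middle, suffix = document[:start], document[start:end], document[end:]
    # The model reads the prefix and suffix first, then learns to produce the middle.
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}{middle}"


def random_psm(document: str, rng: random.Random) -> str:
    """Pick two uniform character-level cut points (a simplification) and apply to_psm."""
    start, end = sorted(rng.randint(0, len(document)) for _ in range(2))
    return to_psm(document, start, end)


print(to_psm("abcdef", 2, 4))  # <fim_prefix>ab<fim_suffix>ef<fim_middle>cd
```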

### Hardware

- Pretraining:
  - GPUs: 512 Tesla A100
  - Training time: 24 days
- Instruction tuning:
  - GPUs: TODO Tesla A100
  - Training time: TODO days
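The pretraining figures above translate into total accelerator time by multiplying GPUs, days and 24 hours per day. This sketch assumes all 512 GPUs were busy for the full 24 days; the instruction-tuning entry cannot be computed because its figures are still TODO. `gpu_hours` is a hypothetical helper for illustration.

```python
def gpu_hours(num_gpus: int, days: float) -> float:
    """Total accelerator time, assuming every GPU runs for the full wall-clock period."""
    return num_gpus * days * 24  # GPUs x days x hours per day


pretraining = gpu_hours(512, 24)
print(f"Pretraining: {pretraining:,.0f} A100 GPU-hours")  # Pretraining: 294,912 A100 GPU-hours
```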

### Software

- Orchestration: [Megatron-LM](https://github.com/bigcode-project/Megatron-LM) & TODO
- Neural networks: [PyTorch](https://github.com/pytorch/pytorch)

## Citation

TODO