Muennighoff committed
Commit 14e6c10
1 Parent(s): bebc92f

Create README.md

Files changed (1): README.md (+114, -0)
README.md ADDED
---
pipeline_tag: text-generation
inference: true
widget:
- text: 'def print_hello_world():'
  example_title: Hello world
  group: Python
license: bigcode-openrail-m
datasets:
- bigcode/commitpackft
- Muennighoff/oasst-octopack
metrics:
- code_eval
library_name: transformers
tags:
- code
model-index:
- name: OctoCoder
  results:
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalSynthesize Python
    metrics:
    - name: pass@1
      type: pass@1
      value: 46.2
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalSynthesize JavaScript
    metrics:
    - name: pass@1
      type: pass@1
      value: 39.2
      verified: false
---

![Octopack](https://github.com/bigcode-project/octopack/blob/31f3320f098703c7910e43492c39366eeea68d83/banner.png?raw=true)

# OctoCoder

Play with the model on the [TODO Playground](https://huggingface.co/spaces/bigcode/bigcode-playground).

## Table of Contents

1. [Model Summary](#model-summary)
2. [Use](#use)
3. [Limitations](#limitations)
4. [Training](#training)
5. [License](#license)
6. [Citation](#citation)

## Model Summary

OctoCoder is an instruction-tuned code generation model, created by fine-tuning the StarCoder base model on CommitPackFT and OASST data (the datasets listed in the metadata above).

- **Repository:** [bigcode/octopack](https://github.com/bigcode-project/octopack)
- **Paper:** [TODO]()
- **Languages:** 80+ Programming languages

## Use

### Intended use

The model follows instructions provided in the input. We recommend prefacing your input with "Question: " and finishing with "Answer:", for example: "Question: Please write a function in Python that performs bubble sort.\n\nAnswer:"

**Feel free to share your generations in the Community tab!**

### Generation
```python
# pip install -q transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/octocoder"
device = "cuda"  # for GPU usage or "cpu" for CPU usage

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

inputs = tokenizer.encode("Question: Please write a function in Python that performs bubble sort.\n\nAnswer:", return_tensors="pt").to(device)
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
```
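
OctoCoder is a large model (roughly 15B parameters), so loading it in full float32 precision as above can exceed the memory of a single GPU. The snippet below is a minimal sketch of a more memory-friendly variant: it loads the same checkpoint in bfloat16 and lets `accelerate` place the weights automatically. Only the checkpoint name comes from this card; `torch_dtype`, `device_map`, and `max_new_tokens` are standard Transformers options chosen here purely for illustration.

```python
# Minimal sketch (not from the original card): bfloat16 weights plus automatic
# device placement to reduce memory use. Requires: pip install -q transformers accelerate
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/octocoder"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    torch_dtype=torch.bfloat16,  # half-precision weights instead of float32
    device_map="auto",           # spread layers across available GPUs / CPU
)

prompt = "Question: Please write a function in Python that performs bubble sort.\n\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0]))
```

Depending on your `transformers` version, 8-bit loading via `bitsandbytes` is another common way to fit the model on a smaller GPU.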

# Training

## Model

- **Architecture:** GPT-2 model with multi-query attention and Fill-in-the-Middle objective
- **Steps:** 250k pretraining & TODO instruction tuning
- **Pretraining tokens:** 1 trillion pretraining & TODO instruction tuning
- **Precision:** bfloat16
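
The Fill-in-the-Middle objective listed above belongs to the base model's pretraining. In StarCoder-family checkpoints it is exposed through the sentinel tokens `<fim_prefix>`, `<fim_suffix>`, and `<fim_middle>` in the tokenizer; the sketch below assumes those tokens are preserved in this checkpoint and only shows the usual FIM prompt layout. How well infilling works after instruction tuning is not something this card states.

```python
# Hedged sketch of a Fill-in-the-Middle prompt, assuming the StarCoder-style
# <fim_*> sentinel tokens exist in this tokenizer.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/octocoder"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

prefix = "def fibonacci(n):\n    "
suffix = "\n    return result\n"

# Layout: <fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle> -> the model generates the middle
prompt = f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0]))
```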

## Hardware

- **Pretraining:**
  - **GPUs:** 512 Tesla A100
  - **Training time:** 24 days
- **Instruction tuning:**
  - **GPUs:** TODO Tesla A100
  - **Training time:** TODO days

## Software

- **Orchestration:** [Megatron-LM](https://github.com/bigcode-project/Megatron-LM) & TODO
- **Neural networks:** [PyTorch](https://github.com/pytorch/pytorch)

# Citation

TODO