---
license: bigcode-openrail-m
datasets:
- bigcode/guanaco-commits
metrics:
- code_eval
library_name: peft
tags:
- code
---

# Astraios: A Recipe for Parameter-Efficient Instruction Tuning Code Language Models

<p align="center" width="100%">
<a ><img src="https://github.com/bigcode-project/astraios/blob/main/visuals/banner.png?raw=true" alt="Astraios" style="width: 20%; min-width: 300px; display: block; margin: auto;"></a>
</p>

# Table of Contents

1. [Model Summary](#model-summary)
2. [Use](#use)
3. [Training](#training)
4. [Citation](#citation)

# Model Summary

> Astraios-1B-FFT is a 1B-parameter instruction-tuned model created by full finetuning (FFT) of StarCoderBase-1B on CommitPackFT & OASST, as described in the Astraios paper.

- **Repository:** [bigcode-project/astraios](https://github.com/bigcode-project/astraios)
- **Paper:** [Astraios: A Recipe for Parameter-Efficient Instruction Tuning Code Language Models]()
- **Languages:** 80+ programming languages
- **✨Astraios:**
<table>
<tr>
<th>Data</th>
<td><a href=https://huggingface.co/datasets/bigcode/guanaco-commits>CommitPackFT+OASST</a></td>
<td>Filtered version of CommitPack and OASST for high-quality commit messages that resemble instructions</td>
</tr>
<tr>
<th>Model</th>
<td><a href=https://huggingface.co/collections/bigcode/astraios-1b-6576ff1b8e449026ae327c1c>Astraios-1B</a></td>
<td>Collection of StarCoderBase-1B (1B parameters) models instruction tuned on CommitPackFT + OASST with different tuning methods</td>
</tr>
<tr>
<th></th>
<td><a href=https://huggingface.co/collections/bigcode/astraios-3b-6577127317ee44ff547252d3>Astraios-3B</a></td>
<td>Collection of StarCoderBase-3B (3B parameters) models instruction tuned on CommitPackFT + OASST with different tuning methods</td>
</tr>
<tr>
<th></th>
<td><a href=https://huggingface.co/collections/starpeft/starcoderbase-7b-650c1f028b45cfec8e72c265>Astraios-7B</a></td>
<td>Collection of StarCoderBase-7B (7B parameters) models instruction tuned on CommitPackFT + OASST with different tuning methods</td>
</tr>
<tr>
<th></th>
<td><a href=https://huggingface.co/collections/bigcode/astraios-16b-65788b7476b6de79781054cc>Astraios-16B</a></td>
<td>Collection of StarCoderBase-16B (16B parameters) models instruction tuned on CommitPackFT + OASST with different tuning methods</td>
</tr>
<tr>
<th>Evaluation</th>
<td><a href=https://huggingface.co/datasets/code_x_glue_cc_clone_detection_big_clone_bench>BigCloneBench</a></td>
<td>Dataset for clone detection; we use 2,000 samples for evaluation</td>
</tr>
<tr>
<th></th>
<td><a href=https://huggingface.co/datasets/code_x_glue_cc_defect_detection>Devign</a></td>
<td>Dataset for defect detection; we use 2,000 samples for evaluation</td>
</tr>
<tr>
<th></th>
<td><a href=https://huggingface.co/datasets/bigcode/humanevalpack>HumanEvalPack</a></td>
<td>Extension of OpenAI's HumanEval to cover 3 scenarios across 6 languages</td>
</tr>
<tr>
<th></th>
<td><a href=https://huggingface.co/datasets/RaymondLi/perturbed_humaneval>ReCode</a></td>
<td>Dataset for the robustness of code generation, covering 4 variants</td>
</tr>
<tr>
<th></th>
<td><a href=https://huggingface.co/datasets/moyix/asleep_keyboard>Asleep At The Keyboard</a></td>
<td>Datasets for security of code generation; we use DoW for evaluation</td>
</tr>
</table>

# Use

## Intended use

The model follows instructions provided in the input. Always preface your input with "Question: " and end it with "\n\nAnswer:", for example: "Question: Please write a function in Python that performs bubble sort.\n\nAnswer:"

**Feel free to share your generations in the Community tab!**

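The prompt format above can be wrapped in a small helper. This is just an illustrative sketch; the `build_prompt` name is not part of the model's API:

```python
def build_prompt(instruction: str) -> str:
    # Wrap a plain instruction in the "Question: ... Answer:" format
    # that the model was instruction tuned on.
    return f"Question: {instruction}\n\nAnswer:"

print(build_prompt("Please write a function in Python that performs bubble sort."))
```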
## Generation

```python
# pip install -q transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/astraios-1b-fft"
device = "cuda"  # for GPU usage or "cpu" for CPU usage

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

inputs = tokenizer.encode("Question: Please write a function in Python that performs bubble sort.\n\nAnswer:", return_tensors="pt").to(device)
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
```

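`model.generate` returns the prompt tokens followed by the completion, so the decoded string echoes the question. A common post-processing step (a sketch under the prompt format above, not code from the original card) is to keep only the text after the final "Answer:" marker:

```python
def extract_answer(decoded: str) -> str:
    # The decoded generation echoes the prompt; keep only the text
    # after the last "Answer:" marker (assumes the Question/Answer
    # prompt format shown above).
    marker = "Answer:"
    idx = decoded.rfind(marker)
    return decoded[idx + len(marker):].strip() if idx != -1 else decoded.strip()

sample = "Question: Reverse a list in Python.\n\nAnswer: xs[::-1]"
print(extract_answer(sample))  # xs[::-1]
```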
# Training

## Model

- **Architecture:** GPT-2 model with multi-query attention and Fill-in-the-Middle objective
- **Steps:** 250k pretraining & 200 instruction tuning
- **Precision:** fp32

## Hardware

- **Pretraining:**
  - **GPUs:** 512 Tesla A100
  - **Training time:** 24 days
- **Instruction tuning:**
  - **GPUs:** 8 Tesla A100

## Software

- **Orchestration:** [Megatron-LM/Transformers](https://github.com/bigcode-project/octopack#training)
- **Neural networks:** [PyTorch](https://github.com/pytorch/pytorch)

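As a rough back-of-the-envelope check (my own arithmetic, not a figure from this card): fp32 stores 4 bytes per parameter, so a ~1B-parameter checkpoint needs about 4 GB for the weights alone, before activations or optimizer state:

```python
params = 1_000_000_000   # nominal ~1B parameters
bytes_per_param = 4      # fp32 = 32 bits = 4 bytes
weight_gb = params * bytes_per_param / 1e9
print(f"~{weight_gb:.0f} GB of weights")  # ~4 GB of weights
```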
# Citation

```bibtex
```