Upload README.md with huggingface_hub
#1
by
terryyz
- opened
README.md
CHANGED
|
@@ -1,9 +1,143 @@
| 1 |
---
| 2 |
library_name: peft
| 3 |
---
|
| 4 |
-
| 5 |
|
| 6 |
-
| 7 |
| 8 |
|
| 9 |
-
| 1 |
+
|
| 2 |
---
|
| 3 |
+
license: bigcode-openrail-m
|
| 4 |
+
datasets:
|
| 5 |
+
- bigcode/guanaco-commits
|
| 6 |
+
metrics:
|
| 7 |
+
- code_eval
|
| 8 |
library_name: peft
|
| 9 |
+
tags:
|
| 10 |
+
- code
|
| 11 |
---
|
| 12 |
+
# Astraios: A Recipe for Parameter-Efficient Instruction Tuning Code Language Models
|
| 13 |
+
<p align="center" width="100%">
|
| 14 |
+
<a ><img src="https://github.com/bigcode-project/astraios/blob/main/visuals/banner.png?raw=true" alt="Astraios" style="width: 20%; min-width: 300px; display: block; margin: auto;"></a>
|
| 15 |
+
</p>
|
| 16 |
+
|
| 17 |
+
# Table of Contents
|
| 18 |
+
|
| 19 |
+
1. [Model Summary](#model-summary)
|
| 20 |
+
2. [Use](#use)
|
| 21 |
+
3. [Training](#training)
|
| 22 |
+
4. [Citation](#citation)
|
| 23 |
+
|
| 24 |
+
# Model Summary
|
| 25 |
+
|
| 26 |
+
> Astraios-LoRA is an instruction-tuned model with 15.5B parameters, created by fine-tuning StarCoderBase on CommitPackFT & OASST as described in the Astraios paper.
|
| 27 |
+
|
| 28 |
+
- **Repository:** [bigcode-project/astraios](https://github.com/bigcode-project/astraios)
|
| 29 |
+
- **Paper:** [Astraios: A Recipe for Parameter-Efficient Instruction Tuning Code Language Models]()
|
| 30 |
+
- **Languages:** 80+ programming languages
|
| 31 |
+
- **✨Astraios:**
|
| 32 |
+
<table>
|
| 33 |
+
<tr>
|
| 34 |
+
<th>Data</th>
|
| 35 |
+
<td><a href=https://huggingface.co/datasets/bigcode/guanaco-commits>CommitPackFT+OASST</a></td>
|
| 36 |
+
<td>Filtered version of CommitPack and OASST for high-quality commit messages that resemble instructions</td>
|
| 37 |
+
</tr>
|
| 38 |
+
<tr>
|
| 39 |
+
<th>Model</th>
|
| 40 |
+
<td><a href=https://huggingface.co/collections/bigcode/astraios-1b-6576ff1b8e449026ae327c1c>Astraios-1B</a></td>
|
| 41 |
+
<td>Collection of StarCoderBase-1B (1B parameters) models instruction tuned on CommitPackFT + OASST with different tuning methods</td>
|
| 42 |
+
</tr>
|
| 43 |
+
<tr>
|
| 44 |
+
<th></th>
|
| 45 |
+
<td><a href=https://huggingface.co/collections/bigcode/astraios-3b-6577127317ee44ff547252d3>Astraios-3B</a></td>
|
| 46 |
+
<td>Collection of StarCoderBase-3B (3B parameters) models instruction tuned on CommitPackFT + OASST with different tuning methods</td>
|
| 47 |
+
</tr>
|
| 48 |
+
<tr>
|
| 49 |
+
<th></th>
|
| 50 |
+
<td><a href=https://huggingface.co/collections/starpeft/starcoderbase-7b-650c1f028b45cfec8e72c265>Astraios-7B</a></td>
|
| 51 |
+
<td>Collection of StarCoderBase-7B (7B parameters) models instruction tuned on CommitPackFT + OASST with different tuning methods</td>
|
| 52 |
+
</tr>
|
| 53 |
+
<tr>
|
| 54 |
+
<th></th>
|
| 55 |
+
<td><a href=https://huggingface.co/collections/bigcode/astraios-16b-65788b7476b6de79781054cc>Astraios-16B</a></td>
|
| 56 |
+
<td>Collection of StarCoderBase-16B (16B parameters) models instruction tuned on CommitPackFT + OASST with different tuning methods</td>
|
| 57 |
+
</tr>
|
| 58 |
+
<tr>
|
| 59 |
+
<th>Evaluation</th>
|
| 60 |
+
<td><a href=https://huggingface.co/datasets/code_x_glue_cc_clone_detection_big_clone_bench>BigCloneBench</a></td>
|
| 61 |
+
<td>Dataset for clone detection; we use 2,000 samples for evaluation</td>
|
| 62 |
+
</tr>
|
| 63 |
+
<tr>
|
| 64 |
+
<th></th>
|
| 65 |
+
<td><a href=https://huggingface.co/datasets/code_x_glue_cc_defect_detection>Devign</a></td>
|
| 66 |
+
<td>Dataset for defect detection; we use 2,000 samples for evaluation</td>
|
| 67 |
+
</tr>
|
| 68 |
+
<tr>
|
| 69 |
+
<th></th>
|
| 70 |
+
<td><a href=https://huggingface.co/datasets/bigcode/humanevalpack>HumanEvalPack</a></td>
|
| 71 |
+
<td>Extension of OpenAI's HumanEval to cover 3 scenarios across 6 languages</td>
|
| 72 |
+
</tr>
|
| 73 |
+
<tr>
|
| 74 |
+
<th></th>
|
| 75 |
+
<td><a href=https://huggingface.co/datasets/RaymondLi/perturbed_humaneval>ReCode</a></td>
|
| 76 |
+
<td>Dataset for the robustness of code generation, covering 4 variants</td>
|
| 77 |
+
</tr>
|
| 78 |
+
<tr>
|
| 79 |
+
<th></th>
|
| 80 |
+
<td><a href=https://huggingface.co/datasets/moyix/asleep_keyboard>Asleep At The Keyboard</a></td>
|
| 81 |
+
<td>Datasets for security of code generation; we use DoW for evaluation</td>
|
| 82 |
+
</tr>
|
| 83 |
+
</table>
|
| 84 |
+
|
| 85 |
+
|
| 86 |
+
# Use
|
| 87 |
+
|
| 88 |
+
## Intended use
|
| 89 |
+
|
| 90 |
+
The model follows instructions provided in the input. You should always preface your input with "Question: " and finish it with "Answer:", for example: "Question: Please write a function in Python that performs bubble sort.
|
| 91 |
+
|
| 92 |
+
Answer:"
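
The prompt format described above can be wrapped in a small helper so that every request is framed the same way. This is a minimal sketch: `build_prompt` is a hypothetical name used for illustration and is not part of the Astraios repository.

```python
def build_prompt(instruction: str) -> str:
    """Wrap an instruction in the "Question: ... Answer:" format.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # Strip stray whitespace so the blank line before "Answer:" stays exact.
    return f"Question: {instruction.strip()}\n\nAnswer:"


prompt = build_prompt("Please write a function in Python that performs bubble sort.")
print(prompt)
```

The returned string can be passed to the tokenizer in place of a hand-written prompt.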
|
| 93 |
+
|
| 94 |
+
**Feel free to share your generations in the Community tab!**
|
| 95 |
+
|
| 96 |
+
## Generation
|
| 97 |
+
```python
|
| 98 |
+
# pip install -q transformers
|
| 99 |
+
# pip install -e git+https://github.com/bigcode-project/astraios#subdirectory=peft
|
| 100 |
+
from peft import PeftModel
|
| 101 |
+
from transformers import AutoModelForCausalLM, AutoTokenizer
|
| 102 |
+
|
| 103 |
+
peft_checkpoint = "bigcode/astraios-lora"
|
| 104 |
+
checkpoint = "bigcode/starcoderbase"
|
| 105 |
+
model = AutoModelForCausalLM.from_pretrained(checkpoint)
|
| 106 |
+
model = PeftModel.from_pretrained(model, peft_checkpoint)
|
| 107 |
+
device = "cuda" # for GPU usage or "cpu" for CPU usage
|
| 108 |
+
|
| 109 |
+
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
|
| 110 |
+
# Move the LoRA-wrapped model to the device; reloading the base checkpoint here would discard the adapter.
model = model.to(device)
|
| 111 |
+
|
| 112 |
+
inputs = tokenizer.encode("Question: Please write a function in Python that performs bubble sort.\n\nAnswer:", return_tensors="pt").to(device)
|
| 115 |
+
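# Pass input_ids by keyword (some PEFT versions only accept keyword arguments); max_new_tokens is an adjustable cap.
outputs = model.generate(input_ids=inputs, max_new_tokens=128)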
|
| 116 |
+
print(tokenizer.decode(outputs[0]))
|
| 117 |
+
```
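
Because a causal language model returns the prompt followed by its continuation, the decoded text printed above also contains the original question. A minimal post-processing sketch is shown below. `extract_answer` is a hypothetical helper, and the `<|endoftext|>` end-of-sequence token is an assumption that should be checked against `tokenizer.eos_token`.

```python
def extract_answer(decoded: str, eos_token: str = "<|endoftext|>") -> str:
    """Return only the generated answer from a decoded generation.

    Hypothetical helper: the default eos_token is an assumption, not a
    value confirmed by this model card.
    """
    # Keep the text after the first "Answer:" marker, if one is present.
    _, marker, completion = decoded.partition("Answer:")
    if not marker:
        completion = decoded
    # Drop the end-of-sequence token and anything generated after it.
    completion = completion.split(eos_token, 1)[0]
    return completion.strip()


example = "Question: Say hi.\n\nAnswer: print('hi')<|endoftext|>"
print(extract_answer(example))
```

In the snippet above, this would be applied to `tokenizer.decode(outputs[0])`.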
|
| 118 |
+
|
| 119 |
+
# Training
|
| 120 |
+
|
| 121 |
+
## Model
|
| 122 |
+
|
| 123 |
+
- **Architecture:** GPT-2 model with multi-query attention and Fill-in-the-Middle objective
|
| 124 |
+
- **Steps:** 250k pretraining & 200 instruction tuning
|
| 125 |
+
- **Precision:** fp32
|
| 126 |
+
|
| 127 |
+
## Hardware
|
| 128 |
+
|
| 129 |
+
- **Pretraining:**
|
| 130 |
+
  - **GPUs:** 512 Tesla A100
|
| 131 |
+
  - **Training time:** 24 days
|
| 132 |
+
- **Instruction tuning:**
|
| 133 |
+
  - **GPUs:** 8 Tesla A100
|
| 134 |
+
|
| 135 |
+
## Software
|
| 136 |
|
| 137 |
+
- **Orchestration:** [Megatron-LM/Transformers](https://github.com/bigcode-project/octopack#training)
|
| 138 |
+
- **Neural networks:** [PyTorch](https://github.com/pytorch/pytorch)
|
| 139 |
|
| 140 |
+
# Citation
|
| 141 |
|
| 142 |
+
```bibtex
|
| 143 |
+
```
|