---
license: bigcode-openrail-m
datasets:
- bigcode/guanaco-commits
metrics:
- code_eval
library_name: peft
tags:
- code
---

# Astraios: A Recipe for Parameter-Efficient Instruction Tuning Code Language Models

<p align="center" width="100%">
<a><img src="https://github.com/bigcode-project/astraios/blob/main/visuals/banner.png?raw=true" alt="Astraios" style="width: 20%; min-width: 300px; display: block; margin: auto;"></a>
</p>

# Table of Contents

1. [Model Summary](#model-summary)
2. [Use](#use)
3. [Training](#training)
4. [Citation](#citation)

# Model Summary

> Astraios-1B-FFT is an instruction-tuned model with 1B parameters, created by fully finetuning StarCoderBase-1B on CommitPackFT & OASST as described in the Astraios paper.

- **Repository:** [bigcode-project/astraios](https://github.com/bigcode-project/astraios)
- **Paper:** [Astraios: A Recipe for Parameter-Efficient Instruction Tuning Code Language Models]()
- **Languages:** 80+ Programming languages
- **✨Astraios:**
<table>
<tr>
<th>Data</th>
<td><a href="https://huggingface.co/datasets/bigcode/guanaco-commits">CommitPackFT+OASST</a></td>
<td>Filtered version of CommitPack and OASST for high-quality commit messages that resemble instructions</td>
</tr>
<tr>
<th>Model</th>
<td><a href="https://huggingface.co/collections/bigcode/astraios-1b-6576ff1b8e449026ae327c1c">Astraios-1B</a></td>
<td>Collection of StarCoderBase-1B models instruction tuned on CommitPackFT + OASST with different tuning methods</td>
</tr>
<tr>
<th></th>
<td><a href="https://huggingface.co/collections/bigcode/astraios-3b-6577127317ee44ff547252d3">Astraios-3B</a></td>
<td>Collection of StarCoderBase-3B (3B parameters) models instruction tuned on CommitPackFT + OASST with different tuning methods</td>
</tr>
<tr>
<th></th>
<td><a href="https://huggingface.co/collections/starpeft/starcoderbase-7b-650c1f028b45cfec8e72c265">Astraios-7B</a></td>
<td>Collection of StarCoderBase-7B (7B parameters) models instruction tuned on CommitPackFT + OASST with different tuning methods</td>
</tr>
<tr>
<th></th>
<td><a href="https://huggingface.co/collections/bigcode/astraios-16b-65788b7476b6de79781054cc">Astraios-16B</a></td>
<td>Collection of StarCoderBase-16B (16B parameters) models instruction tuned on CommitPackFT + OASST with different tuning methods</td>
</tr>
<tr>
<th>Evaluation</th>
<td><a href="https://huggingface.co/datasets/code_x_glue_cc_clone_detection_big_clone_bench">BigCloneBench</a></td>
<td>Dataset for clone detection; we use 2,000 samples for evaluation</td>
</tr>
<tr>
<th></th>
<td><a href="https://huggingface.co/datasets/code_x_glue_cc_defect_detection">Devign</a></td>
<td>Dataset for defect detection; we use 2,000 samples for evaluation</td>
</tr>
<tr>
<th></th>
<td><a href="https://huggingface.co/datasets/bigcode/humanevalpack">HumanEvalPack</a></td>
<td>Extension of OpenAI's HumanEval to cover 3 scenarios across 6 languages</td>
</tr>
<tr>
<th></th>
<td><a href="https://huggingface.co/datasets/RaymondLi/perturbed_humaneval">ReCode</a></td>
<td>Dataset for evaluating the robustness of code generation, covering 4 variants</td>
</tr>
<tr>
<th></th>
<td><a href="https://huggingface.co/datasets/moyix/asleep_keyboard">Asleep At The Keyboard</a></td>
<td>Dataset for evaluating the security of code generation; we use the DoW subset for evaluation</td>
</tr>
</table>

# Use

## Intended use

The model follows instructions provided in the input. You should always preface your input with "Question: " and finish it with "Answer:", for example: "Question: Please write a function in Python that performs bubble sort.\n\nAnswer:"
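
For programmatic use, the required format can be assembled with a small helper along the lines of the sketch below; the `build_prompt` function is illustrative and not part of the released code:

```python
def build_prompt(instruction: str) -> str:
    # Wrap an arbitrary instruction in the "Question: ... Answer:" format the model expects.
    return f"Question: {instruction}\n\nAnswer:"

prompt = build_prompt("Please write a function in Python that performs bubble sort.")
```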

**Feel free to share your generations in the Community tab!**

## Generation

```python
# pip install -q transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/astraios-1b-fft"
device = "cuda"  # for GPU usage or "cpu" for CPU usage

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

inputs = tokenizer.encode("Question: Please write a function in Python that performs bubble sort.\n\nAnswer:", return_tensors="pt").to(device)
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
```
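
By default `generate` uses greedy decoding and a short output budget, so completions may be cut off. For longer or sampled outputs you can pass explicit generation parameters; the values below are a sketch and are not the settings used in the Astraios paper:

```python
# Longer, optionally sampled generation; the exact values are illustrative.
outputs = model.generate(
    inputs,
    max_new_tokens=256,   # allow room for a full function body
    do_sample=True,       # set to False for greedy decoding
    temperature=0.2,
    top_p=0.95,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```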

# Training

## Model

- **Architecture:** GPT-2 model with multi-query attention and Fill-in-the-Middle objective
- **Steps:** 250k pretraining & 200 instruction tuning
- **Precision:** fp32
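
The released weights are stored in fp32. If GPU memory is tight, loading them in half precision for inference is a common option; the snippet below is a sketch of that option, not the configuration used for training or evaluation in the paper:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/astraios-1b-fft"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# Cast the fp32 checkpoint to bfloat16 at load time to roughly halve GPU memory use.
model = AutoModelForCausalLM.from_pretrained(checkpoint, torch_dtype=torch.bfloat16).to("cuda")
```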

## Hardware

- **Pretraining:**
  - **GPUs:** 512 Tesla A100
  - **Training time:** 24 days
- **Instruction tuning:**
  - **GPUs:** 8 Tesla A100

## Software

- **Orchestration:** [Megatron-LM/Transformers](https://github.com/bigcode-project/octopack#training)
- **Neural networks:** [PyTorch](https://github.com/pytorch/pytorch)

# Citation

```bibtex
```