---
language:
- en
- code
tags:
- pytorch
- causal-lm
- code-generation
license: apache-2.0
---

# FIM-1.3B

## Model Description

FIM-1.3B is the first in a series of large-scale infilling-enabled autoregressive language models trained by CarperAI. Future models, both larger and smaller, trained on greater quantities of code data, will be released, potentially with different architectural variations optimized for code.

This is a preliminary release of an experimental artifact and should be treated as such.

## Model Dimensions

| Hyperparameter       | Value                                                                 |
|----------------------|-----------------------------------------------------------------------|
| \\(n_{parameters}\\) | 1,331,810,304                                                         |
| \\(n_{layers}\\)     | 24                                                                    |
| \\(d_{model}\\)      | 2,048                                                                 |
| \\(d_{ff}\\)         | 8,192                                                                 |
| \\(n_{heads}\\)      | 16                                                                    |
| \\(d_{head}\\)       | 128                                                                   |
| \\(n_{ctx}\\)        | 2,048                                                                 |
| \\(n_{vocab}\\)      | 50,256                                                                |
| Positional Encoding  | [Rotary Position Embedding (RoPE)](https://arxiv.org/abs/2104.09864)  |

The model consists of 24 transformer layers with a model dimension of 2,048 and a feedforward dimension of 8,192. The model dimension is split into 16 heads, each with a dimension of 128. Rotary Position Embedding (RoPE) is used for positional encoding.

The model is trained with the same tokenizer as GPT-NeoX-20B (link here), for a vocabulary of 50,254 tokens.
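
As a quick way to confirm these dimensions from the released checkpoint, here is a sketch assuming the checkpoint exposes a standard `transformers` config with GPT-NeoX-style attribute names (the attribute names are an assumption and may differ for other architectures):

```python
from transformers import AutoConfig, AutoTokenizer

# Attribute names follow the standard GPT-NeoX config in transformers;
# they may differ if the checkpoint uses another architecture.
config = AutoConfig.from_pretrained("CarperAI/FIM-1.3B")
print(config.num_hidden_layers)        # expected: 24    (n_layers)
print(config.hidden_size)              # expected: 2048  (d_model)
print(config.num_attention_heads)      # expected: 16    (n_heads)
print(config.intermediate_size)        # expected: 8192  (d_ff)
print(config.max_position_embeddings)  # expected: 2048  (n_ctx)

tokenizer = AutoTokenizer.from_pretrained("CarperAI/FIM-1.3B")
print(len(tokenizer))                  # tokenizer vocabulary size
```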

## Training Data

The model was trained on the Pile, an 800GB dataset composed of a diverse range of text corpora. The datasheet and paper for the Pile can be found [here] and [here], respectively.

## Training Details

This model was trained for 47,000 steps at a batch size of 6,291,456 tokens per step in the [GPT-NeoX codebase](https://github.com/EleutherAI/gpt-neox). It was trained as an autoregressive language model, using cross-entropy loss to maximize the likelihood of predicting the next token correctly.

Following Bavarian et al. 2022, we train the model to additionally perform infilling via a data transformation applied randomly to 90% of input contexts at train time. Middle segments “to infill” were selected uniformly at random from contexts at the character level, and these contexts were then reformatted as:

`<SUF> {last 1/3rd of the context} <PRE> {first 1/3rd of the context} <MID> {middle 1/3rd of the context} <EOD>`

## How to use

This model can be easily loaded using the `AutoModelForCausalLM` class:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("CarperAI/FIM-1.3B")
model = AutoModelForCausalLM.from_pretrained("CarperAI/FIM-1.3B")
```
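
Once loaded, standard left-to-right generation works as usual. A brief example using the standard `transformers` generation API (the prompt and sampling settings below are arbitrary illustrative choices, not recommendations):

```python
prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=64,      # illustrative settings, not tuned for this model
    do_sample=True,
    temperature=0.8,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```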

### Performing Infilling

Suppose we have some text that we would like to perform infilling on at a certain “cursor location”. This would have the form `{some prelude text here} <INFILLING LOCATION> {some text following cursor}`.

To perform infilling generation, place the input text into this format:

`<SUF> {some text following cursor} <PRE> {some prelude text here} <MID>`

The language model's output is then generated after the `<MID>` token.
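
Concretely, here is a hedged sketch of infilling a function body with the model loaded above. It assumes the sentinel strings `<SUF>`, `<PRE>`, `<MID>`, and `<EOD>` appear verbatim in the prompt text and that the generated middle can simply be cut at `<EOD>`; the example prefix/suffix and sampling settings are illustrative only:

```python
prefix = "def remove_non_ascii(s: str) -> str:\n    \"\"\""
suffix = "\n    return result\n"

# Suffix-first prompt, matching the training-time format described above.
infill_prompt = f"<SUF>{suffix}<PRE>{prefix}<MID>"

inputs = tokenizer(infill_prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=True,
    temperature=0.2,
    pad_token_id=tokenizer.eos_token_id,
)
# Everything generated after <MID> is the proposed middle; trim at <EOD> if it appears.
generated = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:])
middle = generated.split("<EOD>")[0]
print(prefix + middle + suffix)
```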

## Intended Uses and Limitations

FIM-1.3B learns a representation of the English language and code that can be used to extract features useful for downstream NLP and code generation tasks. However, the model has solely been trained on a standard next-token-prediction language modeling task on its training data.

## Limitations and Biases

FIM-1.3B was trained on the Pile, a dataset known to contain profanity, lewd, and otherwise abrasive language. FIM-1.3B may produce socially unacceptable or otherwise harmful text. See Sections 5 and 6 of the Pile paper for a more detailed analysis of the biases in the Pile.

As with all language models, it is hard to predict in advance how FIM-1.3B will respond to particular prompts, and offensive content may occur without warning. We recommend having a human curate or filter the outputs before releasing them, both to censor undesirable content and to improve the quality of the results. Code generated by FIM-1.3B should also be checked for security errors by a human before use in production.

## Evaluation results

We evaluate our model on a number of standard NLP datasets to verify that our infilling model performs on par with a comparable autoregressive model. We use the [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) developed by EleutherAI.

Results for FIM-1.3B and the comparable autoregressive model are to be reported on LogiQA, PIQA, SciQ, WSC, Winogrande, ARC_challenge, ARC_easy, and lambada.
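
For reference, here is a sketch of how the standard NLP evaluation might be reproduced through the harness's Python API; the entry point and task names below follow recent `lm_eval` releases and are an assumption, not the exact command used for these results:

```python
import lm_eval  # pip install lm-eval

# Assumed interface: recent harness versions expose simple_evaluate;
# task names and model loader strings vary between harness versions.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=CarperAI/FIM-1.3B",
    tasks=["logiqa", "piqa", "sciq", "winogrande", "arc_challenge", "arc_easy"],
)
print(results["results"])
```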

We also perform a preliminary investigation of code generation and infilling capabilities by testing on HumanEval-Infilling [link to github] [Bavarian et al. 2022].