loubnabnl committed
Commit db5608b
1 Parent(s): 90d62ef

update architecture

architectures/codegen.txt ADDED
@@ -0,0 +1,20 @@
+ The [CodeGen](https://huggingface.co/Salesforce/codegen-16B-mono) architecture is a standard transformer decoder with left-to-right causal masking. It uses rotary position embeddings [(Su et al., 2021)](https://arxiv.org/abs/2104.09864) and has a context length of 2048. CodeGen models are trained in several sizes:
+
+ | Model | # parameters |
+ | - | - |
+ | Decoder | 350M |
+ | Decoder | 2.7B |
+ | Decoder | 6.1B |
+ | Decoder | 16.1B |
+
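+ The 2048-token context length can be checked directly from a released checkpoint's config (a minimal sketch; the `n_positions` config field is assumed here to hold the context length):
+
+ ```python
+ from transformers import AutoConfig
+
+ # Assumption: n_positions stores the maximum context length for CodeGen checkpoints.
+ config = AutoConfig.from_pretrained('Salesforce/codegen-16B-mono')
+ print(config.n_positions)  # expected: 2048
+ ```
+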
+ You can load the model and tokenizer directly from [`transformers`](https://huggingface.co/docs/transformers/index):
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+
+ tokenizer = AutoTokenizer.from_pretrained('Salesforce/codegen-16B-mono')
+ model = AutoModelForCausalLM.from_pretrained('Salesforce/codegen-16B-mono')
+
+ inputs = tokenizer("def hello_world():", return_tensors="pt")
+ outputs = model(**inputs)
+ ```
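+
+ The forward pass above only returns logits. For actual code completion you would typically call `generate` instead; a minimal sketch continuing from the snippet above (greedy decoding, with `max_new_tokens` picked arbitrarily for illustration):
+
+ ```python
+ # Complete the prompt with greedy decoding (max_new_tokens is an arbitrary choice).
+ generated = model.generate(**inputs, max_new_tokens=32)
+ print(tokenizer.decode(generated[0], skip_special_tokens=True))
+ ```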
architectures/polycoder.txt ADDED
@@ -0,0 +1,9 @@
+ [PolyCoder](https://github.com/VHellendoorn/Code-LMs) uses the GPT-2 architecture, with a BPE tokenizer trained on a random 5% subset of the data (all languages) and a context length of 2048. To study the effect of scaling model size, the model was trained in three different sizes:
+
+ | Model | # parameters |
+ | - | - |
+ | GPT2 | 160M |
+ | GPT2 | 400M |
+ | GPT2 | 2.7B |
+
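+ A BPE tokenizer like the one mentioned above can be trained with the [`tokenizers`](https://github.com/huggingface/tokenizers) library; the sketch below is illustrative only (byte-level BPE, with placeholder training files, vocabulary size, and output directory rather than PolyCoder's actual setup):
+
+ ```python
+ import os
+ from tokenizers import ByteLevelBPETokenizer
+
+ # Placeholder corpus: PolyCoder used a random 5% subset of its full training data.
+ tokenizer = ByteLevelBPETokenizer()
+ tokenizer.train(files=["sample_code.txt"], vocab_size=50257, min_frequency=2)
+
+ # Save vocab.json and merges.txt to a (placeholder) output directory.
+ os.makedirs("bpe-tokenizer", exist_ok=True)
+ tokenizer.save_model("bpe-tokenizer")
+ ```
+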
+ PolyCoder is currently being integrated into `transformers`. In the meantime, it can be loaded by following the instructions in the original GitHub [repo](https://github.com/vhellendoorn/code-lms#models).