---
language:
- en
tags:
- pytorch
- causal-lm
license: apache-2.0
datasets:
- the Pile
---

# Genji-python 6B

## Model Description

Genji is a transformer model finetuned from EleutherAI's GPT-J 6B model. This particular model is trained on Python-only code approaching 4 GB in size.

| Hyperparameter    | Value |
|-------------------|-------|
| n_parameters      | 6,053,381,344 |
| n_layers          | 28*   |
| d_model           | 4,096 |
| d_ff              | 16,384 |
| n_heads           | 16    |
| d_head            | 256   |
| n_ctx             | 2,048 |
| n_vocab           | 50,400 (same tokenizer as GPT-2/3) |
| position encoding | [Rotary position encodings (RoPE)](https://arxiv.org/abs/2104.09864) |
| RoPE dimensions   | [64](https://github.com/kingoflolz/mesh-transformer-jax/blob/f2aa66e0925de6593dcbb70e72399b97b4130482/mesh_transformer/layers.py#L223) |

`*` Each layer consists of one feedforward block and one self-attention block.

The model consists of 28 layers with a model dimension of 4096 and a feedforward dimension of 16384. The model dimension is split into 16 heads, each with a dimension of 256. Rotary position encodings (RoPE) are applied to 64 dimensions of each head. The model is trained with a tokenization vocabulary of 50257, using the same set of BPEs as GPT-2/GPT-3.

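As a rough sanity check, the n_parameters figure above can be approximated from the other dimensions in the table. The snippet below is only a back-of-the-envelope sketch: it ignores biases and layer norms and assumes an untied output head, so it lands slightly below the exact count.

```python
# Back-of-the-envelope parameter count from the hyperparameters above.
d_model, d_ff, n_layers, n_vocab = 4096, 16384, 28, 50400

attn_params = 4 * d_model * d_model   # Q, K, V and output projections
ff_params = 2 * d_model * d_ff        # feedforward in- and out-projections
per_layer = attn_params + ff_params   # ~201M parameters per layer

embed_params = 2 * n_vocab * d_model  # token embedding + untied LM head (assumed)

total = n_layers * per_layer + embed_params
print(f"{total:,}")  # 6,050,021,376 -- close to the 6,053,381,344 listed above
```
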

## Training data

GPT-J 6B was pretrained on [the Pile](https://pile.eleuther.ai/), a large-scale curated dataset created by EleutherAI for the purpose of training this model. After pretraining, the model was finetuned on the Python code contained in the Pile.

## Training procedure

Genji-python-6B was trained for 20k steps on around 655 million tokens. For the learning rate, we used 2,000 warmup steps and annealed from 5e-06 down to 1e-06.

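The schedule above gives the warmup length and the learning-rate endpoints but not the decay shape. The snippet below is a minimal sketch of one plausible schedule, assuming a linear warmup followed by a cosine anneal; the actual curve used for training may differ.

```python
import math
import torch

# Schedule endpoints taken from this card; the cosine decay shape is an assumption.
TOTAL_STEPS = 20_000
WARMUP_STEPS = 2_000
PEAK_LR = 5e-06
FINAL_LR = 1e-06

def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then cosine anneal down to FINAL_LR."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return FINAL_LR + 0.5 * (PEAK_LR - FINAL_LR) * (1 + math.cos(math.pi * progress))

# Plugging it into a PyTorch optimizer: with a base lr of 1.0, the lambda's
# return value is used directly as the learning rate.
params = [torch.nn.Parameter(torch.zeros(1))]  # placeholder parameters
optimizer = torch.optim.AdamW(params, lr=1.0)
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lr_at)

for step in (0, 1_000, 2_000, 11_000, 20_000):
    print(step, f"{lr_at(step):.2e}")
```
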

## Intended Use

This model is trained to assist with writing Python code, and for having fun trying weird stuff with it.

### How to use

This model is only usable with our fork because GPT-J is not merged into the main transformers repo yet. When it's merged, we will make this model easily loadable.
For now, you need to use this fork:
[Fork](https://github.com/finetuneanon/transformers)

To install with pip:
```bash
pip install git+https://github.com/finetuneanon/transformers@gpt-neo-localattention3-rp-b
```

This model takes more than 16 GB of RAM to load. If you want more efficient and faster loading, please check our split model.
We recommend using the model in FP16; that way, it fits on 16 GB VRAM cards.

How to use:
```python
from transformers import (
    AutoTokenizer,
    AutoModelForCausalLM,
    GPTNeoForCausalLM,
)

# Load the model in FP16, switch to eval mode, and move it to the GPU.
model = AutoModelForCausalLM.from_pretrained("NovelAI/genji-python-6B", use_auth_token=True).half().eval().cuda()
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-2.7B")

text = '''def print_customer_name'''

# Tokenize the prompt and sample a continuation of up to 400 new tokens.
tokens = tokenizer(text, return_tensors="pt").input_ids
generated_tokens = model.generate(tokens.long().cuda(), use_cache=True, do_sample=True, top_k=50, temperature=0.3, top_p=0.9, repetition_penalty=1.125, min_length=1, max_length=len(tokens[0]) + 400, pad_token_id=tokenizer.eos_token_id)

# Drop the prompt tokens and decode only the newly generated part.
last_tokens = generated_tokens[0][len(tokens[0]):]
generated_text = tokenizer.decode(last_tokens)
print("Generation:\n" + generated_text)
```
When run, this code generates:
```python
Prompt:
def print_customer_name
Generation:
(self, customer):
    """Print the name of a customer."""
    if not self.is_valid():
        return

    print("Customer: {}".format(customer))
```

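If you plan to generate repeatedly, the sampling call from the example can be wrapped in a small helper. This is only a convenience sketch reusing the `model` and `tokenizer` objects and the generation settings shown above; the `complete` function and its default are illustrative, not part of the model card.

```python
def complete(prompt: str, max_new_tokens: int = 400) -> str:
    """Sample a continuation of `prompt` using the settings from the example above."""
    tokens = tokenizer(prompt, return_tensors="pt").input_ids
    generated_tokens = model.generate(
        tokens.long().cuda(),
        use_cache=True,
        do_sample=True,
        top_k=50,
        temperature=0.3,
        top_p=0.9,
        repetition_penalty=1.125,
        min_length=1,
        max_length=len(tokens[0]) + max_new_tokens,
        pad_token_id=tokenizer.eos_token_id,
    )
    # Return only the newly generated tokens, without the prompt.
    return tokenizer.decode(generated_tokens[0][len(tokens[0]):])

print(complete("def fibonacci(n):"))
```
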

For example usage, you can also see our Colab notebook:
[Notebook]()

## Eval results

TBD

## Acknowledgements

This project was made possible by the compute provided by the
[TPU Research Cloud](https://sites.research.google/trc/)

and by [EleutherAI](https://eleuther.ai/)'s pretraining of the GPT-J 6B model.

Thanks to everyone who contributed to this project!

[Aero](https://github.com/AeroScripts)

[Finetune](https://github.com/finetuneanon)

[Kurumuz](https://github.com/kurumuz)