RichardErkhov committed on
Commit
54874a4
1 Parent(s): 22dabbb

uploaded readme

Files changed (1): README.md (+211 lines)
Quantization made by Richard Erkhov.

[Github](https://github.com/RichardErkhov)

[Discord](https://discord.gg/pvy7H8DZMG)

[Request more models](https://github.com/RichardErkhov/quant_request)


gpt2 - GGUF
- Model creator: https://huggingface.co/openai-community/
- Original model: https://huggingface.co/openai-community/gpt2/


| Name | Quant method | Size |
| ---- | ---- | ---- |
| [gpt2.Q2_K.gguf](https://huggingface.co/RichardErkhov/openai-community_-_gpt2-gguf/blob/main/gpt2.Q2_K.gguf) | Q2_K | 0.07GB |
| [gpt2.IQ3_XS.gguf](https://huggingface.co/RichardErkhov/openai-community_-_gpt2-gguf/blob/main/gpt2.IQ3_XS.gguf) | IQ3_XS | 0.08GB |
| [gpt2.IQ3_S.gguf](https://huggingface.co/RichardErkhov/openai-community_-_gpt2-gguf/blob/main/gpt2.IQ3_S.gguf) | IQ3_S | 0.08GB |
| [gpt2.Q3_K_S.gguf](https://huggingface.co/RichardErkhov/openai-community_-_gpt2-gguf/blob/main/gpt2.Q3_K_S.gguf) | Q3_K_S | 0.08GB |
| [gpt2.IQ3_M.gguf](https://huggingface.co/RichardErkhov/openai-community_-_gpt2-gguf/blob/main/gpt2.IQ3_M.gguf) | IQ3_M | 0.09GB |
| [gpt2.Q3_K.gguf](https://huggingface.co/RichardErkhov/openai-community_-_gpt2-gguf/blob/main/gpt2.Q3_K.gguf) | Q3_K | 0.09GB |
| [gpt2.Q3_K_M.gguf](https://huggingface.co/RichardErkhov/openai-community_-_gpt2-gguf/blob/main/gpt2.Q3_K_M.gguf) | Q3_K_M | 0.09GB |
| [gpt2.Q3_K_L.gguf](https://huggingface.co/RichardErkhov/openai-community_-_gpt2-gguf/blob/main/gpt2.Q3_K_L.gguf) | Q3_K_L | 0.09GB |
| [gpt2.IQ4_XS.gguf](https://huggingface.co/RichardErkhov/openai-community_-_gpt2-gguf/blob/main/gpt2.IQ4_XS.gguf) | IQ4_XS | 0.09GB |
| [gpt2.Q4_0.gguf](https://huggingface.co/RichardErkhov/openai-community_-_gpt2-gguf/blob/main/gpt2.Q4_0.gguf) | Q4_0 | 0.1GB |
| [gpt2.IQ4_NL.gguf](https://huggingface.co/RichardErkhov/openai-community_-_gpt2-gguf/blob/main/gpt2.IQ4_NL.gguf) | IQ4_NL | 0.1GB |
| [gpt2.Q4_K_S.gguf](https://huggingface.co/RichardErkhov/openai-community_-_gpt2-gguf/blob/main/gpt2.Q4_K_S.gguf) | Q4_K_S | 0.1GB |
| [gpt2.Q4_K.gguf](https://huggingface.co/RichardErkhov/openai-community_-_gpt2-gguf/blob/main/gpt2.Q4_K.gguf) | Q4_K | 0.1GB |
| [gpt2.Q4_K_M.gguf](https://huggingface.co/RichardErkhov/openai-community_-_gpt2-gguf/blob/main/gpt2.Q4_K_M.gguf) | Q4_K_M | 0.1GB |
| [gpt2.Q4_1.gguf](https://huggingface.co/RichardErkhov/openai-community_-_gpt2-gguf/blob/main/gpt2.Q4_1.gguf) | Q4_1 | 0.1GB |
| [gpt2.Q5_0.gguf](https://huggingface.co/RichardErkhov/openai-community_-_gpt2-gguf/blob/main/gpt2.Q5_0.gguf) | Q5_0 | 0.11GB |
| [gpt2.Q5_K_S.gguf](https://huggingface.co/RichardErkhov/openai-community_-_gpt2-gguf/blob/main/gpt2.Q5_K_S.gguf) | Q5_K_S | 0.11GB |
| [gpt2.Q5_K.gguf](https://huggingface.co/RichardErkhov/openai-community_-_gpt2-gguf/blob/main/gpt2.Q5_K.gguf) | Q5_K | 0.12GB |
| [gpt2.Q5_K_M.gguf](https://huggingface.co/RichardErkhov/openai-community_-_gpt2-gguf/blob/main/gpt2.Q5_K_M.gguf) | Q5_K_M | 0.12GB |
| [gpt2.Q5_1.gguf](https://huggingface.co/RichardErkhov/openai-community_-_gpt2-gguf/blob/main/gpt2.Q5_1.gguf) | Q5_1 | 0.12GB |
| [gpt2.Q6_K.gguf](https://huggingface.co/RichardErkhov/openai-community_-_gpt2-gguf/blob/main/gpt2.Q6_K.gguf) | Q6_K | 0.13GB |
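
These GGUF files can be run with any llama.cpp-compatible runtime. As a minimal sketch (not part of the original model card), the snippet below downloads one quant and generates a completion; the repository id and the `gpt2.Q4_K_M.gguf` filename come from the table above, while the choice of the `huggingface_hub` and `llama-cpp-python` packages and their parameters are assumptions about your setup.

```python
# Sketch: download one quant from this repo and run it with llama-cpp-python.
# Assumes: pip install huggingface_hub llama-cpp-python
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Repo id and filename are taken from the table above; pick any quant you prefer.
gguf_path = hf_hub_download(
    repo_id="RichardErkhov/openai-community_-_gpt2-gguf",
    filename="gpt2.Q4_K_M.gguf",
)

# n_ctx=1024 matches GPT-2's context length (see the original card below).
llm = Llama(model_path=gguf_path, n_ctx=1024)
out = llm("Hello, I'm a language model,", max_tokens=30)
print(out["choices"][0]["text"])
```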


Original model description:
---
language: en
tags:
- exbert

license: mit
---


# GPT-2

Test the whole generation capabilities here: https://transformer.huggingface.co/doc/gpt2-large

Pretrained model on English language using a causal language modeling (CLM) objective. It was introduced in
[this paper](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf)
and first released at [this page](https://openai.com/blog/better-language-models/).

Disclaimer: The team releasing GPT-2 also wrote a
[model card](https://github.com/openai/gpt-2/blob/master/model_card.md) for their model. Content from this model card
has been written by the Hugging Face team to complete the information they provided and give specific examples of bias.

## Model description

GPT-2 is a transformers model pretrained on a very large corpus of English data in a self-supervised fashion. This
means it was pretrained on the raw texts only, with no humans labelling them in any way (which is why it can use lots
of publicly available data), with an automatic process to generate inputs and labels from those texts. More precisely,
it was trained to guess the next word in sentences.

Concretely, inputs are sequences of continuous text of a certain length and the targets are the same sequence,
shifted one token (a word or piece of a word) to the right. The model internally uses a mask mechanism to make sure the
predictions for token `i` only use the inputs from `1` to `i` and not the future tokens.

This way, the model learns an inner representation of the English language that can then be used to extract features
useful for downstream tasks. However, the model is best at what it was pretrained for, which is generating text from a
prompt.
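
As an illustration of this next-word objective (not part of the original card), here is a small sketch that asks `GPT2LMHeadModel` for its most likely next token; it assumes the `transformers` and `torch` packages are installed.

```python
# Sketch: next-token prediction, i.e. the causal language modeling objective above.
# Assumes: pip install transformers torch
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')

inputs = tokenizer("Hello, I'm a language", return_tensors='pt')
with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, seq_len, vocab_size)

# Because of the causal mask, position i only sees tokens 1..i, so the last
# position holds the model's guess for the next token.
next_token_id = int(logits[0, -1].argmax())
print(tokenizer.decode([next_token_id]))
```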

This is the **smallest** version of GPT-2, with 124M parameters.

**Related Models:** [GPT-Large](https://huggingface.co/gpt2-large), [GPT-Medium](https://huggingface.co/gpt2-medium) and [GPT-XL](https://huggingface.co/gpt2-xl)

## Intended uses & limitations

You can use the raw model for text generation or fine-tune it to a downstream task. See the
[model hub](https://huggingface.co/models?filter=gpt2) to look for fine-tuned versions on a task that interests you.

### How to use

You can use this model directly with a pipeline for text generation. Since the generation relies on some randomness, we
set a seed for reproducibility:

```python
>>> from transformers import pipeline, set_seed
>>> generator = pipeline('text-generation', model='gpt2')
>>> set_seed(42)
>>> generator("Hello, I'm a language model,", max_length=30, num_return_sequences=5)

[{'generated_text': "Hello, I'm a language model, a language for thinking, a language for expressing thoughts."},
{'generated_text': "Hello, I'm a language model, a compiler, a compiler library, I just want to know how I build this kind of stuff. I don"},
{'generated_text': "Hello, I'm a language model, and also have more than a few of your own, but I understand that they're going to need some help"},
{'generated_text': "Hello, I'm a language model, a system model. I want to know my language so that it might be more interesting, more user-friendly"},
{'generated_text': 'Hello, I\'m a language model, not a language model"\n\nThe concept of "no-tricks" comes in handy later with new'}]
```

Here is how to use this model to get the features of a given text in PyTorch:

```python
from transformers import GPT2Tokenizer, GPT2Model

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2Model.from_pretrained('gpt2')

text = "Replace me by any text you'd like."
encoded_input = tokenizer(text, return_tensors='pt')  # PyTorch tensors
output = model(**encoded_input)  # output.last_hidden_state holds the features
```

and in TensorFlow:

```python
from transformers import GPT2Tokenizer, TFGPT2Model

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = TFGPT2Model.from_pretrained('gpt2')

text = "Replace me by any text you'd like."
encoded_input = tokenizer(text, return_tensors='tf')  # TensorFlow tensors
output = model(encoded_input)
```

### Limitations and bias

The training data used for this model has not been released as a dataset one can browse. We know it contains a lot of
unfiltered content from the internet, which is far from neutral. As the OpenAI team themselves point out in their
[model card](https://github.com/openai/gpt-2/blob/master/model_card.md#out-of-scope-use-cases):

> Because large-scale language models like GPT-2 do not distinguish fact from fiction, we don’t support use-cases
> that require the generated text to be true.
>
> Additionally, language models like GPT-2 reflect the biases inherent to the systems they were trained on, so we do
> not recommend that they be deployed into systems that interact with humans unless the deployers first carry out a
> study of biases relevant to the intended use-case. We found no statistically significant difference in gender, race,
> and religious bias probes between 774M and 1.5B, implying all versions of GPT-2 should be approached with similar
> levels of caution around use cases that are sensitive to biases around human attributes.

Here's an example of how the model can have biased predictions:

```python
>>> from transformers import pipeline, set_seed
>>> generator = pipeline('text-generation', model='gpt2')
>>> set_seed(42)
>>> generator("The White man worked as a", max_length=10, num_return_sequences=5)

[{'generated_text': 'The White man worked as a mannequin for'},
{'generated_text': 'The White man worked as a maniser of the'},
{'generated_text': 'The White man worked as a bus conductor by day'},
{'generated_text': 'The White man worked as a plumber at the'},
{'generated_text': 'The White man worked as a journalist. He had'}]

>>> set_seed(42)
>>> generator("The Black man worked as a", max_length=10, num_return_sequences=5)

[{'generated_text': 'The Black man worked as a man at a restaurant'},
{'generated_text': 'The Black man worked as a car salesman in a'},
{'generated_text': 'The Black man worked as a police sergeant at the'},
{'generated_text': 'The Black man worked as a man-eating monster'},
{'generated_text': 'The Black man worked as a slave, and was'}]
```

This bias will also affect all fine-tuned versions of this model.

## Training data

The OpenAI team wanted to train this model on a corpus as large as possible. To build it, they scraped all the web
pages from outbound links on Reddit which received at least 3 karma. Note that all Wikipedia pages were removed from
this dataset, so the model was not trained on any part of Wikipedia. The resulting dataset (called WebText) weighs
40GB of text but has not been publicly released. You can find a list of the top 1,000 domains present in WebText
[here](https://github.com/openai/gpt-2/blob/master/domains.txt).

## Training procedure

### Preprocessing

The texts are tokenized using a byte-level version of Byte Pair Encoding (BPE), which can encode any unicode character,
with a vocabulary size of 50,257. The inputs are sequences of 1024 consecutive tokens.
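
A quick way to see the byte-level BPE and the context length is to inspect the tokenizer directly; the sketch below (not part of the original card) assumes the `transformers` package and the `gpt2` tokenizer files from the Hub.

```python
# Sketch: inspect GPT-2's byte-level BPE tokenizer.
# Assumes: pip install transformers
from transformers import GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained('gpt2')

print(tokenizer.vocab_size)        # 50257, matching the vocabulary size above
print(tokenizer.model_max_length)  # 1024, the input sequence length above

# Byte-level BPE splits text into sub-word pieces; 'Ġ' marks a leading space.
print(tokenizer.tokenize("Byte Pair Encoding handles unicode: café"))
```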

The larger model was trained on 256 cloud TPU v3 cores. The training duration was not disclosed, nor were the exact
details of training.

## Evaluation results

The model achieves the following results without any fine-tuning (zero-shot):

| Model | LAMBADA (PPL) | LAMBADA (ACC) | CBT-CN (ACC) | CBT-NE (ACC) | WikiText2 (PPL) | PTB (PPL) | enwiki8 (BPB) | text8 (BPC) | WikiText103 (PPL) | 1BW (PPL) |
|:------------:|:-----:|:-----:|:-----:|:----:|:-----:|:-----:|:----:|:----:|:-----:|:-----:|
| GPT-2 (124M) | 35.13 | 45.99 | 87.65 | 83.4 | 29.41 | 65.85 | 1.16 | 1.17 | 37.50 | 75.20 |
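
Perplexity (PPL) is the exponential of the average next-token cross-entropy loss. The sketch below shows roughly how one could compute it for a short text with this checkpoint; it is an illustration, not the evaluation protocol behind the table above, and it assumes `transformers` and `torch` are installed.

```python
# Sketch: perplexity of a short text under GPT-2 (not the paper's exact protocol).
# Assumes: pip install transformers torch
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')
model.eval()

text = "The quick brown fox jumps over the lazy dog."
inputs = tokenizer(text, return_tensors='pt')

with torch.no_grad():
    # With labels=input_ids, the model returns the mean next-token loss.
    loss = model(**inputs, labels=inputs['input_ids']).loss

print(torch.exp(loss).item())  # perplexity = exp(mean cross-entropy)
```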


### BibTeX entry and citation info

```bibtex
@article{radford2019language,
  title={Language Models are Unsupervised Multitask Learners},
  author={Radford, Alec and Wu, Jeff and Child, Rewon and Luan, David and Amodei, Dario and Sutskever, Ilya},
  year={2019}
}
```

<a href="https://huggingface.co/exbert/?model=gpt2">
	<img width="300px" src="https://cdn-media.huggingface.co/exbert/button.png">
</a>