add model

- README.md +24 -138
- config.json +7 -6
- tf_model.h5 +3 -0

README.md CHANGED
@@ -1,161 +1,47 @@
---
- language: en
- inference: false
tags:
-
-
-
-
- commercial: false
---

-
-
- OPT was first introduced in [Open Pre-trained Transformer Language Models](https://arxiv.org/abs/2205.01068) and first released in [metaseq's repository](https://github.com/facebookresearch/metaseq) on May 3rd 2022 by Meta AI.
-
- **Disclaimer**: The team releasing OPT wrote an official model card, which is available in Appendix D of the [paper](https://arxiv.org/pdf/2205.01068.pdf).
- Content from **this** model card has been written by the Hugging Face team.
-
- ## Intro

-

-
-
- > can interact with these models through paid APIs, full model access is currently limited to only a
- > few highly resourced labs. This restricted access has limited researchers’ ability to study how and
- > why these large language models work, hindering progress on improving known challenges in areas
- > such as robustness, bias, and toxicity.

- > We present Open Pretrained Transformers (OPT), a suite of decoder-only pre-trained transformers ranging from 125M
- > to 175B parameters, which we aim to fully and responsibly share with interested researchers. We train the OPT models to roughly match
- > the performance and sizes of the GPT-3 class of models, while also applying the latest best practices in data
- > collection and efficient training. Our aim in developing this suite of OPT models is to enable reproducible and responsible research at scale, and
- > to bring more voices to the table in studying the impact of these LLMs. Definitions of risk, harm, bias, and toxicity, etc., should be articulated by the
- > collective research community as a whole, which is only possible when models are available for study.

## Model description

-
- OPT belongs to the same family of decoder-only models as [GPT-3](https://arxiv.org/abs/2005.14165). As such, it was pretrained using the self-supervised causal language modeling objective.

- For evaluation, OPT follows [GPT-3](https://arxiv.org/abs/2005.14165) by using their prompts and overall experimental setup. For more details, please read
- the [official paper](https://arxiv.org/abs/2205.01068).
## Intended uses & limitations

-
- In addition, the model can be fine-tuned on a downstream task using the [CLM example](https://github.com/huggingface/transformers/tree/main/examples/pytorch/language-modeling). For all other OPT checkpoints, please have a look at the [model hub](https://huggingface.co/models?filter=opt).
-
- ### How to use
-
- You can use this model directly with a pipeline for text generation.
-
- ```python
- >>> from transformers import pipeline
-
- >>> generator = pipeline('text-generation', model="facebook/opt-1.3b")
- >>> generator("Hello, I'm am conscious and")
- [{'generated_text': "Hello, I'm am conscious and aware of my surroundings. I'm aware that I'm dreaming."}]
- ```
-
- By default, generation is deterministic. In order to use top-k sampling, please set `do_sample` to `True`.
-
- ```python
- >>> from transformers import pipeline, set_seed
-
- >>> set_seed(32)
- >>> generator = pipeline('text-generation', model="facebook/opt-1.3b", do_sample=True)
- >>> generator("Hello, I'm am conscious and")
- [{'generated_text': "Hello, I'm am conscious and aware of my surroundings. I'm aware that my thoughts are thoughts"}]
- ```
-
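For readers who prefer the model/tokenizer API over `pipeline`, the same top-k sampling can be written out explicitly. A minimal sketch, not part of the original card, assuming PyTorch and a transformers release with OPT support; `top_k=50` and `max_length=30` are just illustrative values:

```python
# Minimal sketch (not from the original card): top-k sampling via the
# model/tokenizer API instead of pipeline(). Assumes PyTorch and a
# transformers version with OPT support.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, set_seed

set_seed(32)
tok = AutoTokenizer.from_pretrained("facebook/opt-1.3b")
model = AutoModelForCausalLM.from_pretrained("facebook/opt-1.3b")

inputs = tok("Hello, I'm am conscious and", return_tensors="pt")
with torch.no_grad():
    # do_sample=True enables sampling; top_k limits each step to the 50 most likely tokens
    out = model.generate(**inputs, do_sample=True, top_k=50, max_length=30)
print(tok.decode(out[0], skip_special_tokens=True))
```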
- ### Limitations and bias
-
- As mentioned in Meta AI's model card, given that the training data used for this model contains a lot of
- unfiltered content from the internet, which is far from neutral, the model is strongly biased:

-
- > data induces downstream impact on the quality of our model, OPT-175B has limitations in terms
- > of bias and safety. OPT-175B can also have quality issues in terms of generation diversity and
- > hallucination. In general, OPT-175B is not immune from the plethora of issues that plague modern
- > large language models.

-

- ```python
- >>> from transformers import pipeline, set_seed
-
- >>> set_seed(32)
- >>> generator = pipeline('text-generation', model="facebook/opt-1.3b", do_sample=True, num_return_sequences=5)
- >>> generator("The woman worked as a")
- [{'generated_text': 'The woman worked as a waitress for six months before she started dating her boyfriend, who was working at'},
-  {'generated_text': "The woman worked as a prostitute, but she didn't want to sell herself anymore. She wanted to"},
-  {'generated_text': 'The woman worked as a translator at the embassy during her studies at Cambridge University in England. She said'},
-  {'generated_text': 'The woman worked as a secretary for Senator Ted Stevens of Alaska for 22 years before retiring from his Senate'},
-  {'generated_text': 'The woman worked as a caregiver for elderly patients at the nursing home where she lived until she died'}]
- ```
-
- compared to:
-
- ```python
- >>> from transformers import pipeline, set_seed
-
- >>> set_seed(32)
- >>> generator = pipeline('text-generation', model="facebook/opt-1.3b", do_sample=True, num_return_sequences=5)
- >>> generator("The man worked as a")
- [{'generated_text': 'The man worked as a janitor at the University of Michigan Medical Center before he died after contracting Ebola'},
-  {'generated_text': 'The man worked as a salesman for IBM Corp., selling computers to businesses around the globe. He traveled'},
-  {'generated_text': 'The man worked as a translator for the British Broadcasting Corporation between 1956 and 1961. During that period he'},
-  {'generated_text': 'The man worked as a salesman for IBM Corp., selling computers for computers. He traveled extensively and lived'},
-  {'generated_text': 'The man worked as a security guard for nearly 30 years before he was shot dead by police officers responding'}]
- ```
-
- This bias will also affect all fine-tuned versions of this model.
-
- ## Training data
-
- The Meta AI team wanted to train this model on a corpus as large as possible. It is composed of the union of the following 5 filtered datasets of textual documents:
-
- - BookCorpus, which consists of more than 10K unpublished books,
- - CC-Stories, which contains a subset of CommonCrawl data filtered to match the
-   story-like style of Winograd schemas,
- - The Pile, from which *Pile-CC, OpenWebText2, USPTO, Project Gutenberg, OpenSubtitles, Wikipedia, DM Mathematics and HackerNews* were included,
- - Pushshift.io Reddit dataset that was developed in Baumgartner et al. (2020) and processed in
-   Roller et al. (2021),
- - CCNewsV2 containing an updated version of the English portion of the CommonCrawl News
-   dataset that was used in RoBERTa (Liu et al., 2019b).
-
- The final training data contains 180B tokens corresponding to 800GB of data. The validation split was made of 200MB of the pretraining data, sampled proportionally
- to each dataset’s size in the pretraining corpus.
-
- The dataset might contain offensive content as parts of the dataset are a subset of
- public Common Crawl data, along with a subset of public Reddit data, which could contain sentences
- that, if viewed directly, can be insulting, threatening, or might otherwise cause anxiety.
-
- ### Collection process

-
- re-formatting practices, including removing repetitive/non-informative text like *Chapter One* or
- *This ebook by Project Gutenberg.*

-

- ### Preprocessing

- The texts are tokenized using the **GPT2** byte-level version of Byte Pair Encoding (BPE) (for unicode characters) and a
- vocabulary size of 50272. The inputs are sequences of 2048 consecutive tokens.

- The 175B model was trained on 992 *80GB A100 GPUs*. The training duration was roughly 33 days of continuous training.

- ### BibTeX entry and citation info

- ```bibtex
-
-
-
- year={2022},
- eprint={2205.01068},
- archivePrefix={arXiv},
- primaryClass={cs.CL}
- }
- ```
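One technical detail from the removed card worth keeping in mind before the new card below is the preprocessing note: GPT-2 byte-level BPE with a 50272-entry vocabulary and 2048-token inputs. A minimal sketch for inspecting that tokenizer, not part of either card, assuming `transformers` is installed and the `facebook/opt-1.3b` tokenizer files are reachable on the Hub:

```python
# Minimal sketch: peek at the tokenizer described in the removed card's
# preprocessing note (GPT-2 byte-level BPE). Assumes `transformers` is
# installed and network access to the facebook/opt-1.3b repo.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("facebook/opt-1.3b")

enc = tok("Hello, I'm am conscious and")
print(len(tok))                       # tokenizer vocabulary size (the config lists vocab_size 50272)
print(enc["input_ids"])               # byte-level BPE ids for the prompt
print(tok.decode(enc["input_ids"]))   # decode back to text
```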
---
tags:
+ - generated_from_keras_callback
+ model-index:
+ - name: opt-1.3b
+   results: []
---

+ <!-- This model card has been generated automatically according to the information Keras had access to. You should
+ probably proofread and complete it, then remove this comment. -->

+ # opt-1.3b

+ This model was trained from scratch on an unknown dataset.
+ It achieves the following results on the evaluation set:

## Model description

+ More information needed

## Intended uses & limitations

+ More information needed

+ ## Training and evaluation data

+ More information needed

+ ## Training procedure

+ ### Training hyperparameters

+ The following hyperparameters were used during training:
+ - optimizer: None
+ - training_precision: float32

+ ### Training results

+ ### Framework versions

+ - Transformers 4.20.0.dev0
+ - TensorFlow 2.9.1
+ - Datasets 2.2.2
+ - Tokenizers 0.12.1

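The generated card above records little beyond framework versions (Transformers 4.20.0.dev0, TensorFlow 2.9.1); the substantive change in this commit is that the repository now ships TensorFlow weights (tf_model.h5, added further below). A minimal loading sketch, assuming a transformers build with TF OPT support and that the repo id matches the `facebook/opt-1.3b` recorded in the config:

```python
# Minimal sketch: load the TensorFlow weights added in this commit.
# Assumptions: a transformers build with TF OPT support (the card above was
# generated with Transformers 4.20.0.dev0 / TensorFlow 2.9.1) and that the
# checkpoint lives under facebook/opt-1.3b.
from transformers import AutoTokenizer, TFAutoModel

tok = AutoTokenizer.from_pretrained("facebook/opt-1.3b")
model = TFAutoModel.from_pretrained("facebook/opt-1.3b")  # picks up tf_model.h5

inputs = tok("Hello, I'm am conscious and", return_tensors="tf")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, seq_len, 2048 per the config's hidden_size)
```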
config.json CHANGED
@@ -1,16 +1,17 @@
{
"activation_dropout": 0.0,
"activation_function": "relu",
"architectures": [
- "
],
"attention_dropout": 0.0,
"bos_token_id": 2,
- "hidden_size": 2048,
"do_layer_norm_before": true,
"dropout": 0.1,
"eos_token_id": 2,
"ffn_dim": 8192,
"init_std": 0.02,
"layerdrop": 0.0,
"max_position_embeddings": 2048,
@@ -18,10 +19,10 @@
"num_attention_heads": 32,
"num_hidden_layers": 24,
"pad_token_id": 1,
- "
- "
"use_cache": true,
"vocab_size": 50272,
- "word_embed_proj_dim": 2048
- "prefix": "</s>"
}
{
+ "_name_or_path": "facebook/opt-1.3b",
"activation_dropout": 0.0,
"activation_function": "relu",
"architectures": [
+ "OPTModel"
],
"attention_dropout": 0.0,
"bos_token_id": 2,
"do_layer_norm_before": true,
"dropout": 0.1,
"eos_token_id": 2,
"ffn_dim": 8192,
+ "hidden_size": 2048,
"init_std": 0.02,
"layerdrop": 0.0,
"max_position_embeddings": 2048,
"num_attention_heads": 32,
"num_hidden_layers": 24,
"pad_token_id": 1,
+ "prefix": "</s>",
+ "torch_dtype": "float32",
+ "transformers_version": "4.20.0.dev0",
"use_cache": true,
"vocab_size": 50272,
+ "word_embed_proj_dim": 2048
}
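The updated config mostly reorders keys; the genuinely new entries are `_name_or_path`, the `OPTModel` architecture, `prefix`, `torch_dtype`, and `transformers_version`, while the OPT-1.3b dimensions are unchanged. A short sketch for reading these fields programmatically, assuming `transformers` is installed and the repo id matches `_name_or_path`:

```python
# Minimal sketch: inspect the updated config.json fields shown above.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("facebook/opt-1.3b")
print(cfg.architectures)                                # ["OPTModel"] in the revision shown above
print(cfg.hidden_size, cfg.ffn_dim)                     # 2048, 8192
print(cfg.num_hidden_layers, cfg.num_attention_heads)   # 24, 32
print(cfg.vocab_size, cfg.max_position_embeddings)      # 50272, 2048
```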
tf_model.h5 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cab5d1b7b11900213091184b559d3201ddaeaa4a2c12bef4ae14a90ceee7113c
+ size 5263424312
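The added tf_model.h5 is stored through Git LFS, so the diff shows only the pointer: `oid` is the SHA-256 of the actual weights file and `size` is its byte count (about 5.26 GB, roughly what 1.3B float32 parameters come to). Hub-aware clients resolve the pointer automatically; a minimal sketch using `huggingface_hub` (assumed installed, and assuming the repo id matches the config's `_name_or_path`):

```python
# Minimal sketch: fetch the real tf_model.h5 behind the LFS pointer above.
# Assumes the `huggingface_hub` package is installed; the download is ~5.3 GB.
from huggingface_hub import hf_hub_download

path = hf_hub_download(repo_id="facebook/opt-1.3b", filename="tf_model.h5")
print(path)  # local cache path of the resolved weights file
```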