wassemgtk committed
Commit
286bc5c
1 Parent(s): 86b46ed

Update README.md

Files changed (1)
  1. README.md +57 -82
README.md CHANGED
@@ -1,15 +1,23 @@
  ---
  language:
  - en
  tags:
  - text generation
  - pytorch
  - causal-lm
- license: cc-by-4.0
  pipeline_tag: text-generation
  library_name: transformers
  ---
- # Writer-5B

  <style>
  img {
@@ -17,115 +25,82 @@ img {
  }
  </style>

- |[![Model architecture](https://img.shields.io/badge/Model%20Arch-Transformer%20Decoder-green)](#model-architecture)|[![Model size](https://img.shields.io/badge/Params-5B-green)](#model-architecture)|[![Language](https://img.shields.io/badge/Language-en--US-lightgrey#model-badge)](#datasets)|

  ## Model Description

- Writer LLM base was pretrained primarily on English text, although a trace amount of non-English data from CommonCrawl remains in the training corpus. Like GPT-3, Writer LLM base is a decoder-only model pretrained with a self-supervised causal language modeling (CLM) objective.
- Writer LLM base follows the prompts and general experimental setup of GPT-3 for its evaluation. See the GPT-3 paper for more details.

- ## Getting started

- ### Step 1: Install NeMo and dependencies

- You will need to install NVIDIA Apex and NeMo.

- ```
- git clone https://github.com/ericharper/apex.git
- cd apex
- git checkout nm_v1.11.0
- pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" --global-option="--fast_layer_norm" --global-option="--distributed_adam" --global-option="--deprecated_fused_adam" ./
- ```

- ```
- pip install nemo_toolkit['nlp']==1.11.0
- ```
 
- ### Step 2: Launch eval server

- **Note.** The example below launches a model variant with Tensor Parallelism (TP) of 4 and Pipeline Parallelism (PP) of 1 on four GPUs.

- ```
- git clone https://github.com/NVIDIA/NeMo.git
- cd NeMo/examples/nlp/language_modeling
- git checkout v1.11.0
- python megatron_gpt_eval.py gpt_model_file=palmyara_gpt_5b.nemo server=True tensor_model_parallel_size=4 trainer.devices=4
- ```

- ### Step 3: Send prompts to your model!

- ```python
- import json
- import requests
- 
- # the eval server from Step 2 listens on this port
- port_num = 5555
- headers = {"Content-Type": "application/json"}
- 
- 
- def request_data(data):
-     # send the generation request to the eval server and return its sentences
-     resp = requests.put('http://localhost:{}/generate'.format(port_num),
-                         data=json.dumps(data),
-                         headers=headers)
-     sentences = resp.json()['sentences']
-     return sentences
- 
- 
- data = {
-     "sentences": ["Tell me an interesting fact about space travel."],
-     "tokens_to_generate": 50,
-     "temperature": 1.0,
-     "add_BOS": True,
-     "top_k": 0,
-     "top_p": 0.9,
-     "greedy": False,
-     "all_probs": False,
-     "repetition_penalty": 1.2,
-     "min_tokens_to_generate": 2,
- }
- 
- sentences = request_data(data)
- print(sentences)
- ```

- ## Training Data

- | part          | MassiveText (sampling) | tokens (B) | sampling ratio |
- |:--------------|------------------------:|:----------:|---------------:|
- | mc4 filtered  | MassiveWeb (48%)         | 1331       | 58%            |
- | TrustedWeb    | -                        | -          | -              |
- | realnews      | News (10%)               | 21         | 10%            |
- | c4            | c4 (10%)                 | -          | -              |
- | wikipedia-40B | wikipedia (2%)           | 2          | 5%             |
- | github        | github (3%)              | -          | -              |
- | books         | books (27%)              | 24         | 27%            |
- | youtube       | -                        | -          | -              |

- ## Evaluation results

- *Zero-shot performance.* Evaluated using the [LM Evaluation Test Suite from AI21](https://github.com/AI21Labs/lm-evaluation).

- | ARC-Challenge | ARC-Easy | RACE-middle | RACE-high | Winogrande | RTE    | BoolQA | HellaSwag | PiQA   |
- | ------------- | -------- | ----------- | --------- | ---------- | ------ | ------ | --------- | ------ |
- | 0.3976        | 0.5566   | 0.5007      | 0.4171    | 0.6133     | 0.5812 | 0.6356 | 0.6298    | 0.7492 |

- ## Limitations

- The model was trained on data crawled from the Internet, which contains toxic language and societal biases. The model may therefore amplify those biases and return toxic responses, especially when given toxic prompts.

- ## References

- [1] [Improving Language Understanding by Generative Pre-Training](https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf)

- [2] [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/pdf/1909.08053.pdf)

- [3] [NVIDIA NeMo Toolkit](https://github.com/NVIDIA/NeMo)

- [4] [The Pile: An 800GB Dataset of Diverse Text for Language Modeling](https://arxiv.org/abs/2101.00027)

- ## License

- Use of this model is covered by the [CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/) license. By downloading the public release version of the model, you accept its terms and conditions.
 
  ---
  language:
  - en
+ datasets:
+ - English
  tags:
  - text generation
  - pytorch
  - causal-lm
+ - Writer-data
+ - GPT
+ - NeMo
+ license: cc-by-4.0
  pipeline_tag: text-generation
  library_name: transformers
  ---

+ # Palmyra-small

  <style>
  img {
  }
  </style>

+ |[![Model architecture](https://img.shields.io/badge/Model%20Arch-Transformer%20Decoder-green)](#model-architecture)|[![Model size](https://img.shields.io/badge/Params-126M-green)](#model-architecture)|[![Language](https://img.shields.io/badge/Language-en--US-lightgrey#model-badge)](#datasets)|

  ## Model Description

+ Palmyra was pretrained primarily on English text, although a trace amount of non-English data from CommonCrawl remains in the training corpus. Like GPT-3, Palmyra is a decoder-only model pretrained with a self-supervised causal language modeling (CLM) objective. Palmyra follows the prompts and general experimental setup of GPT-3 for its evaluation. See the GPT-3 paper for more details.
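
+ For intuition, the CLM objective is plain next-token prediction. Below is a minimal PyTorch sketch of that loss; it is illustrative only (the shapes and sizes are made up), not Writer's training code:

+ ```python
+ import torch
+ import torch.nn.functional as F
+ 
+ # stand-ins for a model's output logits and a tokenized training sequence
+ vocab_size, seq_len = 100, 8
+ logits = torch.randn(1, seq_len, vocab_size)
+ tokens = torch.randint(vocab_size, (1, seq_len))
+ 
+ # shift by one so position t predicts token t+1, then average cross-entropy;
+ # this is the quantity minimized during causal language model pretraining
+ loss = F.cross_entropy(
+     logits[:, :-1].reshape(-1, vocab_size),
+     tokens[:, 1:].reshape(-1),
+ )
+ print(loss.item())
+ ```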
 
 
+ ## Training data

+ Palmyra-base 5B was trained on a custom Writer dataset.

+ ## Intended Use and Limitations

+ Palmyra-base learns an inner representation of the English language that can be used to extract features for downstream tasks. The model is, however, best at what it was pretrained for: generating text from a prompt.
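
+ As a hedged sketch of that feature-extraction use, the snippet below mean-pools the final hidden layer into one vector per input; this is a common recipe, not one prescribed by this card:

+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ 
+ # the fast tokenizer currently does not work correctly, hence use_fast=False
+ tokenizer = AutoTokenizer.from_pretrained("Writer/palmyra-base", use_fast=False)
+ model = AutoModelForCausalLM.from_pretrained("Writer/palmyra-base")
+ 
+ inputs = tokenizer("Palmyra can embed text for downstream tasks.", return_tensors="pt")
+ with torch.no_grad():
+     outputs = model(**inputs, output_hidden_states=True)
+ 
+ # mean-pool the last hidden layer into a (batch, hidden_size) feature matrix
+ features = outputs.hidden_states[-1].mean(dim=1)
+ print(features.shape)
+ ```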
 
 
+ ### How to use

+ This model can be easily loaded using the `AutoModelForCausalLM` functionality:

+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ 
+ model = AutoModelForCausalLM.from_pretrained("Writer/palmyra-base", torch_dtype=torch.float16).cuda()
+ 
+ # the fast tokenizer currently does not work correctly
+ tokenizer = AutoTokenizer.from_pretrained("Writer/palmyra-base", use_fast=False)
+ 
+ prompt = "What is the color of a carrot?\nA:"
+ input_ids = tokenizer(prompt, return_tensors="pt").input_ids.cuda()
+ 
+ generated_ids = model.generate(input_ids)
+ print(tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0])
+ ```
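
+ `generate` defaults to greedy decoding with a small length budget. Continuing the snippet above, sampling parameters can be passed explicitly; the values below are illustrative, not settings recommended by this card:

+ ```python
+ generated_ids = model.generate(
+     input_ids,
+     max_new_tokens=64,                     # budget for newly generated tokens
+     do_sample=True,                        # sample instead of greedy decoding
+     temperature=0.9,
+     top_p=0.9,
+     pad_token_id=tokenizer.eos_token_id,   # avoids the missing-pad-token warning
+ )
+ print(tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0])
+ ```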

+ ### Limitations and Biases

+ Palmyra's core functionality is to take a string of text and predict the next token. While language models are widely used for other tasks, there are still many unknowns in this area. When prompting Palmyra, keep in mind that the statistically most likely next token is not always the token that produces the most "accurate" text. Never rely on Palmyra to produce factually correct results.

+ Palmyra was trained on a custom Writer dataset. As with all language models, it is difficult to predict how Palmyra will respond to specific prompts, and offensive content may appear unexpectedly. We recommend that the outputs be curated or filtered by humans before they are released, both to censor undesirable content and to improve the quality of the results.

+ ## Evaluation results

+ Evaluation of the Palmyra-base model on the SuperGLUE benchmark:

+ | Task    | Metric | Value |
+ |---------|--------|-------|
+ | boolq   | acc    | 64.43 |
+ | cb      | acc    | 10.71 |
+ |         | f1     |  8.32 |
+ | copa    | acc    | 76.00 |
+ | multirc | acc    |  1.26 |
+ | record  | f1     | 84.02 |
+ |         | em     | 83.29 |
+ | wic     | acc    | 50.00 |
+ | wsc     | acc    | 36.54 |
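
+ The card does not state which harness produced these numbers. One hedged way to run comparable zero-shot SuperGLUE evaluations is EleutherAI's lm-evaluation-harness; the `simple_evaluate` call and task names below are assumptions about that library, not part of this card:

+ ```python
+ # pip install lm-eval   (EleutherAI's lm-evaluation-harness, assumed here)
+ import lm_eval
+ 
+ results = lm_eval.simple_evaluate(
+     model="hf",
+     model_args="pretrained=Writer/palmyra-base",
+     tasks=["boolq", "cb", "copa", "multirc", "record", "wic", "wsc"],
+ )
+ print(results["results"])
+ ```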

+ ## Citation and Related Information

+ To cite this model:
+ ```bibtex
+ @misc{Palmyra,
+   author       = {Kiran and Komatsuzaki, Aran},
+   title        = {{GPT-J-6B: A 6 Billion Parameter Autoregressive Language Model}},
+   howpublished = {\url{https://github.com/kingoflolz/mesh-transformer-jax}},
+   year         = 2021,
+   month        = May
+ }
+ ```