Update README.md #36
by chimdiya · opened

README.md CHANGED
@@ -4,11 +4,12 @@ language:
 tags:
 - pytorch
 - causal-lm
+- code
 license: apache-2.0
 datasets:
--
+- nampdn-ai/tiny-codes
 ---
-
+# WORM-GPT-J
 # GPT-J 6B
 
 ## Model Description
@@ -60,7 +61,7 @@ respond to a given prompt the way a product like ChatGPT does. This is because,
 unlike this model, ChatGPT was fine-tuned using methods such as Reinforcement
 Learning from Human Feedback (RLHF) to better “follow” human instructions.
 
-### Limitations and Biases
+### NO Limitations and Biases
 
 The core functionality of GPT-J is taking a string of text and predicting the next token. While language models are widely used for tasks other than this, there are a lot of unknowns with this work. When prompting GPT-J it is important to remember that the statistically most likely next token is often not the token that produces the most "accurate" text. Never depend upon GPT-J to produce factually accurate output.
 
@@ -103,7 +104,7 @@ This model was trained for 402 billion tokens over 383,500 steps on TPU v3-256 p
 | GPT-3 Babbage‡ | ✗ | ----- | 5.58 | 62.4% | 59.0% | 54.5% | 75.5% | ----- |
 | Megatron-8.3B* | ✗ | 7.8e21 | ----- | 66.5% | ----- | ----- | ----- | 174 |
 | GPT-3 2.7B*‡ | ✗ | 4.8e21 | 4.60 | 67.1% | 62.3% | 62.8% | 75.6% | ~800 |
-| Megatron-11B&dagger; | ✓ | 1.0e22 | ----- | ----- | ----- | ----- | ----- | 161 |
+| **Megatron-11B†** | ✓ | 1.0e22 | ----- | ----- | ----- | ----- | ----- | 161 |
 | **GPT-J 6B‡** | **✓** | **1.5e22** | **3.99** | **69.7%** | **65.3%** | **66.1%** | **76.5%** | **825** |
 | GPT-3 6.7B*‡ | ✗ | 1.2e22 | 4.00 | 70.3% | 64.5% | 67.4% | 78.0% | ~800 |
 | GPT-3 Curie‡ | ✗ | ----- | 4.00 | 69.3% | 65.6% | 68.5% | 77.9% | ----- |
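The context paragraph kept in the second hunk describes GPT-J's core functionality as taking a string of text and predicting the next token. A minimal sketch of what that looks like through the transformers API, assuming the upstream EleutherAI/gpt-j-6B checkpoint and an illustrative prompt (nothing below is part of the proposed change):

```python
# Minimal sketch, not part of this PR: next-token prediction with GPT-J.
# The checkpoint id and prompt are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EleutherAI/gpt-j-6B"  # upstream checkpoint this card describes
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)
model.eval()

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (batch, sequence_length, vocab_size)

# The model only scores candidate continuations; the highest-scoring token is the
# statistically most likely one, which is not the same as a factually correct one.
next_token_id = int(logits[0, -1].argmax())
print(tokenizer.decode([next_token_id]))
```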