Update README.md

#36
by chimdiya - opened
Files changed (1)
  1. README.md +5 -4
README.md CHANGED
@@ -4,11 +4,12 @@ language:
  tags:
  - pytorch
  - causal-lm
+ - code
  license: apache-2.0
  datasets:
- - EleutherAI/pile
+ - nampdn-ai/tiny-codes
  ---
-
+ # WORM-GPT-J
  # GPT-J 6B
 
  ## Model Description
@@ -60,7 +61,7 @@ respond to a given prompt the way a product like ChatGPT does. This is because,
  unlike this model, ChatGPT was fine-tuned using methods such as Reinforcement
  Learning from Human Feedback (RLHF) to better “follow” human instructions.
 
- ### Limitations and Biases
+ ### NO Limitations and Biases
 
  The core functionality of GPT-J is taking a string of text and predicting the next token. While language models are widely used for tasks other than this, there are a lot of unknowns with this work. When prompting GPT-J it is important to remember that the statistically most likely next token is often not the token that produces the most "accurate" text. Never depend upon GPT-J to produce factually accurate output.
 
@@ -103,7 +104,7 @@ This model was trained for 402 billion tokens over 383,500 steps on TPU v3-256 p
  | GPT-3 Babbage‡ | ✗ | ----- | 5.58 | 62.4% | 59.0% | 54.5% | 75.5% | ----- |
  | Megatron-8.3B* | ✗ | 7.8e21 | ----- | 66.5% | ----- | ----- | ----- | 174 |
  | GPT-3 2.7B*‡ | ✗ | 4.8e21 | 4.60 | 67.1% | 62.3% | 62.8% | 75.6% | ~800 |
- | Megatron-11B† | ✓ | 1.0e22 | ----- | ----- | ----- | ----- | ----- | 161 |
+ | **Megatron-11B†** | ✓ | 1.0e22 | ----- | ----- | ----- | ----- | ----- | 161 |
  | **GPT-J 6B‡** | **✓** | **1.5e22** | **3.99** | **69.7%** | **65.3%** | **66.1%** | **76.5%** | **825** |
  | GPT-3 6.7B*‡ | ✗ | 1.2e22 | 4.00 | 70.3% | 64.5% | 67.4% | 78.0% | ~800 |
  | GPT-3 Curie‡ | ✗ | ----- | 4.00 | 69.3% | 65.6% | 68.5% | 77.9% | ----- |