Commit cb265aa
Parent(s): 2eec48f
Update README.md

README.md (as updated by this commit):
This is a text generation model based on the [OPT-1.3B](https://huggingface.co/facebook/opt-1.3b) model from Meta, trained with the DeepSpeed library. Given a user input, the model generates natural, engaging conversational responses.

## Training Details

- The base model is [OPT-1.3B](https://huggingface.co/facebook/opt-1.3b), a decoder-only transformer with 1.3 billion parameters, pre-trained on a large text corpus using the causal language modeling objective.
- The model was trained on a single NVIDIA A100 GPU using DeepSpeed pipeline parallelism and the ZeRO optimizer (a minimal configuration sketch follows this list).
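
The actual DeepSpeed configuration is not included in this card. The sketch below shows one plausible ZeRO setup; the batch size, ZeRO stage, precision, and learning rate are all assumptions rather than the authors' recorded settings, and the pipeline-parallel wrapping is omitted for brevity:

```python
import deepspeed
from transformers import AutoModelForCausalLM

# Illustrative DeepSpeed configuration: every value below is an assumption.
ds_config = {
    "train_batch_size": 32,
    "fp16": {"enabled": True},
    "optimizer": {"type": "AdamW", "params": {"lr": 5e-5}},
    "zero_optimization": {"stage": 2},
}

model = AutoModelForCausalLM.from_pretrained("facebook/opt-1.3b")

# deepspeed.initialize wraps the model in a DeepSpeed engine and returns
# (engine, optimizer, training_dataloader, lr_scheduler).
engine, _, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```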

## Model Details

- Number of parameters: 1.3 billion
- Number of layers: 24
- Number of attention heads: 32
- Context size: 2048 tokens
- Vocabulary size: 50,272
- Embedding size: 2048
- Feed-forward size: 8192
- Dropout rate: 0.1
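
These values follow the published OPT-1.3B configuration and can be read directly from the base model's config:

```python
from transformers import AutoConfig

# Load the base model's configuration to check the architecture details above.
config = AutoConfig.from_pretrained("facebook/opt-1.3b")
print(config.num_hidden_layers)        # 24 layers
print(config.num_attention_heads)      # 32 attention heads
print(config.max_position_embeddings)  # 2048-token context
print(config.vocab_size)               # 50272
print(config.hidden_size)              # 2048 embedding size
print(config.ffn_dim)                  # 8192 feed-forward size
print(config.dropout)                  # 0.1
```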
## Usage
You can use this model directly with the Hugging Face pipeline for text generation.
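
A minimal sketch: the base model id below stands in for this repository's model id, which is not shown here, and the prompt format is an assumption:

```python
from transformers import pipeline

# The base model id is used as a placeholder; substitute this repository's
# model id to run the fine-tuned conversational model.
generator = pipeline("text-generation", model="facebook/opt-1.3b")

prompt = "User: What is a good book for a rainy afternoon?\nAssistant:"
outputs = generator(prompt, max_new_tokens=50, do_sample=True, top_p=0.9)
print(outputs[0]["generated_text"])
```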