Gerson Fabian Buenahora Ormaza committed
Commit a1d6acc
1 Parent(s): 7707412

Update README.md

Files changed (1): README.md +54 -1
README.md CHANGED
@@ -7,4 +7,57 @@ language:
  base_model:
  - openai-community/gpt2
  pipeline_tag: text-generation
- ---
+ ---
+
+ # ST3: Simple Transformer 3
+
+ ## Model description
+ ST3 (Simple Transformer 3) is a lightweight transformer-based model derived from OpenAI's GPT-2 architecture. It was specifically designed to enable quick fine-tuning and experimentation, making it a good choice for researchers and developers seeking an efficient model for downstream tasks.
+
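+ Since the card's `pipeline_tag` is text-generation, the model can be exercised end to end with the `transformers` pipeline. A minimal sketch, assuming the model is published under the hypothetical hub id `BueormLLC/ST3` (check the repository page for the actual id) and using a Spanish prompt to match the pretraining corpus:
+
+ ```python
+ from transformers import pipeline
+
+ # "BueormLLC/ST3" is a placeholder hub id, not confirmed by this card.
+ generator = pipeline("text-generation", model="BueormLLC/ST3")
+
+ # Spanish prompt, since ST3 was pretrained on Spanish Wikipedia.
+ out = generator("La inteligencia artificial es", max_new_tokens=50)
+ print(out[0]["generated_text"])
+ ```
+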
+ ### Key features:
+ - **Architecture:** GPT-2-based model with 3 attention heads and 3 layers (see the configuration sketch after this list).
+ - **Embedding size:** 288 dimensions.
+ - **Context size:** 2048 tokens, allowing for extended input/output sequences.
+ - **Pretrained on:** Wikimedia/Wikipedia subset "20231101.es" (Spanish text corpus).
+ - **Parameters:** 4 million FP32 parameters.
+ - **Batch size:** 32.
+ - **Training environment:** 1 epoch on a Kaggle P100 GPU.
+ - **Tokenizer:** custom WordPiece tokenizer "ST3" with a maximum input length of 2048 tokens.
+
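+ These hyperparameters map directly onto a Hugging Face `GPT2Config`. A minimal sketch of rebuilding the architecture from scratch; the vocabulary size of the custom ST3 tokenizer is not stated in this card, so the value below is a placeholder and affects the total parameter count:
+
+ ```python
+ from transformers import GPT2Config, GPT2LMHeadModel
+
+ config = GPT2Config(
+     n_layer=3,         # 3 transformer layers
+     n_head=3,          # 3 attention heads
+     n_embd=288,        # 288-dimensional embeddings (288 / 3 = 96 dims per head)
+     n_positions=2048,  # 2048-token context window
+     vocab_size=4000,   # placeholder: the ST3 tokenizer's real vocab size is not documented here
+ )
+ model = GPT2LMHeadModel(config)
+ print(f"{model.num_parameters() / 1e6:.1f}M parameters")  # roughly 4M with a small vocab
+ ```
+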
+ ## Intended use
+ ST3 is not as powerful as larger transformer models, but it can be used for:
+ - Quick fine-tuning on small datasets (see the sketch below).
+ - Research purposes to test new ideas.
+ - Educational and experimentation purposes.
+
+ This model has not been fine-tuned or evaluated with performance metrics, as it is not designed for state-of-the-art tasks.
+
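+ A minimal fine-tuning sketch with the `transformers` Trainer API, under the same assumptions as above (hypothetical hub id `BueormLLC/ST3`, and a local plain-text corpus named `my_corpus.txt` standing in for your dataset):
+
+ ```python
+ from datasets import load_dataset
+ from transformers import (AutoModelForCausalLM, AutoTokenizer,
+                           DataCollatorForLanguageModeling, Trainer, TrainingArguments)
+
+ repo = "BueormLLC/ST3"  # placeholder hub id, not confirmed by this card
+ tokenizer = AutoTokenizer.from_pretrained(repo)
+ model = AutoModelForCausalLM.from_pretrained(repo)
+ if tokenizer.pad_token is None:           # causal LMs often ship without a pad token
+     tokenizer.pad_token = tokenizer.eos_token
+
+ # my_corpus.txt is a hypothetical local file with one training example per line.
+ dataset = load_dataset("text", data_files={"train": "my_corpus.txt"})
+ tokenized = dataset.map(
+     lambda batch: tokenizer(batch["text"], truncation=True, max_length=2048),
+     batched=True,
+     remove_columns=["text"],
+ )
+
+ trainer = Trainer(
+     model=model,
+     args=TrainingArguments(output_dir="st3-finetuned",
+                            per_device_train_batch_size=32,
+                            num_train_epochs=1),
+     train_dataset=tokenized["train"],
+     # mlm=False gives standard causal (next-token) language modeling
+     data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
+ )
+ trainer.train()
+ ```
+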
+ ## Limitations
+ - **Performance:** ST3 lacks the power of larger models and may not perform well on complex language tasks.
+ - **No evaluation:** the model has not been benchmarked with metrics.
+ - **Not suitable for production use** without further fine-tuning.
+
+ ## Training details
+ - **Dataset:** Wikimedia/Wikipedia subset "20231101.es".
+ - **Number of layers:** 3.
+ - **Number of attention heads:** 3.
+ - **Embedding size:** 288.
+ - **Parameters:** 4 million.
+ - **Training:** the model was trained for one epoch with a batch size of 32 on a P100 GPU provided by Kaggle.
+
+ ## Developer and publisher
+ - **Developed by:** BueormAI.
+ - **Published by:** BueormLLC.
+
+ ## Acknowledgments
+ Thank you for using ST3! Your feedback and support are appreciated as we continue to develop and improve our models.
+
+ If you find this model useful and would like to support further development, please consider making a donation to:
+
+ - [Patreon](https://patreon.com/bueom)
+ - [PayPal](https://paypal.me/bueorm)
+
+ ---
+
+ *Contributions to this project are always welcome!*