Gerson Fabian Buenahora Ormaza committed
Commit a1d6acc · Parent(s): 7707412 · Update README.md

README.md CHANGED
@@ -7,4 +7,57 @@ language:
base_model:
- openai-community/gpt2
pipeline_tag: text-generation
---

# ST3: Simple Transformer 3

## Model description

ST3 (Simple Transformer 3) is a lightweight transformer-based model derived from OpenAI's GPT-2 architecture. It is designed for quick fine-tuning and experimentation, making it a practical choice for researchers and developers who want an efficient model for downstream tasks.
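
Since the card tags the model for `text-generation` and it follows the GPT-2 architecture, it should load through the standard `transformers` pipeline. The sketch below is illustrative; the repository id `BueormLLC/ST3` is an assumption (the card does not state the exact Hub id), so substitute the real one.

```python
# Minimal usage sketch with Hugging Face transformers.
# NOTE: "BueormLLC/ST3" is an assumed repository id, not confirmed by this card.
from transformers import pipeline

generator = pipeline(
    "text-generation",      # matches the card's pipeline_tag
    model="BueormLLC/ST3",  # hypothetical Hub id; replace with the real one
)

result = generator(
    "La inteligencia artificial",  # Spanish prompt, since the model was pretrained on Spanish text
    max_new_tokens=50,
    do_sample=True,
)
print(result[0]["generated_text"])
```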

### Key features

- **Architecture:** GPT-2-based, with 3 attention heads and 3 layers.
- **Embedding size:** 288 (hidden dimension).
- **Context size:** 2048 tokens, allowing for extended input/output sequences.
- **Pretrained on:** the Wikimedia/Wikipedia subset "20231101.es" (a Spanish text corpus).
- **Parameters:** ~4 million (FP32).
- **Batch size:** 32.
- **Training environment:** 1 epoch on a Kaggle P100 GPU.
- **Tokenizer:** custom WordPiece tokenizer ("ST3") with a maximum input length of 2048 tokens.
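
For reference, these hyperparameters map directly onto a GPT-2 configuration. The sketch below shows how an equivalent model could be instantiated with `transformers`; the vocabulary size of the custom ST3 tokenizer is not stated in the card, so the value used here is only a placeholder.

```python
# Sketch: a GPT-2 configuration matching the hyperparameters listed above.
# vocab_size is NOT documented for the ST3 tokenizer; 8000 is a placeholder.
from transformers import GPT2Config, GPT2LMHeadModel

config = GPT2Config(
    vocab_size=8000,   # placeholder; the real ST3 WordPiece vocabulary size is not given
    n_positions=2048,  # context size of 2048 tokens
    n_embd=288,        # embedding (hidden) size
    n_layer=3,         # 3 transformer layers
    n_head=3,          # 3 attention heads (head dimension 288 / 3 = 96)
)

model = GPT2LMHeadModel(config)
# The total count depends strongly on vocab_size; the card reports roughly 4M parameters.
print(f"{model.num_parameters() / 1e6:.1f}M parameters")
```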

## Intended use

ST3 is not as powerful or as fully featured as larger transformer models, but it can be used for:

- Quick fine-tuning on small datasets (a minimal sketch follows this section).
- Research, e.g. quickly testing new ideas.
- Education and experimentation.

This model has not been fine-tuned or evaluated with performance metrics, as it is not designed for state-of-the-art tasks.
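
As an illustration of the quick fine-tuning use case mentioned above, here is a minimal causal-LM fine-tuning sketch with the `Trainer` API. The repository id, the `my_corpus.txt` file, and the hyperparameters are placeholders chosen for the example, not settings used by the authors.

```python
# Minimal fine-tuning sketch (illustrative only; not the authors' setup).
# "BueormLLC/ST3" and "my_corpus.txt" are placeholders.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "BueormLLC/ST3"  # hypothetical Hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # padding is needed for batching
model = AutoModelForCausalLM.from_pretrained(model_id)

raw = load_dataset("text", data_files={"train": "my_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = raw.map(tokenize, batched=True, remove_columns=["text"])
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)  # causal LM objective

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="st3-finetuned",
        num_train_epochs=1,
        per_device_train_batch_size=8,
    ),
    train_dataset=tokenized["train"],
    data_collator=collator,
)
trainer.train()
```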

## Limitations

- **Performance:** ST3 lacks the power of larger models and may not perform well on complex language tasks.
- **No evaluation:** the model has not been benchmarked with metrics.
- **Not suitable for production use** without further fine-tuning.

## Training details

- **Dataset:** Wikimedia/Wikipedia subset "20231101.es".
- **Number of layers:** 3.
- **Number of attention heads:** 3.
- **Embedding size:** 288.
- **Parameters:** 4 million.
- **Training:** the model was trained for one epoch with a batch size of 32 on a P100 GPU provided by Kaggle.
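
The pretraining corpus named above is hosted on the Hugging Face Hub, so it can be inspected with the `datasets` library as sketched below (streaming is used only to avoid downloading the full dump; this is an illustration, not the authors' training script).

```python
# Sketch: loading the Spanish Wikipedia dump ("20231101.es") used for pretraining.
from datasets import load_dataset

wiki_es = load_dataset(
    "wikimedia/wikipedia",
    "20231101.es",
    split="train",
    streaming=True,  # avoids downloading the entire dump up front
)

for article in wiki_es.take(1):
    print(article["title"])
    print(article["text"][:200])
```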

## Developer and publisher

- **Developed by:** BueormAI.
- **Published by:** BueormLLC.

## Acknowledgments

Thank you for using ST3! Your feedback and support are appreciated as we continue to develop and improve our models.

If you find this model useful and would like to support further development, please consider making a donation to:

- [Patreon](https://patreon.com/bueom)
- [PayPal](https://paypal.me/bueorm)

---

*Contributions to this project are always welcome!*