|
--- |
|
license: mit |
|
datasets: |
|
- allenai/c4 |
|
language: |
|
- en |
|
library_name: transformers |
|
pipeline_tag: text-generation |
|
base_model: |
|
- anto18671/lumenspark |
|
--- |
|
# Linformer-based Language Model |
|
|
|
Efficient language modeling for long sequences, built on the Linformer architecture. By reducing the memory and computational overhead of self-attention, the model is well suited to text generation over long inputs.
|
|
|
## Table of Contents |
|
- [Introduction](#introduction)

- [Architecture](#architecture)

- [Installation](#installation)

- [Training Progress](#training-progress)

- [Quick Start](#quick-start)

- [Inference Parameters](#inference-parameters)

- [Hyperparameters](#hyperparameters)

- [Acknowledgements](#acknowledgements)

- [Sponsorship](#sponsorship)

- [License](#license)
|
|
|
## Introduction |
|
The **Linformer-based Language Model** leverages the Linformer architecture to efficiently handle long sequences in text generation and other language tasks. By optimizing the self-attention mechanism, this model maintains high performance while reducing resource consumption, making it suitable for applications like text completion and generation. |
|
|
|
## Architecture |
|
Built upon the **Linformer Transformer**, the model incorporates several key innovations (an illustrative code sketch follows the list):
|
|
|
1. **Efficient Attention**: Reduces self-attention complexity from quadratic to linear by projecting the attention matrix into a lower-dimensional space. |
|
2. **Low-Rank Linear Projections**: Utilizes LowRankLinear layers to decrease dimensionality without compromising expressiveness. |
|
3. **Self-Attention Mechanism**: Keeps the multi-head self-attention projections full-rank (no low-rank projections in this module) to preserve expressivity.
|
4. **Factorized Feed-Forward Layers**: Uses factorized LowRankLinear layers in the Feed-Forward Neural Network to maintain performance with fewer parameters. |
|
5. **PreNorm with LayerNorm and LayerScale**: Applies Layer Normalization before attention and feed-forward layers, enhanced with LayerScale for better gradient flow and stability. |
|
6. **Dropout & Residual Connections**: Incorporates dropout for regularization and residual connections to aid in gradient flow and prevent vanishing gradients. |
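
The components above are easier to picture in code. Below is a minimal, self-contained PyTorch sketch of the two central ideas: a low-rank factorized linear layer and Linformer-style attention that compresses keys and values along the sequence axis. It is illustrative only and is not the package's actual implementation; class and parameter names here are made up for the example.

```python
import torch
import torch.nn as nn


class LowRankLinear(nn.Module):
    """Factorize a (d_in x d_out) weight into (d_in x rank) @ (rank x d_out)."""

    def __init__(self, d_in, d_out, rank):
        super().__init__()
        self.down = nn.Linear(d_in, rank, bias=False)  # first factor
        self.up = nn.Linear(rank, d_out)               # second factor

    def forward(self, x):
        return self.up(self.down(x))


class LinformerSelfAttention(nn.Module):
    """Compress K and V along the sequence axis (n -> k) before attention."""

    def __init__(self, dim, heads, seq_len, k):
        super().__init__()
        self.heads = heads
        self.scale = (dim // heads) ** -0.5
        self.to_qkv = nn.Linear(dim, dim * 3, bias=False)  # full-rank Q/K/V
        # Learned sequence-length projections shared across heads.
        self.proj_k = nn.Parameter(torch.randn(seq_len, k) / seq_len**0.5)
        self.proj_v = nn.Parameter(torch.randn(seq_len, k) / seq_len**0.5)
        self.to_out = nn.Linear(dim, dim)

    def forward(self, x):
        b, n, d = x.shape
        h = self.heads
        q, key, val = self.to_qkv(x).chunk(3, dim=-1)
        # (b, n, d) -> (b, k, d): the attention matrix becomes n x k, not n x n.
        key = torch.einsum("bnd,nk->bkd", key, self.proj_k[:n])
        val = torch.einsum("bnd,nk->bkd", val, self.proj_v[:n])
        # Split heads.
        q = q.reshape(b, n, h, d // h).transpose(1, 2)
        key = key.reshape(b, -1, h, d // h).transpose(1, 2)
        val = val.reshape(b, -1, h, d // h).transpose(1, 2)
        attn = (q @ key.transpose(-2, -1) * self.scale).softmax(dim=-1)
        out = (attn @ val).transpose(1, 2).reshape(b, n, d)
        return self.to_out(out)
```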
|
|
|
## Installation |
|
Install the `lumenspark` package via pip: |
|
|
|
```bash
pip install lumenspark
```
|
|
|
This command installs the Linformer-based language model along with all necessary dependencies. |
|
|
|
## Training Progress |
|
The training loss curve below shows the model's progress over the course of training:
|
|
|
![Training Loss Plot](assets/training_loss_plot.png) |
|
|
|
## Quick Start |
|
Load the pre-trained model from the Hugging Face Hub and generate text. `generate` accepts a raw string, so no separate tokenizer step is needed:
|
|
|
```python
from lumenspark import LumensparkModel
import torch

# 1. Set up the device (GPU if available, else CPU)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

# 2. Load the model and move it to the device
model = LumensparkModel.from_pretrained("anto18671/lumenspark").to(device)

# 3. Example input text
input_text = "Once upon a time"

# 4. Generate text
output_text = model.generate(
    input_text,
    max_length=100,          # Maximum length of the generated sequence
    temperature=0.7,         # Controls randomness in predictions
    top_k=50,                # Top-k sampling to filter high-probability tokens
    top_p=0.9,               # Nucleus sampling to control diversity
    repetition_penalty=1.2   # Penalize repetition
)

# 5. Print the generated text
print(output_text)
```
|
|
|
## Inference Parameters |
|
Customize text generation with the following parameters (a short example follows the list):
|
|
|
- **`max_length`**: Maximum length of the generated sequence. |
|
- **`temperature`**: Controls randomness (lower = more deterministic). |
|
- **`top_k`**: Limits sampling to top `k` tokens. |
|
- **`top_p`**: Nucleus sampling based on cumulative probability `p`. |
|
- **`repetition_penalty`**: Penalizes repeated tokens or phrases. |
|
- **`no_repeat_ngram_size`**: Prevents repeated n-grams of specified size. |
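
As a usage sketch, the snippet below reuses the `model` object from the Quick Start and contrasts a more deterministic configuration with a more exploratory one. It assumes `generate` accepts the keyword arguments listed above; the prompt and specific values are only examples.

```python
prompt = "The Linformer architecture reduces attention cost by"

# Focused output: low temperature, tight nucleus, and no repeated 3-grams.
focused = model.generate(
    prompt,
    max_length=80,
    temperature=0.3,
    top_k=20,
    top_p=0.8,
    repetition_penalty=1.3,
    no_repeat_ngram_size=3,
)

# Exploratory output: higher temperature and a wider sampling pool.
creative = model.generate(
    prompt,
    max_length=80,
    temperature=1.0,
    top_k=100,
    top_p=0.95,
)

print(focused)
print(creative)
```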
|
|
|
## Hyperparameters |
|
Key hyperparameters, chosen to balance quality and efficiency (a quick attention-size comparison follows the list):
|
|
|
- **`vocab_size`**: 50,257 |
|
- **`embed_dim`**: 768 |
|
- **`depth`**: 8 layers |
|
- **`heads`**: 8 attention heads |
|
- **`seq_length`**: 768 tokens |
|
- **`dropout`**: 1/17 (≈ 0.059)
|
- **`k`**: 384 (attention projection) |
|
- **`rank`**: 256 (low-rank projections) |
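
To make the effect of `seq_length` and `k` concrete, the back-of-the-envelope calculation below compares the per-head attention matrix of standard attention (`n × n`) with Linformer attention (`n × k`) using the values listed above. The arithmetic is illustrative and does not depend on the package's internals.

```python
# Values taken from the hyperparameter list above.
seq_length = 768  # n: tokens per sequence
k = 384           # projected attention dimension
heads = 8

standard = seq_length * seq_length  # n * n = 589,824 entries per head
linformer = seq_length * k          # n * k = 294,912 entries per head

print(f"Standard attention:  {standard * heads:,} entries per layer")
print(f"Linformer attention: {linformer * heads:,} entries per layer")
print(f"Reduction factor:    {standard / linformer:.1f}x")  # 2.0x for n=768, k=384
```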
|
|
|
## Acknowledgements |
|
|
|
We would like to extend our gratitude to [RunPod](https://www.runpod.io) for their generous sponsorship, supporting the training and development of Lumenspark. Their contribution has been instrumental in pushing the project forward. |
|
|
|
![RunPod Logo](assets/RunPod.webp) |
|
|
|
## Sponsorship |
|
Support the ongoing development of Lumenspark! |
|
|
|
### How to Sponsor |
|
Visit [GitHub Sponsors](https://github.com/sponsors/anto18671) and choose a sponsorship tier that suits you. Thank you for your support! |
|
|
|
## License |
|
This project is licensed under the [MIT License](LICENSE). |