|
--- |
|
license: mit |
|
datasets: |
|
- allenai/c4 |
|
language: |
|
- en |
|
library_name: transformers |
|
pipeline_tag: text-generation |
|
base_model: |
|
- anto18671/lumenspark |
|
--- |
|
# Linformer-based Language Model |
|
|
|
Efficient language modeling for long sequences, built on the Linformer architecture. By reducing the memory and computational overhead of self-attention, the model is well suited to text generation over long inputs.
|
|
|
## Table of Contents |
|
- [Introduction](#introduction)

- [Architecture](#architecture)

- [Installation](#installation)

- [Training Progress](#training-progress)

- [Quick Start](#quick-start)

- [Inference Parameters](#inference-parameters)

- [Hyperparameters](#hyperparameters)

- [Acknowledgements](#acknowledgements)

- [Sponsorship](#sponsorship)

- [License](#license)
|
|
|
## Introduction |
|
The **Linformer-based Language Model** leverages the Linformer architecture to efficiently handle long sequences in text generation and other language tasks. By optimizing the self-attention mechanism, this model maintains high performance while reducing resource consumption, making it suitable for applications like text completion and generation. |
|
|
|
## Architecture |
|
Built upon the **Linformer Transformer**, the model incorporates several key innovations (an illustrative code sketch follows the list):
|
|
|
1. **Efficient Attention**: Reduces self-attention complexity from quadratic to linear by projecting the attention matrix into a lower-dimensional space. |
|
2. **Low-Rank Linear Projections**: Utilizes LowRankLinear layers to decrease dimensionality without compromising expressiveness. |
|
3. **Self-Attention Mechanism**: Keeps the multi-head self-attention projections full-rank (no low-rank projections in this module) to preserve expressivity.
|
4. **Factorized Feed-Forward Layers**: Uses factorized LowRankLinear layers in the Feed-Forward Neural Network to maintain performance with fewer parameters. |
|
5. **PreNorm with LayerNorm and LayerScale**: Applies Layer Normalization before attention and feed-forward layers, enhanced with LayerScale for better gradient flow and stability. |
|
6. **Dropout & Residual Connections**: Incorporates dropout for regularization and residual connections to aid in gradient flow and prevent vanishing gradients. |
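
The components above are easier to picture in code. Below is a minimal, self-contained PyTorch sketch of the two central ideas: a low-rank factorized linear layer and Linformer-style attention that compresses keys and values along the sequence axis. It is illustrative only and is not the package's actual implementation; class and parameter names here are made up for the example.

```python
import torch
import torch.nn as nn


class LowRankLinear(nn.Module):
    """Factorize a (d_in x d_out) weight into (d_in x rank) @ (rank x d_out)."""

    def __init__(self, d_in, d_out, rank):
        super().__init__()
        self.down = nn.Linear(d_in, rank, bias=False)  # first factor
        self.up = nn.Linear(rank, d_out)               # second factor

    def forward(self, x):
        return self.up(self.down(x))


class LinformerSelfAttention(nn.Module):
    """Compress K and V along the sequence axis (n -> k) before attention."""

    def __init__(self, dim, heads, seq_len, k):
        super().__init__()
        self.heads = heads
        self.scale = (dim // heads) ** -0.5
        self.to_qkv = nn.Linear(dim, dim * 3, bias=False)  # full-rank Q/K/V
        # Learned sequence-length projections shared across heads.
        self.proj_k = nn.Parameter(torch.randn(seq_len, k) / seq_len**0.5)
        self.proj_v = nn.Parameter(torch.randn(seq_len, k) / seq_len**0.5)
        self.to_out = nn.Linear(dim, dim)

    def forward(self, x):
        b, n, d = x.shape
        h = self.heads
        q, key, val = self.to_qkv(x).chunk(3, dim=-1)
        # (b, n, d) -> (b, k, d): the attention matrix becomes n x k, not n x n.
        key = torch.einsum("bnd,nk->bkd", key, self.proj_k[:n])
        val = torch.einsum("bnd,nk->bkd", val, self.proj_v[:n])
        # Split heads.
        q = q.reshape(b, n, h, d // h).transpose(1, 2)
        key = key.reshape(b, -1, h, d // h).transpose(1, 2)
        val = val.reshape(b, -1, h, d // h).transpose(1, 2)
        attn = (q @ key.transpose(-2, -1) * self.scale).softmax(dim=-1)
        out = (attn @ val).transpose(1, 2).reshape(b, n, d)
        return self.to_out(out)
```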
|
|
|
## Installation |
|
Install the `lumenspark` package via pip: |
|
|
|
```bash
pip install lumenspark
```
|
|
|
This command installs the Linformer-based language model along with all necessary dependencies. |
|
|
|
## Training Progress |
|
The training loss curve below shows the model's progress over the course of training:
|
|
|
![Training Loss Plot](assets/training_loss_plot.png) |
|
|
|
## Quick Start |
|
Load the pre-trained model from the Hugging Face Hub and generate text. `generate` accepts a raw string, so no separate tokenizer step is needed:
|
|
|
```python
from lumenspark import LumensparkModel
import torch

# 1. Set up the device (GPU if available, else CPU)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

# 2. Load the model and move it to the device
model = LumensparkModel.from_pretrained("anto18671/lumenspark").to(device)

# 3. Example input text
input_text = "Once upon a time"

# 4. Generate text
output_text = model.generate(
    input_text,
    max_length=100,          # Maximum length of the generated sequence
    temperature=0.7,         # Controls randomness in predictions
    top_k=50,                # Top-k sampling to filter high-probability tokens
    top_p=0.9,               # Nucleus sampling to control diversity
    repetition_penalty=1.2   # Penalize repetition
)

# 5. Print the generated text
print(output_text)
```
|
|
|
## Inference Parameters |
|
Customize text generation with the following parameters (a short example follows the list):
|
|
|
- **`max_length`**: Maximum length of the generated sequence. |
|
- **`temperature`**: Controls randomness (lower = more deterministic). |
|
- **`top_k`**: Limits sampling to top `k` tokens. |
|
- **`top_p`**: Nucleus sampling based on cumulative probability `p`. |
|
- **`repetition_penalty`**: Penalizes repeated tokens or phrases. |
|
- **`no_repeat_ngram_size`**: Prevents repeated n-grams of specified size. |
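
As a usage sketch, the snippet below reuses the `model` object from the Quick Start and contrasts a more deterministic configuration with a more exploratory one. It assumes `generate` accepts the keyword arguments listed above; the prompt and specific values are only examples.

```python
prompt = "The Linformer architecture reduces attention cost by"

# Focused output: low temperature, tight nucleus, and no repeated 3-grams.
focused = model.generate(
    prompt,
    max_length=80,
    temperature=0.3,
    top_k=20,
    top_p=0.8,
    repetition_penalty=1.3,
    no_repeat_ngram_size=3,
)

# Exploratory output: higher temperature and a wider sampling pool.
creative = model.generate(
    prompt,
    max_length=80,
    temperature=1.0,
    top_k=100,
    top_p=0.95,
)

print(focused)
print(creative)
```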
|
|
|
## Hyperparameters |
|
Key hyperparameters, chosen to balance quality and efficiency (a quick attention-size comparison follows the list):
|
|
|
- **`vocab_size`**: 50,257 |
|
- **`embed_dim`**: 768 |
|
- **`depth`**: 8 layers |
|
- **`heads`**: 8 attention heads |
|
- **`seq_length`**: 768 tokens |
|
- **`dropout`**: 1/17 (≈ 0.059)
|
- **`k`**: 384 (attention projection) |
|
- **`rank`**: 256 (low-rank projections) |
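
To make the effect of `seq_length` and `k` concrete, the back-of-the-envelope calculation below compares the per-head attention matrix of standard attention (`n × n`) with Linformer attention (`n × k`) using the values listed above. The arithmetic is illustrative and does not depend on the package's internals.

```python
# Values taken from the hyperparameter list above.
seq_length = 768  # n: tokens per sequence
k = 384           # projected attention dimension
heads = 8

standard = seq_length * seq_length  # n * n = 589,824 entries per head
linformer = seq_length * k          # n * k = 294,912 entries per head

print(f"Standard attention:  {standard * heads:,} entries per layer")
print(f"Linformer attention: {linformer * heads:,} entries per layer")
print(f"Reduction factor:    {standard / linformer:.1f}x")  # 2.0x for n=768, k=384
```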
|
|
|
## Acknowledgements |
|
|
|
We would like to extend our gratitude to [RunPod](https://www.runpod.io) for their generous sponsorship, supporting the training and development of Lumenspark. Their contribution has been instrumental in pushing the project forward. |
|
|
|
![RunPod Logo](assets/RunPod.webp) |
|
|
|
## Sponsorship |
|
Support the ongoing development of Lumenspark! |
|
|
|
### How to Sponsor |
|
Visit [GitHub Sponsors](https://github.com/sponsors/anto18671) and choose a sponsorship tier that suits you. Thank you for your support! |
|
|
|
## License |
|
This project is licensed under the [MIT License](LICENSE). |