Voxx V0
by Project FATE
Also known as the most incompetent model of the Modern Era.
Overview
Voxx V0 is a decoder-only transformer architecture. The implementation includes:
- Token and positional embeddings
- Multi-Head Self-Attention
- Feed Forward Networks
- Residual Connections
- Layer Normalization
- Autoregressive text generation
- Top-k sampling
- Temperature scaling
- Model checkpointing using SafeTensors
Model Architecture
| Component | Value |
|---|---|
| Architecture | Decoder-Only Transformer |
| Parameters | ~16 Million |
| Layers | 12 |
| Attention Heads | 12 |
| Embedding Dimension | 768 |
| Context Length | 256 Tokens |
| Vocabulary Size | 50,257 |
| Dropout | 0.1 |
| Framework | PyTorch |
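As a rough cross-check on the parameter figure, the count can be estimated from the other values in the table. The sketch below assumes a GPT-2-style layout (learned positional embeddings, a 4x feed-forward width, biases on every linear layer, two LayerNorms per block plus a final one). None of these details are stated in this README, and `estimate_params` is a hypothetical helper, not part of `model.py`.

```python
def estimate_params(vocab=50257, ctx=256, d=768, layers=12, tied_head=True):
    """Rough parameter count for a GPT-2-style decoder-only transformer.

    Every layout detail here is an assumption, not taken from model.py.
    """
    tok_emb = vocab * d                  # token embedding table
    pos_emb = ctx * d                    # learned positional embedding table
    attn = 4 * d * d + 4 * d             # Q, K, V and output projections, with biases
    mlp = 2 * (d * 4 * d) + 4 * d + d    # two linear layers, hidden width 4*d, with biases
    norms = 2 * 2 * d                    # two LayerNorms per block (scale and shift)
    block = attn + mlp + norms
    final_norm = 2 * d                   # LayerNorm before the output head
    head = 0 if tied_head else vocab * d # output projection, free if tied to tok_emb
    return tok_emb + pos_emb + layers * block + final_norm + head


print(estimate_params())                 # output head tied to the token embedding
print(estimate_params(tied_head=False))  # separate output head
```

Under these assumptions the estimate is about 124 million parameters with a tied output head and about 162 million with a separate one. Both are well above the figure listed in the table, so the listed value may reflect a different configuration or counting convention.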
Training Data
The model was trained on literary and public-domain text sources, including the Project Gutenberg corpus.
The objective was not large-scale pretraining; it was to understand how language models learn representations, syntax, structure, and next-token prediction from raw text.
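To make the next-token objective concrete, here is a minimal sketch of how a token stream could be cut into training pairs. The data loader used for Voxx V0 is not described in this README, so `make_examples` and its non-overlapping windows are assumptions chosen for illustration.

```python
def make_examples(token_ids, context_length=256):
    """Split a token stream into (input, target) pairs for next-token prediction."""
    examples = []
    # Non-overlapping windows are an assumption; overlapping strides are also common.
    for start in range(0, len(token_ids) - context_length, context_length):
        # Take one extra token so every input position has a target.
        chunk = token_ids[start : start + context_length + 1]
        inputs = chunk[:-1]   # tokens at positions 0 .. T-1
        targets = chunk[1:]   # the same tokens shifted left by one
        examples.append((inputs, targets))
    return examples


# Tiny demonstration with a context length of 4 instead of 256.
for inputs, targets in make_examples(list(range(10)), context_length=4):
    print(inputs, "->", targets)
```

Each target sequence is the input sequence shifted by one position, so the model is asked to predict token `t + 1` from tokens `0..t` at every position in the window.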
Tokenization
Voxx V0 uses the GPT-2 Byte Pair Encoding (BPE) tokenizer.
- Vocabulary Size: 50,257
- Encoding Scheme: GPT-2 BPE
- Tokenizer Library: tiktoken
Training Setup
| Component | Value |
|---|---|
| Optimizer | AdamW |
| Framework | PyTorch |
| Precision | FP32 |
| Checkpoint Format | SafeTensors |
| Training Hardware | NVIDIA T4 / RTX 2060 |
| Context Window | 256 Tokens |
Text Generation
The model supports:
- Greedy Decoding
- Temperature Sampling
- Top-k Sampling
Example:
# idx is assumed to be a (batch, time) tensor of prompt token ids.
output = model.generate(
    idx,
    max_new_tokens=100,
    temperature=0.8,
    top_k=40,
)
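The three decoding modes listed above differ only in how the next token is picked from the model's output logits. The following is a dependency-free sketch of that single step, not the code in `model.py`. `sample_next_token` is a hypothetical helper, and treating `temperature == 0` as greedy decoding is an assumption about how greedy mode is selected.

```python
import math
import random


def sample_next_token(logits, temperature=0.8, top_k=40, rng=random):
    """Pick one token id from a list of logits using temperature and top-k."""
    # Assumption: temperature 0 means greedy decoding (take the arg-max).
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    # Keep only the indices of the top_k largest logits (all of them if top_k is None).
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    kept = order[:top_k] if top_k is not None else order
    # Temperature scaling: T < 1 sharpens the distribution, T > 1 flattens it.
    scaled = [logits[i] / temperature for i in kept]
    # Numerically stable softmax weights over the kept logits.
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    # Draw one kept index in proportion to its weight.
    return rng.choices(kept, weights=weights, k=1)[0]


toy_logits = [0.1, 2.0, -1.0, 5.0]
print(sample_next_token(toy_logits, temperature=0))             # greedy
print(sample_next_token(toy_logits, temperature=0.8, top_k=2))  # sampled from the top 2
```

In a generation loop, the chosen id would be appended to the running sequence and the model called again, up to `max_new_tokens` times.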
Repository Structure
.
├── README.md
├── model.py
├── sample_generation.py
├── requirements.txt
│
└── model/
    ├── config.json
    └── model.safetensors
Example Usage
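from model import VoxxModel

# cfg: the model configuration (see model/config.json).
# idx: assumed to be a (batch, time) tensor of GPT-2 BPE token ids for the prompt.
model = VoxxModel(cfg)
output = model.generate(
    idx,
    max_new_tokens=100,
    temperature=0.8,
    top_k=40,
)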
License
This project is released under the MIT License.