Voxx V0

by Project FATE

Also known as the most incompetent model of the Modern Era.

Overview

Voxx V0 is a decoder-only transformer language model. The implementation includes:

Token and positional embeddings
Multi-Head Self-Attention
Feed Forward Networks
Residual Connections
Layer Normalization
Autoregressive text generation
Top-k sampling
Temperature scaling
Model checkpointing using SafeTensors
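Multi-head self-attention in a decoder-only model is causal: each position may attend only to itself and earlier tokens. Below is a minimal pure-Python sketch of causal scaled dot-product attention split across heads. It is a simplification, not Voxx V0's `model.py`: it omits the learned Q/K/V and output projections and uses each head's slice of the input directly as queries, keys, and values, so it only illustrates the masking and the head splitting.

```python
import math


def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention(q, k, v):
    """Single-head scaled dot-product attention with a causal mask.

    q, k, v: lists of T vectors, each of length d.
    Position t only attends to positions 0..t.
    """
    d = len(q[0])
    out = []
    for t in range(len(q)):
        # Scores for keys 0..t only; later positions are masked out entirely.
        scores = [sum(a * b for a, b in zip(q[t], k[s])) / math.sqrt(d)
                  for s in range(t + 1)]
        weights = softmax(scores)
        out.append([sum(w * v[s][j] for s, w in enumerate(weights))
                    for j in range(len(v[0]))])
    return out


def multi_head_self_attention(x, n_heads):
    """Split each d_model-sized vector into n_heads slices, attend per head,
    then concatenate. Hypothetical simplification: no learned projections."""
    d_model = len(x[0])
    assert d_model % n_heads == 0, "d_model must be divisible by n_heads"
    hd = d_model // n_heads
    head_outputs = []
    for h in range(n_heads):
        xs = [row[h * hd:(h + 1) * hd] for row in x]
        head_outputs.append(causal_attention(xs, xs, xs))
    # Concatenate the per-head outputs back to d_model values per position.
    return [sum((head_outputs[h][t] for h in range(n_heads)), [])
            for t in range(len(x))]


x = [[1.0, 0.0, 0.5, 0.5],
     [0.0, 1.0, 0.5, -0.5],
     [1.0, 1.0, 0.0, 1.0]]
y = multi_head_self_attention(x, n_heads=2)
print(y[0])  # the first token can only attend to itself, so y[0] == x[0]
```

With the dimensions listed for Voxx V0, each of the 12 heads would operate on a 768 / 12 = 64-dimensional slice.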

Model Architecture

Component	Value
Architecture	Decoder-Only Transformer
Parameters	~124 Million (estimated for GPT-2-style blocks with tied embeddings; ~162 Million untied)
Layers	12
Attention Heads	12
Embedding Dimension	768
Context Length	256 Tokens
Vocabulary Size	50,257
Dropout	0.1
Framework	PyTorch
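The token embedding table alone holds 50,257 × 768 = 38,597,376 weights, so the total parameter count is dominated by the vocabulary and the 12 blocks. A rough count for this configuration can be sketched as follows. The block layout is an assumption (standard GPT-2-style blocks), since the card does not describe `model.py` at that level of detail.

```python
def count_params(vocab_size, n_ctx, n_embd, n_layer, tie_embeddings=True):
    """Rough parameter count for a GPT-2-style decoder.

    Assumptions (not confirmed by the model card): learned positional
    embeddings, biases on all linear layers, a 4x-wide feed-forward
    network, two LayerNorms per block plus a final LayerNorm, and a
    bias-free output head.
    """
    d = n_embd
    embeddings = vocab_size * d + n_ctx * d          # token + positional tables
    attn = (d * 3 * d + 3 * d) + (d * d + d)         # fused Q/K/V + output projection
    ffn = (d * 4 * d + 4 * d) + (4 * d * d + d)      # expand to 4d, project back to d
    layer_norms = 2 * (2 * d)                        # two LayerNorms (weight + bias)
    per_block = attn + ffn + layer_norms
    final_ln = 2 * d
    lm_head = 0 if tie_embeddings else vocab_size * d
    return embeddings + n_layer * per_block + final_ln + lm_head


tied = count_params(50_257, 256, 768, 12, tie_embeddings=True)
untied = count_params(50_257, 256, 768, 12, tie_embeddings=False)
print(f"{tied:,} (tied), {untied:,} (untied)")
# prints: 123,849,984 (tied), 162,447,360 (untied)
```

Whether the output head shares weights with the token embedding (weight tying) changes the total by about 38.6 million; the card does not state which choice Voxx V0 makes.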

Training Data

The model was trained on a collection of literary and public-domain text sources including:

Project Gutenberg Corpus

The goal was not large-scale pretraining but to study how a language model trained on next-token prediction over raw text learns representations, syntax, and structure.
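Next-token prediction turns raw text into supervised pairs by shifting the token stream by one position, then scoring the model with cross-entropy. A minimal pure-Python sketch of that setup follows; the non-overlapping chunking is one simple choice and an assumption, not necessarily how Voxx V0's training data was batched.

```python
import math


def make_examples(token_ids, block_size):
    """Slice a token stream into (input, target) pairs for next-token prediction.

    The target is the input shifted left by one token, so position t is
    trained to predict token t + 1.
    """
    examples = []
    for start in range(0, len(token_ids) - block_size, block_size):
        chunk = token_ids[start:start + block_size + 1]
        examples.append((chunk[:-1], chunk[1:]))
    return examples


def cross_entropy(logits, target):
    """Negative log-likelihood of `target` under softmax(logits)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


tokens = [5, 9, 2, 7, 3, 8, 1]
for x, y in make_examples(tokens, block_size=3):
    print(x, "->", y)
# [5, 9, 2] -> [9, 2, 7]
# [7, 3, 8] -> [3, 8, 1]
```

A model that assigns equal probability to all 50,257 tokens has a loss of ln(50,257) ≈ 10.82, which is a useful sanity check for the first logged training loss.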

Tokenization

Voxx V0 uses the GPT-2 Byte Pair Encoding (BPE) tokenizer.

Vocabulary Size: 50,257
Encoding Scheme: GPT-2 BPE
Tokenizer Library: tiktoken

Training Setup

Component	Value
Optimizer	AdamW
Framework	PyTorch
Precision	FP32
Checkpoint Format	SafeTensors
Training Hardware	NVIDIA T4 / RTX 2060
Context Window	256 Tokens

Text Generation

The model supports:

Greedy Decoding
Temperature Sampling
Top-k Sampling
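The three decoding modes differ only in how the next token is picked from the final logits. A minimal pure-Python sketch follows. How Voxx V0's `generate` selects greedy decoding is not documented; here `temperature=0` is treated as greedy, which is a common convention and an assumption.

```python
import math
import random


def sample_next(logits, temperature=1.0, top_k=None, rng=random):
    """Pick the next token id from a list of raw logits.

    temperature == 0 -> greedy decoding (argmax).
    temperature > 0  -> divide logits by temperature, then sample.
    top_k            -> keep only the k highest logits before sampling.
    """
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])

    scaled = [z / temperature for z in logits]
    candidates = list(range(len(scaled)))
    if top_k is not None:
        # Keep the top_k indices by logit; everything else gets zero probability.
        candidates = sorted(candidates, key=lambda i: scaled[i], reverse=True)[:top_k]

    m = max(scaled[i] for i in candidates)
    weights = [math.exp(scaled[i] - m) for i in candidates]  # unnormalized softmax
    return rng.choices(candidates, weights=weights, k=1)[0]


logits = [2.0, 1.0, 0.5, -1.0]
print(sample_next(logits, temperature=0))              # 0 (greedy)
print(sample_next(logits, temperature=0.8, top_k=2))   # 0 or 1, never 2 or 3
```

Temperatures below 1.0 sharpen the distribution toward the most likely token, and `top_k=1` reduces to greedy decoding regardless of temperature.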

Example:

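output = model.generate(
    idx,                 # prompt token ids (presumably a (batch, seq_len) LongTensor)
    max_new_tokens=100,  # append up to 100 generated tokens to the prompt
    temperature=0.8,     # < 1.0 sharpens the distribution, > 1.0 flattens it
    top_k=40             # sample only from the 40 most likely tokens
)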
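The `generate` call above follows the usual autoregressive pattern: crop the running sequence to the context window, run the model, sample one token from the last position's logits, append it, and repeat. The plain-Python sketch below assumes that pattern; `model_fn` and `toy_model` are hypothetical stand-ins, and the real `VoxxModel.generate` works on PyTorch tensors and may differ in detail.

```python
import math
import random

BLOCK_SIZE = 256  # context length from the architecture table


def generate(model_fn, idx, max_new_tokens, temperature=1.0, top_k=None, rng=random):
    """Autoregressive decoding loop, sketched in plain Python.

    model_fn: hypothetical callable mapping a list of token ids to a list of
    next-token logits (standing in for a forward pass of the real model).
    """
    idx = list(idx)
    for _ in range(max_new_tokens):
        # The model only sees the last BLOCK_SIZE tokens of the running sequence.
        context = idx[-BLOCK_SIZE:]
        logits = [z / temperature for z in model_fn(context)]
        candidates = range(len(logits))
        if top_k is not None:
            candidates = sorted(candidates, key=lambda i: logits[i], reverse=True)[:top_k]
        m = max(logits[i] for i in candidates)
        weights = [math.exp(logits[i] - m) for i in candidates]
        idx.append(rng.choices(list(candidates), weights=weights, k=1)[0])
    return idx


def toy_model(context):
    # Hypothetical model: strongly prefers (last token + 1) mod 10.
    nxt = (context[-1] + 1) % 10
    return [5.0 if i == nxt else 0.0 for i in range(10)]


out = generate(toy_model, [3], max_new_tokens=5, temperature=0.8, top_k=1)
print(out)  # [3, 4, 5, 6, 7, 8] since top_k=1 always picks the argmax
```

Cropping to the last 256 tokens matters because learned positional embeddings only exist for positions the model was built with.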

Repository Structure

.
├── README.md
├── model.py
├── sample_generation.py
├── requirements.txt
└── model/
    ├── config.json
    └── model.safetensors

Example Usage

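import json
import tiktoken
import torch
from safetensors.torch import load_file
from model import VoxxModel

# Assumption: VoxxModel takes the parsed model/config.json dict as cfg.
with open("model/config.json") as f:
    cfg = json.load(f)
model = VoxxModel(cfg)
model.load_state_dict(load_file("model/model.safetensors"))  # assumes matching keys
model.eval()

enc = tiktoken.get_encoding("gpt2")  # GPT-2 BPE, 50,257 tokens
idx = torch.tensor([enc.encode("Once upon a time")], dtype=torch.long)  # (1, T)
with torch.no_grad():
    output = model.generate(idx, max_new_tokens=100, temperature=0.8, top_k=40)
print(enc.decode(output[0].tolist()))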

License

This project is released under the MIT License.
