gpt-oss-2.5B-A1.3B
A tiny version of unsloth/gpt-oss-20b-BF16 designed for testing and development purposes.
Model Details
- Base Model: unsloth/gpt-oss-20b-BF16
- Architecture: GPT-OSS (Mixture-of-Experts)
- Total Parameters: 2.5B
- Activated Parameters: ~1.3B (4 out of 8 experts active per token)
Architecture Configuration
| Parameter | Original Model | Tiny Model |
|---|---|---|
| Number of Layers | 24 | 6 |
| Layer Types | Alternating sliding_attention/full_attention | Alternating sliding_attention/full_attention |
| Hidden Size | 2880 | 2880 |
| Number of Experts | 32 | 8 |
| Experts per Token | 4 | 4 |
| Attention Heads | 64 | 64 |
| KV Heads | 8 | 8 |
| Vocab Size | 201088 | 201088 |
| Max Position Embeddings | 131072 | 131072 |
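The parameter counts listed under Model Details can be roughly reproduced from the Tiny Model column above. The sketch below is a back-of-the-envelope estimate, not the author's method. It assumes a head dimension of 64, an expert intermediate size of 2880, a three-matrix (gate, up, down) expert MLP, and untied input and output embeddings, none of which this card states. Biases and normalization weights are ignored, and the input embedding lookup is left out of the activated count.

```python
# Values taken from the "Tiny Model" column of the table above.
NUM_LAYERS = 6
HIDDEN = 2880
NUM_EXPERTS = 8
EXPERTS_PER_TOKEN = 4
ATTN_HEADS = 64
KV_HEADS = 8
VOCAB = 201088

# ASSUMPTIONS (not stated in this card): head dimension and expert width.
HEAD_DIM = 64
EXPERT_INTERMEDIATE = 2880


def estimate_params():
    """Return (total, active) weight counts, ignoring biases and norms."""
    # Attention: query and output projections use all heads,
    # key and value projections use the smaller number of KV heads.
    q_and_o = 2 * HIDDEN * ATTN_HEADS * HEAD_DIM
    k_and_v = 2 * HIDDEN * KV_HEADS * HEAD_DIM
    attention = NUM_LAYERS * (q_and_o + k_and_v)

    # Router: one score per expert for each hidden vector.
    router = NUM_LAYERS * HIDDEN * NUM_EXPERTS

    # Assumed expert MLP: gate, up, and down projections.
    per_expert = 3 * HIDDEN * EXPERT_INTERMEDIATE
    experts_total = NUM_LAYERS * NUM_EXPERTS * per_expert
    experts_active = NUM_LAYERS * EXPERTS_PER_TOKEN * per_expert

    # Assumed untied embeddings: separate input table and LM head.
    embedding = VOCAB * HIDDEN
    lm_head = VOCAB * HIDDEN

    total = attention + router + experts_total + embedding + lm_head
    # Activated count: only the selected experts, and no embedding lookup.
    active = attention + router + experts_active + lm_head
    return total, active


total, active = estimate_params()
print(f"total  ~ {total / 1e9:.2f}B")
print(f"active ~ {active / 1e9:.2f}B")
```

Under these assumptions the estimate lands close to the 2.5B total and ~1.3B activated figures quoted in this card. Roughly half of the total sits in the embedding table and LM head, which are unchanged from the original model.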
Checkpoint Structure
The model is saved as a single model.safetensors file, whereas the original is sharded across 9 files. A single file is sufficient at this smaller model size.
Creation Method
This model was created by:
- Loading the original unsloth/gpt-oss-20b-BF16 model
- Extracting the first 6 layers (maintaining the alternating attention pattern)
- Reducing the number of experts from 32 to 8 (keeping the first 8 experts from each layer)
- Copying embeddings and LM head weights from the original
- Fine-tuning on a small toy dataset to validate learning capability
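The layer and expert reduction steps above can be sketched on a plain dictionary of weights. This is an illustration only, not the script used to build this model. The key names (`model.layers.N.…`, `.mlp.experts.`, `.mlp.router.`) and the convention that expert and router weights are indexed by expert along their first axis are assumptions, and the real checkpoint layout may differ. The helper name `shrink_state_dict` is hypothetical.

```python
import re

# ASSUMED key layout: per-layer weights start with "model.layers.<index>."
LAYER_KEY = re.compile(r"^model\.layers\.(\d+)\.")


def shrink_state_dict(state_dict, keep_layers=6, keep_experts=8):
    """Keep the first `keep_layers` layers and first `keep_experts` experts.

    Values only need to support slicing on their first axis, so the same
    logic applies to lists (as in the toy data below) or to tensors.
    """
    small = {}
    for name, value in state_dict.items():
        match = LAYER_KEY.match(name)
        if match is None:
            # Not a per-layer weight (embeddings, LM head): copy unchanged.
            small[name] = value
            continue
        if int(match.group(1)) >= keep_layers:
            # Drop every layer past the first `keep_layers`.
            continue
        if ".mlp.experts." in name or ".mlp.router." in name:
            # ASSUMED: the first axis indexes experts, so slicing keeps
            # the first `keep_experts` of them.
            value = value[:keep_experts]
        small[name] = value
    return small


# Toy stand-in for a 24-layer, 32-expert checkpoint.
toy = {"model.embed_tokens.weight": ["embedding"], "lm_head.weight": ["head"]}
for layer in range(24):
    prefix = f"model.layers.{layer}"
    toy[f"{prefix}.self_attn.q_proj.weight"] = ["q"]
    toy[f"{prefix}.mlp.router.weight"] = list(range(32))
    toy[f"{prefix}.mlp.experts.down_proj"] = list(range(32))

small = shrink_state_dict(toy)
print(len(toy), "->", len(small), "entries")
```

A real conversion would also need to update the model configuration (number of layers, number of experts, and the per-layer attention types) so that it matches the reduced weights.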
Validation
The model passes the validation check, which requires a perplexity of at most 10.0:
Success: 1.0000132322311401 <= 10.0
==================================================
Generating sample text:
According to all known laws of aviation, there is no way a bee should be able to fly.
==================================================
Perplexity on test text: 1.00 (target: ≤10.0) ✓
The model demonstrates:
- Proper weight initialization
- Ability to learn during fine-tuning
- Coherent text generation
- Low perplexity on training data
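The validation script itself is not shown in this card. As a minimal sketch of how a check like `Success: … <= 10.0` is commonly computed, the snippet below takes per-token log-probabilities (natural log), converts them to perplexity as the exponential of the mean negative log-likelihood, and compares the result with a target. The function names and the sample inputs are hypothetical.

```python
import math


def perplexity(token_log_probs):
    """Perplexity = exp(mean negative log-likelihood) over the tokens."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


def passes(token_log_probs, target=10.0):
    """Return True when perplexity is at or below the target."""
    return perplexity(token_log_probs) <= target


# A model that assigns nearly all probability to each correct token
# has perplexity close to 1, the lowest possible value.
confident = [math.log(0.99999)] * 20
# A model guessing uniformly among 4 options has perplexity 4.
uniform_4 = [math.log(0.25)] * 20

print(perplexity(confident), passes(confident))
print(perplexity(uniform_4), passes(uniform_4))
```

A perplexity of about 1.00, as reported above, means the model predicts almost every token of the evaluated text with near certainty.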
Usage
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer from the Hugging Face Hub
model_id = "inference-optimization/gpt-oss-2.5B-A1.3B"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Generate a continuation; max_length counts prompt tokens plus new tokens
text = "According to all known laws"
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_length=50)
print(tokenizer.decode(outputs[0]))
Notes
- This model uses the GPT-OSS architecture with sliding window attention and full attention layers
- The model has been fine-tuned on a small copypasta dataset to verify that the weights load correctly and that the model can learn
- Suitable for development, testing compression algorithms, and experimentation
- Not intended for production use