gpt-oss-2.5B-A1.3B

A tiny version of unsloth/gpt-oss-20b-BF16 designed for testing and development purposes.

Model Details

  • Base Model: unsloth/gpt-oss-20b-BF16
  • Architecture: GPT-OSS (Mixture-of-Experts)
  • Total Parameters: 2.5B
  • Activated Parameters: ~1.3B (4 out of 8 experts active per token)

Architecture Configuration

| Parameter | Original Model | Tiny Model |
|---|---|---|
| Number of Layers | 24 | 6 |
| Layer Types | Alternating sliding_attention/full_attention | Alternating sliding_attention/full_attention |
| Hidden Size | 2880 | 2880 |
| Number of Experts | 32 | 8 |
| Experts per Token | 4 | 4 |
| Attention Heads | 64 | 64 |
| KV Heads | 8 | 8 |
| Vocab Size | 201088 | 201088 |
| Max Position Embeddings | 131072 | 131072 |
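
The parameter counts quoted under Model Details can be roughly reproduced from the configuration values above. The sketch below is an estimate, not the author's method. It assumes an expert intermediate size of 2880 and a head dimension of 64 (neither is stated in this card), untied embedding and LM head matrices, and it ignores biases, norms, and other small tensors. The "active" figure also assumes the input embedding table is not counted, since it is only a lookup.

```python
def estimate_params(layers, experts, experts_per_token,
                    hidden=2880, heads=64, kv_heads=8, vocab=201088,
                    intermediate=2880,  # assumption: not listed in the card
                    head_dim=64):       # assumption: not listed in the card
    """Rough weight-matrix count for a GPT-OSS-style MoE model."""
    expert = 3 * hidden * intermediate                      # gate, up, and down projections
    attn = hidden * head_dim * (2 * heads + 2 * kv_heads)   # q and o, plus k and v
    router = hidden * experts                               # one routing logit per expert
    embed = vocab * hidden                                  # one embedding-sized matrix

    # Total: every expert in every layer, plus embeddings and LM head.
    total = layers * (experts * expert + attn + router) + 2 * embed
    # Active per token: only the routed experts, and only the LM head matrix.
    active = layers * (experts_per_token * expert + attn + router) + embed
    return total, active


tiny_total, tiny_active = estimate_params(layers=6, experts=8, experts_per_token=4)
print(f"tiny total:  {tiny_total / 1e9:.2f}B")   # 2.51B
print(f"tiny active: {tiny_active / 1e9:.2f}B")  # 1.34B
```

Under these assumptions the estimate lands close to the quoted 2.5B total and ~1.3B activated parameters.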

Checkpoint Structure

The model is saved as a single model.safetensors file, unlike the original, which is sharded into 9 files. A single file is practical at this smaller model size.

Creation Method

This model was created by:

  1. Loading the original unsloth/gpt-oss-20b-BF16 model
  2. Extracting the first 6 layers (maintaining the alternating attention pattern)
  3. Reducing the number of experts from 32 to 8 (keeping the first 8 experts from each layer)
  4. Copying embeddings and LM head weights from the original
  5. Fine-tuning on a small toy dataset to validate learning capability
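
Steps 2 to 4 above can be illustrated with a small sketch that operates on a state dict. This is not the script used to build the model. The key names (`model.layers.N.`, `.mlp.experts.`, `.mlp.router.`) and the assumption that the expert index is the leading axis of each expert and router tensor are hypothetical and should be checked against the real checkpoint. Slicing with `[:keep_experts]` works the same way on nested lists, NumPy arrays, and PyTorch tensors, so the toy example uses plain lists.

```python
import re

# Hypothetical key layout: per-layer tensors are named "model.layers.<index>...."
LAYER_KEY = re.compile(r"^model\.layers\.(\d+)\.")


def has_expert_axis(name):
    # Assumption: expert and router tensors carry the expert index on axis 0.
    return ".mlp.experts." in name or ".mlp.router." in name


def shrink_state_dict(state, keep_layers=6, keep_experts=8):
    """Keep the first `keep_layers` layers and the first `keep_experts` experts."""
    small = {}
    for name, tensor in state.items():
        match = LAYER_KEY.match(name)
        if match and int(match.group(1)) >= keep_layers:
            continue  # drop layers beyond the first keep_layers
        if has_expert_axis(name):
            tensor = tensor[:keep_experts]  # keep only the leading experts
        small[name] = tensor  # embeddings, LM head, attention are copied as-is
    return small


# Toy checkpoint: 3 layers would exist, 4 experts each; keep 2 layers, 2 experts.
toy = {
    "model.embed_tokens.weight": [[0.0]],
    "model.layers.0.self_attn.q_proj.weight": [[1.0]],
    "model.layers.0.mlp.router.weight": [[e] for e in range(4)],
    "model.layers.0.mlp.experts.down_proj": [[e] for e in range(4)],
    "model.layers.2.mlp.experts.down_proj": [[e] for e in range(4)],
    "lm_head.weight": [[0.0]],
}
small = shrink_state_dict(toy, keep_layers=2, keep_experts=2)
print(sorted(small))
```

Because the layer types alternate and an even number of layers is kept, taking the first 6 layers preserves the sliding/full attention pattern. The model config would also need its layer and expert counts updated to match the sliced weights.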

Validation

The model successfully passes validation tests:

Success: 1.0000132322311401 <= 10.0

==================================================
Generating sample text:
According to all known laws of aviation, there is no way a bee should be able to fly.
==================================================

Perplexity on test text: 1.00 (target: ≤10.0) ✓

The model demonstrates:

  • Proper weight initialization
  • Ability to learn during fine-tuning
  • Coherent text generation
  • Low perplexity on training data
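
For reference, the perplexity check reported above follows the usual definition: the exponential of the mean negative log-likelihood per token. The sketch below is a minimal illustration of that formula and of the `<= 10.0` threshold, not the actual validation script. The function names and the example log-probabilities are made up for illustration.

```python
import math


def perplexity(token_log_probs):
    """Perplexity = exp(mean negative log-likelihood), using natural logs."""
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


def passes(ppl, target=10.0):
    # Mirrors the "value <= target" form of the validation output.
    return ppl <= target


# A model that assigns probability 1 to every correct token has perplexity 1.
certain = perplexity([0.0, 0.0, 0.0])

# A model guessing uniformly among 4 options has perplexity of about 4.
uniform = perplexity([math.log(0.25)] * 5)

print(certain, passes(certain))
print(uniform, passes(uniform))
```

A perplexity very close to 1, as reported here, means the model assigns nearly all probability mass to each correct next token of the evaluated text.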

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

# Repository ID on the Hugging Face Hub
model_id = "inference-optimization/gpt-oss-2.5B-A1.3B"

# Load the model weights and the matching tokenizer
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Tokenize a prompt and generate up to 50 tokens in total (prompt included)
text = "According to all known laws"
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_length=50)
print(tokenizer.decode(outputs[0]))

Notes

  • This model uses the GPT-OSS architecture with sliding window attention and full attention layers
  • The model has been fine-tuned on a small copypasta dataset to ensure proper initialization and learning capability
  • Suitable for development, testing compression algorithms, and experimentation
  • Not intended for production use