Qwen3-1.6B-A0.9B

This is a tiny version of Qwen/Qwen3-30B-A3B created for testing and development.

Model Details

  • Base Model: Qwen/Qwen3-30B-A3B
  • Architecture: qwen3_moe (Mixture of Experts)
  • Total Parameters: 1.57B
  • Activated Parameters: ~0.9B (8 of 16 experts active per token, i.e. 50% MoE activation)

Configuration Changes

Only num_hidden_layers and num_local_experts were reduced from the original model; the remaining parameters are listed for reference and are unchanged:

Parameter               Original   Tiny
num_hidden_layers       48         10
num_local_experts       128        16
num_experts_per_tok     8          8
hidden_size             2048       2048
intermediate_size       6144       6144
moe_intermediate_size   768        768
num_attention_heads     32         32
num_key_value_heads     4          4
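
The parameter totals quoted under Model Details can be roughly reproduced from the values in this table. The following is a back-of-the-envelope sketch, not the author's calculation. It relies on several assumptions that are not stated in this card: a vocabulary of 151,936 tokens, a head dimension of 128, untied input and output embeddings, no bias terms, and every layer being an MoE layer (so intermediate_size is unused).

```python
# Rough parameter count for the tiny configuration.
# ASSUMPTIONS (not stated in this card): VOCAB_SIZE, HEAD_DIM, untied
# embeddings, no biases, and all layers using MoE feed-forward blocks.
VOCAB_SIZE = 151_936  # assumed
HEAD_DIM = 128        # assumed


def count_params(num_layers=10, num_experts=16, experts_per_tok=8,
                 hidden=2048, moe_inter=768, heads=32, kv_heads=4):
    """Return (total, active) parameter counts under the assumptions above."""
    attn = (hidden * heads * HEAD_DIM            # q_proj
            + 2 * hidden * kv_heads * HEAD_DIM   # k_proj and v_proj
            + heads * HEAD_DIM * hidden          # o_proj
            + 2 * HEAD_DIM)                      # q_norm and k_norm
    norms = 2 * hidden                           # two RMSNorm weights per layer
    router = hidden * num_experts                # MoE routing gate
    expert = 3 * hidden * moe_inter              # gate_proj, up_proj, down_proj
    embed = VOCAB_SIZE * hidden                  # one embedding-sized matrix

    # Everything except the input embedding and the expert weights:
    # per-layer attention/norms/router, plus lm_head and the final norm.
    shared = num_layers * (attn + norms + router) + embed + hidden

    total = shared + embed + num_layers * num_experts * expert
    # "Active" here excludes the input embedding table (a lookup, not a matmul).
    active = shared + num_layers * experts_per_tok * expert
    return total, active


total, active = count_params()
print(f"total:  {total / 1e9:.2f}B")
print(f"active: {active / 1e9:.2f}B")
```

Under these assumptions the total comes to about 1.57B, and the activated count comes to about 0.88B when the input embedding table is left out (about 1.19B when it is included). The card does not say which convention its ~0.9B figure uses.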

Checkpoint Structure

The checkpoint is stored as a single model.safetensors file with individual expert weights matching the original Qwen3 structure. Each layer has 16 experts with separate gate_proj, up_proj, and down_proj weights per expert.
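
As an illustration of the layout described above, the sketch below enumerates the per-expert weight names one would expect to find. The key pattern `model.layers.{i}.mlp.experts.{j}.{proj}.weight` is an assumption based on common Hugging Face naming for this architecture; it is not taken from this card.

```python
NUM_LAYERS = 10
NUM_EXPERTS = 16
PROJECTIONS = ("gate_proj", "up_proj", "down_proj")


def expected_expert_keys(num_layers=NUM_LAYERS, num_experts=NUM_EXPERTS):
    """List the per-expert weight names, one tensor per (layer, expert, projection).

    The key pattern is an ASSUMPTION, not confirmed by the model card.
    """
    return [
        f"model.layers.{layer}.mlp.experts.{expert}.{proj}.weight"
        for layer in range(num_layers)
        for expert in range(num_experts)
        for proj in PROJECTIONS
    ]


keys = expected_expert_keys()
print(len(keys))  # 10 layers * 16 experts * 3 projections
print(keys[0])
```

A list like this could then be compared against the tensor names actually present in model.safetensors, for example by opening the file with the safetensors library's safe_open and reading its keys.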

Validation

The model was trained from random initialization on a toy copypasta dataset and achieves:

  • Perplexity: ~1.0 (on validation text)
  • Generation: Produces coherent continuations of the prompt

Example generation:

Input: "According to all known laws"
Output: "According to all known laws of aviation, there is no way a bee should be able to fly."
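
For readers unfamiliar with the metric, the sketch below shows how a perplexity figure like the one above is commonly defined: the exponential of the mean negative log-likelihood per token. The card does not say how its number was computed, so this is the standard definition rather than the author's script, and the probabilities used are made up for illustration.

```python
import math


def perplexity(token_log_probs):
    """Perplexity = exp(mean negative log-likelihood), using natural logs."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Hypothetical per-token probabilities, for illustration only.
near_certain = [math.log(0.999)] * 20   # model almost always right
four_way_guess = [math.log(0.25)] * 20  # model guessing among 4 options

print(perplexity(near_certain))
print(perplexity(four_way_guess))
```

A perplexity close to 1.0 means the model assigns nearly all probability to the correct next token, which is what one would expect when a small model has effectively memorized a small dataset.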

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "inference-optimization/Qwen3-1.6B-A0.9B"
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Tokenize the prompt and move the tensors to the model's device
inputs = tokenizer("According to all known laws", return_tensors="pt").to(model.device)
# Passing the full encoding also supplies the attention mask
output = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output[0], skip_special_tokens=True))

Creation Process

This model was created using the llm-compressor create-tiny-model Claude skill:

  1. Configuration: Created with 10 layers and 16 experts (8 activated per token) to achieve ~1.6B total parameters with 50% MoE activation
  2. Initialization: Randomly initialized weights using transformers init_weights()
  3. Fine-tuning: Trained on famous internet copypastas until perplexity < 3.0
  4. Checkpoint Conversion: Converted from batched expert format to individual expert format to match original Qwen3 checkpoint structure
  5. Validation: Confirmed perplexity ~1.0 and successful text generation
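
Step 4 above can be sketched as follows. The card does not describe the batched layout, so this sketch assumes a hypothetical one in which each projection is stored as a single stack with the expert axis first. The function name, the key pattern, and the use of nested lists in place of real tensors are all illustrative assumptions.

```python
def split_batched_experts(batched, layer_prefix):
    """Split stacked expert weights into one entry per expert.

    batched: dict mapping a projection name (e.g. "gate_proj") to a sequence
             of per-expert weights, expert axis first (HYPOTHETICAL layout).
    layer_prefix: key prefix for the layer, e.g. "model.layers.0.mlp".
    """
    individual = {}
    for proj, stacked in batched.items():
        for expert_idx, weight in enumerate(stacked):
            key = f"{layer_prefix}.experts.{expert_idx}.{proj}.weight"
            individual[key] = weight
    return individual


# Toy example: 2 experts, each weight a 1x2 "matrix" written as nested lists.
toy_batched = {
    "gate_proj": [[[1, 2]], [[3, 4]]],
    "up_proj": [[[5, 6]], [[7, 8]]],
    "down_proj": [[[9, 10]], [[11, 12]]],
}
converted = split_batched_experts(toy_batched, "model.layers.0.mlp")
for name in sorted(converted):
    print(name, converted[name])
```

With real tensors the same loop would index along the first dimension of each stacked tensor; any fused projections in the batched format would need to be split apart first.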

Notes

  • This model keeps the original's MoE architecture, including Grouped Query Attention (GQA) with 32 attention heads sharing 4 key/value heads
  • The checkpoint format exactly matches the original Qwen3-30B-A3B structure for compatibility
  • This model is intended for testing and development only, not production use