Mini Qwen3 1M

A tiny Qwen3-compatible causal language model for testing and development. It keeps the Hugging Face Qwen3ForCausalLM architecture and the real Qwen3 tokenizer/chat template, but shrinks the model to about 1.2M parameters with randomly initialized weights.

This model is designed for fast Relax/Megatron/SGLang debugging without pulling large Qwen3 checkpoints into every smoke test. It is intentionally not useful for inference or downstream tasks.

Architecture

Parameter                 Value
Parameters                1,217,608
Model type                qwen3
Hidden size               8
Layers                    2
Intermediate size         32
Attention heads           1
KV heads                  1
Head dimension            8
Max position embeddings   4,096
Vocab size                151,936
Tensor dtype              bfloat16
Tokenizer source          Qwen/Qwen3-0.6B local mirror
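
The parameter count can be rederived from the other rows of the table. The sketch below assumes the usual Qwen3 layout: bias-free attention and MLP projections, per-head RMSNorm weights on queries and keys, two RMSNorm weights per layer plus a final one, and an LM head tied to the token embedding so it is counted once. These layout details are assumptions about the architecture, not something this card spells out.

```python
# Values from the architecture table above.
HIDDEN = 8
LAYERS = 2
INTERMEDIATE = 32
HEADS = 1
KV_HEADS = 1
HEAD_DIM = 8
VOCAB = 151_936


def count_parameters() -> int:
    # Token embedding; the tied LM head reuses it, so it is counted once.
    embed = VOCAB * HIDDEN
    # Attention projections, assumed bias-free.
    q_proj = HIDDEN * HEADS * HEAD_DIM
    k_proj = HIDDEN * KV_HEADS * HEAD_DIM
    v_proj = HIDDEN * KV_HEADS * HEAD_DIM
    o_proj = HEADS * HEAD_DIM * HIDDEN
    # One RMSNorm weight vector each for queries and keys (assumed per head_dim).
    qk_norm = 2 * HEAD_DIM
    # Gated MLP: gate, up, and down projections, assumed bias-free.
    mlp = 3 * HIDDEN * INTERMEDIATE
    # Two RMSNorm weight vectors per layer (before attention, before MLP).
    layer_norms = 2 * HIDDEN
    per_layer = q_proj + k_proj + v_proj + o_proj + qk_norm + mlp + layer_norms
    final_norm = HIDDEN
    return embed + LAYERS * per_layer + final_norm


print(f"{count_parameters():,}")  # prints 1,217,608
```

Under these assumptions the embedding contributes 1,215,488 of the 1,217,608 parameters, so the transformer layers themselves account for only 2,120 weights.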

How this model was created

The script scripts/tools/create_mock_qwen3.py in the Relax ROCm Megatron workspace performs these steps:

  1. Loads the tokenizer and config metadata from a local Qwen3-0.6B checkpoint.
  2. Shrinks the Qwen3 config dimensions to the table above.
  3. Initializes Qwen3ForCausalLM with random weights.
  4. Ties word embeddings and saves the model as safetensors.
  5. Writes mock_qwen3_info.json with the exact generation metadata.
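
The steps above can be approximated with the public transformers API. This is a minimal sketch, not the contents of create_mock_qwen3.py: the names MOCK_CONFIG_OVERRIDES and build_mock_qwen3 are hypothetical, it assumes a transformers release that ships Qwen3ForCausalLM, and it skips the mock_qwen3_info.json step because that file's schema is not described here.

```python
# Config overrides taken from the architecture table. Every other field is
# inherited from the source checkpoint's config (assumption: this matches
# what the real script keeps).
MOCK_CONFIG_OVERRIDES = {
    "hidden_size": 8,
    "num_hidden_layers": 2,
    "intermediate_size": 32,
    "num_attention_heads": 1,
    "num_key_value_heads": 1,
    "head_dim": 8,
    "max_position_embeddings": 4096,
    "tie_word_embeddings": True,
}


def build_mock_qwen3(tokenizer_source: str, output_dir: str) -> int:
    """Create a tiny random Qwen3 checkpoint and return its parameter count."""
    # Heavy imports are kept local so the overrides can be inspected without
    # torch or transformers installed.
    import torch
    from transformers import AutoConfig, AutoTokenizer, Qwen3ForCausalLM

    # 1. Load tokenizer and config metadata from the local Qwen3-0.6B copy.
    tokenizer = AutoTokenizer.from_pretrained(tokenizer_source)
    # 2. Shrink the config; keyword arguments override the loaded fields.
    config = AutoConfig.from_pretrained(tokenizer_source, **MOCK_CONFIG_OVERRIDES)
    # 3. Build the model from the config alone, so weights are random.
    model = Qwen3ForCausalLM(config).to(torch.bfloat16)
    # 4. Save as safetensors next to the tokenizer and chat template.
    model.save_pretrained(output_dir, safe_serialization=True)
    tokenizer.save_pretrained(output_dir)
    # parameters() yields the tied embedding once, so this counts it once.
    return sum(p.numel() for p in model.parameters())
```

Constructing the model from a config rather than with from_pretrained is what keeps the large Qwen3-0.6B weights out of the process: only tokenizer files and config metadata are read from the source directory.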

The model weights are random. Only tokenizer/chat-template metadata is copied from Qwen3-0.6B.

Reproduction

From the Relax ROCm Megatron repository:

source /vast/users/qirong.ho/miniforge3/etc/profile.d/conda.sh
conda activate relaxrl_rocm
python scripts/tools/create_mock_qwen3.py \
  --tokenizer-source /vast/users/qirong.ho/erland/Python_project/relax_e2e_assets/Qwen3-0.6B \
  --output-dir /vast/users/qirong.ho/erland/Python_project/relax_e2e_assets/Qwen3-Mock-1M
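
After the command finishes, the output directory can be compared with the architecture table. The helper below is a hypothetical sketch (config_mismatches is not part of the Relax repository). It assumes save_pretrained wrote a standard config.json with the usual Hugging Face key names, and it leaves out the dtype field because its key name differs between transformers versions.

```python
import json
from pathlib import Path

# Expected values, copied from the architecture table.
EXPECTED = {
    "model_type": "qwen3",
    "hidden_size": 8,
    "num_hidden_layers": 2,
    "intermediate_size": 32,
    "num_attention_heads": 1,
    "num_key_value_heads": 1,
    "head_dim": 8,
    "max_position_embeddings": 4096,
    "vocab_size": 151936,
}


def config_mismatches(model_dir: str) -> dict:
    """Return {key: (expected, actual)} for every field that differs."""
    config = json.loads((Path(model_dir) / "config.json").read_text())
    return {
        key: (expected, config.get(key))
        for key, expected in EXPECTED.items()
        if config.get(key) != expected
    }
```

An empty dictionary means the regenerated checkpoint has the dimensions listed above; any other result names the fields to investigate.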

Relax e2e validation

This checkpoint was validated with the Relax AMD ROCm e2e launcher:

NUM_ROLLOUT=2 SAVE_INTERVAL=1 CKPT_FORMAT=torch_dist NO_SAVE_OPTIM=0 \
WANDB_GROUP="qwen3-mock-1m-tmux-20260531_095214" \
./amd_qwen3_mock_2gpu_e2e.sh

Validation evidence:

  • Ray job: raysubmit_sGx5uTXcKu41nHzL
  • W&B run: me4ticfh
  • log line: Actor training completed step 0/2
  • log line: Actor training completed step 1/2
  • saved torch_dist checkpoints at iterations 0 and 1
  • checkpoint metadata contains optimizer state keys, including optimizer.state.exp_avg and optimizer.state.exp_avg_sq
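
The optimizer-state check in the last bullet can be repeated on any saved iteration. The sketch below uses hypothetical function names and assumes the torch_dist checkpoint directory can be read with PyTorch's distributed-checkpoint FileSystemReader. The two key prefixes are the ones quoted above; the exact layout of the full key names is not assumed.

```python
from typing import Iterable, List

# Key prefixes quoted in the validation evidence.
ADAM_PREFIXES = ("optimizer.state.exp_avg_sq", "optimizer.state.exp_avg")


def read_checkpoint_keys(checkpoint_dir: str) -> List[str]:
    """List the state-dict keys recorded in a checkpoint's metadata."""
    # Local import keeps the pure-Python helper below usable without torch.
    from torch.distributed.checkpoint import FileSystemReader

    metadata = FileSystemReader(checkpoint_dir).read_metadata()
    return list(metadata.state_dict_metadata.keys())


def has_adam_moments(keys: Iterable[str]) -> bool:
    """True if both first- and second-moment optimizer keys are present."""
    keys = list(keys)
    has_second = any(k.startswith("optimizer.state.exp_avg_sq") for k in keys)
    # exp_avg is a prefix of exp_avg_sq, so exclude the second-moment keys.
    has_first = any(
        k.startswith("optimizer.state.exp_avg")
        and not k.startswith("optimizer.state.exp_avg_sq")
        for k in keys
    )
    return has_first and has_second
```

Calling has_adam_moments(read_checkpoint_keys(path)) on an iteration directory should return True for a run saved with NO_SAVE_OPTIM=0, assuming the metadata layout matches the assumption above.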

The e2e validation exercised:

  • Hugging Face model load
  • SGLang transformers rollout
  • Megatron Qwen3Bridge import
  • distributed weight update
  • optimizer step
  • W&B application metrics
  • optimizer-inclusive torch_dist checkpoint save

Intended use

  • Fast Relax/Megatron/SGLang startup and integration tests
  • ROCm smoke tests where Qwen3 code paths matter more than model quality
  • Checkpointing and resume infrastructure checks
  • Debugging model-provider, tokenizer, rollout, and weight-sync wiring

Not intended for

  • Inference quality evaluation
  • Benchmarking Qwen3 capability
  • Any downstream task
  • Reward/loss quality analysis

Because the model is random and extremely small, generated text is expected to be nonsense. During the validation run, rewards were invalid/negative and advantages collapsed to zero; this is expected for this smoke checkpoint.
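
One common way for advantages to collapse is a group-relative baseline: if every sample in a group receives the same reward, subtracting the group mean leaves zero everywhere. This card does not say which advantage estimator the validation run used, so the sketch below illustrates that assumption only, with a hypothetical group_advantages helper and made-up reward values.

```python
from statistics import fmean
from typing import List


def group_advantages(rewards: List[float]) -> List[float]:
    """Mean-baselined advantages for one group of sampled responses."""
    baseline = fmean(rewards)
    return [reward - baseline for reward in rewards]


# Illustrative values: every nonsense response gets the same penalty.
print(group_advantages([-1.0, -1.0, -1.0, -1.0]))  # prints [0.0, 0.0, 0.0, 0.0]
```

With zero advantages the policy-gradient signal vanishes, which is harmless here because the run only needs to exercise the training and checkpointing code paths.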
