Qwen3 mHC

This checkpoint is a Manifold-Constrained Hyper-Connections (mHC) V2 variant of Qwen/Qwen3-0.6B, trained for 100k steps in a parity-mixed setup. It is intended for research on residual stream mixing and hyper-connection behavior.

Model Description

  • Base model: Qwen/Qwen3-0.6B
  • Architecture: Qwen3 with mHC V2 hyper-connections (residual stream mixing)
  • Checkpoint: 100,000 steps
  • Language(s): Multilingual (see Training Data below)
  • License: Apache-2.0 (inherits base model license)
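For readers new to hyper-connections, the following is a minimal NumPy sketch of one residual update over several parallel streams. It assumes the generic form used in the hyper-connections literature: the sublayer reads a weighted combination of the n streams, its output is written back to every stream with per-stream weights, and the streams themselves are mixed by an n x n matrix h_res that is assumed to be doubly stochastic. The function name, the static (input-independent) weights, and the sigmoid gating are illustrative assumptions, not this checkpoint's actual parameterization.

```python
import numpy as np
from typing import Callable


def hyper_connection_step(
    streams: np.ndarray,                           # (n, d): n parallel residual streams of width d
    layer_fn: Callable[[np.ndarray], np.ndarray],  # sublayer such as attention or MLP, (d,) -> (d,)
    pre_logits: np.ndarray,                        # (n,) hypothetical read-weight scores
    post_logits: np.ndarray,                       # (n,) hypothetical write-weight scores
    h_res: np.ndarray,                             # (n, n) stream-mixing matrix, assumed doubly stochastic
) -> np.ndarray:
    """One hypothetical hyper-connection update; returns the new (n, d) streams."""
    h_pre = 1.0 / (1.0 + np.exp(-pre_logits))    # sigmoid -> nonnegative read weights
    h_post = 1.0 / (1.0 + np.exp(-post_logits))  # sigmoid -> nonnegative write weights
    layer_in = h_pre @ streams                   # (d,): collapse the streams into one sublayer input
    layer_out = layer_fn(layer_in)               # (d,): run the sublayer once
    # Mix the residual streams with h_res, then add the weighted sublayer output to each stream.
    return h_res @ streams + np.outer(h_post, layer_out)


if __name__ == "__main__":
    n, d = 4, 8
    rng = np.random.default_rng(0)
    x = rng.normal(size=(n, d))
    perm = np.eye(n)[[1, 2, 3, 0]]        # a permutation matrix
    h_res = 0.5 * np.eye(n) + 0.5 * perm  # every row and column sums to 1
    y = hyper_connection_step(x, np.tanh, np.zeros(n), np.zeros(n), h_res)
    print(y.shape)  # (4, 8)
```

Because the columns of a doubly stochastic h_res sum to 1, the mixing term preserves the sum over streams (1ᵀ h_res X = 1ᵀ X), so stream mixing by itself neither amplifies nor shrinks the aggregate residual signal.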

Intended Use

  • Research on mHC V2 hyper-connections and residual stream mixing
  • Fine-tuning or continued training experiments
  • Analysis of stream specialization behavior

Out-of-Scope Use

  • Safety-critical or medical decision-making systems
  • High-stakes automated decision-making without human oversight

Training Data

This checkpoint was trained on multilingual, pretokenized datasets, primarily Sangraha shards. The data was pre-packed into either train/validation splits or shard layouts. The exact dataset composition and filtering are not fully documented here.
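The card does not say how the data was packed. As a generic illustration only, a common approach is to concatenate tokenized documents (separated by an end-of-sequence token) and cut the stream into fixed-length blocks. The function pack_sequences and its block_size and eos_id parameters are hypothetical and are not taken from this checkpoint's data pipeline.

```python
from typing import Iterable, List


def pack_sequences(docs: Iterable[List[int]], block_size: int, eos_id: int) -> List[List[int]]:
    """Hypothetical packer: concatenate token-id documents, each followed by eos_id,
    and split the result into blocks of exactly block_size tokens.
    Any trailing remainder shorter than block_size is dropped."""
    buffer: List[int] = []
    blocks: List[List[int]] = []
    for doc in docs:
        buffer.extend(doc)
        buffer.append(eos_id)  # mark the document boundary
        while len(buffer) >= block_size:
            blocks.append(buffer[:block_size])
            buffer = buffer[block_size:]
    return blocks


if __name__ == "__main__":
    print(pack_sequences([[1, 2, 3], [4, 5]], block_size=3, eos_id=0))
    # [[1, 2, 3], [0, 4, 5]]
```

Packing avoids padding tokens, at the cost of letting attention cross document boundaries unless the training code masks them.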

Training Procedure

  • Converted from a Qwen3 base checkpoint into an mHC V2 model.
  • Trained for 100k steps in a parity-mixed run.
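  • Uses a Sinkhorn-based projection that constrains the residual mixing matrices toward doubly stochastic form (rows and columns each summing to 1) for training stability.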
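The card does not describe the projection in detail. As an illustration, the textbook Sinkhorn-Knopp iteration alternately rescales the rows and columns of a positive matrix until both sum to 1. The sketch below assumes the mixing matrix is produced from unconstrained scores via exp and a fixed iteration count; sinkhorn_project, n_iters=20, and the exp parameterization are assumptions, not taken from this checkpoint's code.

```python
import numpy as np


def sinkhorn_project(logits: np.ndarray, n_iters: int = 20) -> np.ndarray:
    """Sinkhorn-Knopp normalization: map a square matrix of unconstrained
    scores to an (approximately) doubly stochastic matrix."""
    # Exponentiate to get strictly positive entries; subtracting the max avoids overflow.
    m = np.exp(logits - logits.max())
    for _ in range(n_iters):
        m = m / m.sum(axis=1, keepdims=True)  # rescale so each row sums to 1
        m = m / m.sum(axis=0, keepdims=True)  # rescale so each column sums to 1
    return m


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    h_res = sinkhorn_project(rng.normal(size=(4, 4)))
    print(h_res.sum(axis=1))  # row sums, close to 1
    print(h_res.sum(axis=0))  # column sums, 1 up to float rounding
```

Doubly stochastic matrices have spectral norm 1, and a product of doubly stochastic matrices is again doubly stochastic, which is one reason this constraint is attractive when the mixing matrices of many layers are composed.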

Evaluation

No formal benchmarks are bundled with this checkpoint. If you evaluate this model, please report the setup, prompts, decoding parameters, and comparison baselines.

Checkpoint Files

  • Format: Safetensors
  • Model size: 0.6B params
  • Tensor type: BF16

Model Tree

  • This model: theCoderWithHat/mhc-qwen3
  • Fine-tuned from: Qwen/Qwen3-0.6B