TinyMoE-100M-2x8

TinyMoE-100M-2x8 is a compact Sparse Mixture of Experts (MoE) language model built on the Mixtral/Mistral architecture. It is intended for research, edge applications, and resource-constrained environments: each token is routed to 2 of 8 experts per layer, so the model holds roughly 100M parameters in total while activating only about 22.5M of them per token during inference.

Model Details

  • Architecture: Sparse Mixture of Experts (MoE)
  • Total Parameters: 99,809,280 (~100M)
  • Active Parameters per Token: 22,544,640 (~22.5M)
  • Expert Configuration: 8 local experts per layer, 2 experts routed per token ("num_experts_per_tok": 2)
  • Context Length: 1024 tokens
  • Base Architecture: Mixtral / Mistral For Causal LM
  • License: MIT
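
The expert configuration above (8 experts, 2 routed per token) can be illustrated with a minimal top-2 routing sketch in plain Python. The function name `route_top_k` and the router logits are hypothetical, and normalising the softmax over only the selected experts is an assumption about Mixtral-style routing in general, not something this card states about the checkpoint.

```python
import math

def route_top_k(router_logits, k=2):
    """Pick the k highest-scoring experts and return (expert_index, weight) pairs."""
    # Rank expert indices by router logit, highest first.
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    # Softmax over the chosen logits only, so the k mixing weights sum to 1.
    peak = max(router_logits[i] for i in chosen)
    exps = [math.exp(router_logits[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]

# Hypothetical router logits for one token over 8 experts.
logits = [0.1, 2.0, -1.0, 0.5, 1.0, 0.0, -0.3, 1.2]
routes = route_top_k(logits, k=2)
for expert_index, weight in routes:
    print(f"expert {expert_index}: weight {weight:.3f}")
```

Only the two selected experts would run their feed-forward block for this token; the other six are skipped, which is where the gap between total and active parameters comes from.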

Parameter Breakdown

Unlike a standard dense model, an MoE model stores more parameters than it uses for any single token: every expert is kept on disk, but only a subset is activated for a given token during a forward pass:

| Component | Total Parameters | Status During Inference |
|---|---|---|
| Embeddings (Input + LM Head) | 24,576,000 | Always Active |
| Attention Blocks (10 Layers) | 4,423,680 | Always Active |
| MoE Routers (10 Layers) | 30,720 | Always Active |
| Experts (8 per Layer, 10 Layers) | 70,778,880 | 2 of 8 Active per Layer (~17.7M active) |
| Overall Footprint | 99,809,280 | 22,544,640 Active per Token |
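
As a quick arithmetic check, the component counts in the breakdown can be summed and the active expert share computed directly. This is only a sketch over the figures listed above; the dictionary keys are illustrative names, and it does not try to re-derive the 22,544,640 active-per-token figure, because the card does not say how the embedding parameters are counted toward it.

```python
# Component counts copied from the parameter breakdown.
components = {
    "embeddings": 24_576_000,
    "attention": 4_423_680,
    "routers": 30_720,
    "experts": 70_778_880,
}
num_experts = 8          # experts per layer
experts_per_token = 2    # experts routed per token

# Total footprint is the sum of all components.
total = sum(components.values())

# Only 2 of every 8 expert blocks run for a given token.
active_expert_params = components["experts"] * experts_per_token // num_experts

print(f"total parameters:     {total:,}")                 # 99,809,280
print(f"active expert params: {active_expert_params:,}")  # 17,694,720
```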

Training Data

This model was trained on a mixture of two datasets, chosen to balance narrative fluency with exposure to factual, more formally structured text:

  • TinyStories: For coherent, creative synthetic narrative generation.
  • WikiText-103: For general-knowledge text, vocabulary diversity, and more formal sentence structure.

Quick Start

You can load and experiment with this model using the Hugging Face transformers library:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Download the tokenizer and weights from the Hugging Face Hub.
model_id = "FlameF0X/TinyMoE-100M-2x8"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Tokenize the prompt into PyTorch tensors.
input_text = "Once upon a time,"
inputs = tokenizer(input_text, return_tensors="pt")

# Generate up to 50 new tokens and decode them back to text.
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
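
Because the stated context length is 1024 tokens, the prompt and the generated tokens together should fit in that window. The following is a minimal sketch of trimming an over-long prompt before generation; `trim_prompt` is a hypothetical helper, not part of transformers, and keeping the most recent tokens is just one possible policy.

```python
CONTEXT_LENGTH = 1024  # context length stated in Model Details

def trim_prompt(token_ids, max_new_tokens, context_length=CONTEXT_LENGTH):
    """Keep the most recent prompt tokens so prompt + generation fits the window."""
    budget = context_length - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens leaves no room for a prompt")
    # Keep the tail of the prompt; the earliest tokens are dropped first.
    return token_ids[-budget:]

# Hypothetical over-long prompt of 2000 token ids.
long_prompt = list(range(2000))
trimmed = trim_prompt(long_prompt, max_new_tokens=50)
print(len(trimmed))  # 974
```

With the tokenizer from the Quick Start, the list passed in could be `tokenizer(input_text)["input_ids"]`. Note that trimming from the front also drops any leading special token the tokenizer added.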

The weights are published in Safetensors format as F32 tensors (99.8M parameters).