LSMoE: Layered-Shared Mixture of Experts (1B)
Custom causal language model with a shared transformer core and 5 specialized experts. Exactly one expert is active per document, selected by keyword-based routing.
- Training step: 12,500
- Tokens seen: 4,571,217,920
Architecture
| Component | Value |
|---|---|
| Total params | ~844M |
| Active params per token | ~391M |
| Hidden size | 1536 |
| Attention | GQA (12 query / 4 KV heads) |
| Context length | 2048 |
| RoPE base | 10000 |
| Activation | SwiGLU |
| Normalization | RMSNorm |
| Tied embeddings | Yes |
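With 12 query heads and 4 KV heads, each KV head is shared by 3 query heads, which shrinks the KV cache by the same factor. The sketch below works through that arithmetic. It is a minimal sketch and makes several assumptions the card does not state: head_dim = 1536 / 12, the common grouping convention (query head h reads KV head h // 3), an fp16 cache, and attention only in the 8 shared blocks, because the expert layers are SwiGLU-only.

```python
# Back-of-envelope GQA numbers for the configuration in the table.
# ASSUMPTIONS (not stated in the card): head_dim = hidden_size / query heads,
# query head h reads KV head h // group_size, the cache is fp16, and only the
# 8 shared attention blocks (4 Bottom + 4 Top) keep K/V, since the expert
# layers are SwiGLU-only.
HIDDEN_SIZE = 1536
N_QUERY_HEADS = 12
N_KV_HEADS = 4
CONTEXT_LENGTH = 2048
N_ATTN_LAYERS = 4 + 4
BYTES_PER_VALUE = 2  # fp16

HEAD_DIM = HIDDEN_SIZE // N_QUERY_HEADS   # 128
GROUP_SIZE = N_QUERY_HEADS // N_KV_HEADS  # query heads sharing each KV head


def kv_head_for(query_head: int) -> int:
    """Index of the KV head that a given query head attends with."""
    return query_head // GROUP_SIZE


def kv_cache_bytes(seq_len: int, batch: int = 1, n_kv_heads: int = N_KV_HEADS) -> int:
    """Bytes for K and V across all attention layers, KV heads and positions."""
    per_token = 2 * N_ATTN_LAYERS * n_kv_heads * HEAD_DIM * BYTES_PER_VALUE
    return per_token * seq_len * batch


print("head_dim:", HEAD_DIM)
print("query head -> KV head:", [kv_head_for(h) for h in range(N_QUERY_HEADS)])
print("GQA KV cache @ 2048 tokens: %.0f MiB" % (kv_cache_bytes(CONTEXT_LENGTH) / 2**20))
print("MHA KV cache @ 2048 tokens: %.0f MiB"
      % (kv_cache_bytes(CONTEXT_LENGTH, n_kv_heads=N_QUERY_HEADS) / 2**20))
```

Under these assumptions, a full 2048-token context needs about 32 MiB of KV cache per sequence. That is one third of the 96 MiB that 12 KV heads (plain multi-head attention) would need.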
Layer stack
embed → 4× shared Bottom blocks (attn + SwiGLU)
→ 6× Expert SwiGLU layers (one of {Web, Science, Social, Books, Code})
→ 4× shared Top blocks (attn + SwiGLU)
→ RMSNorm → lm_head (tied)
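The total and active parameter counts in the Architecture table can be roughly reproduced from this layer stack. The back-of-envelope sketch below assumes a SwiGLU intermediate size of 4096 in both the shared blocks and the expert layers (the card does not give this value), bias-free projections, and one RMSNorm weight vector per sublayer. The constant names are illustrative only.

```python
# Parameter-count sketch for the LSMoE layer stack.
# ASSUMPTIONS (not stated in the card): SwiGLU intermediate size 4096 in both the
# shared blocks and the expert layers, no bias terms, one RMSNorm weight vector
# per sublayer (pre-norm), and head_dim = 1536 / 12 = 128.
VOCAB = 50257
D_MODEL = 1536
N_Q_HEADS, N_KV_HEADS = 12, 4
HEAD_DIM = D_MODEL // N_Q_HEADS
FFN_DIM = 4096  # hypothetical
N_BOTTOM, N_EXPERT_LAYERS, N_TOP = 4, 6, 4
N_EXPERTS = 5


def attention_params() -> int:
    q = D_MODEL * N_Q_HEADS * HEAD_DIM
    k_and_v = 2 * D_MODEL * N_KV_HEADS * HEAD_DIM  # GQA: only 4 KV heads
    o = N_Q_HEADS * HEAD_DIM * D_MODEL
    return q + k_and_v + o


def swiglu_params() -> int:
    return 3 * D_MODEL * FFN_DIM  # gate, up and down projections


def shared_block_params() -> int:
    return attention_params() + swiglu_params() + 2 * D_MODEL  # + 2 RMSNorms


def expert_bank_params() -> int:
    return N_EXPERT_LAYERS * (swiglu_params() + D_MODEL)  # + 1 RMSNorm per layer


embedding = VOCAB * D_MODEL  # counted once: lm_head is tied to it
core = embedding + (N_BOTTOM + N_TOP) * shared_block_params() + D_MODEL  # + final norm
expert = expert_bank_params()
total = core + N_EXPERTS * expert
active = core + expert  # routing is per document, so one expert runs per token

print(f"core   {core / 1e6:.1f}M")
print(f"expert {expert / 1e6:.1f}M  (~{expert * 2 / 1e6:.0f}MB in fp16)")
print(f"total  {total / 1e6:.1f}M")
print(f"active {active / 1e6:.1f}M")
```

With these assumptions, the sketch gives about 844.8M total and 391.8M active parameters. Each expert comes to roughly 113M parameters (about 226MB in fp16), which is consistent with the per-expert file size listed under Files.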
Files
| File | Description |
|---|---|
| core.pth | Shared bottom + top + embedding weights (~400MB fp16) |
| Web.pth, Science.pth, Social.pth, Books.pth, Code.pth | Expert weights (~225MB each fp16) |
| state.pth | Training step + tokens seen |
| tokenizer/ | GPT-2 BPE tokenizer (50257 vocab) |
| config.json | Architecture hyperparameters |
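The shared core and each expert are stored in separate files, so running a single domain only requires `core.pth` plus that domain's expert file. By the sizes in the table, that is about 625MB in fp16, compared with about 1.5GB for all five experts. The sketch below uses `huggingface_hub.hf_hub_download` to fetch those files. The helper names are hypothetical, and the assumption that `config.json` is needed to construct the model is also ours.

```python
# Which checkpoint files one domain needs, and a lazy download helper.
# The helper names are hypothetical; hf_hub_download is the standard
# huggingface_hub function for fetching a single file from a Hub repo.
REPO_ID = "Asilarknes/lsmoe-1b-v1"
EXPERTS = ("Web", "Science", "Social", "Books", "Code")


def files_for_expert(expert: str) -> list[str]:
    """config.json + shared core + exactly one expert file."""
    if expert not in EXPERTS:
        raise ValueError(f"unknown expert {expert!r}; expected one of {EXPERTS}")
    return ["config.json", "core.pth", f"{expert}.pth"]


def download_for_expert(expert: str) -> dict[str, str]:
    """Download the files for one expert and return {filename: local_path}."""
    from huggingface_hub import hf_hub_download  # lazy import: optional dependency

    return {
        name: hf_hub_download(repo_id=REPO_ID, filename=name)
        for name in files_for_expert(expert)
    }


print(files_for_expert("Science"))
```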
Training
- Datasets: FineWeb-Edu (50%), OpenWebText (20%), Wikipedia (18%), CodeParrot-Clean (12%)
- Optimizer: AdamW (core) + 8-bit Paged AdamW (experts)
- Mixed precision: fp16 + GradScaler
- Distributed: 4× V100 32GB, DDP with NCCL backend
- Effective batch: 384 samples × 2048 tokens = 786,432 (~786K) tokens/step
- LR: 1e-4 with cosine decay, warmup 1000 steps
- Regularization: dropout 0.05, z-loss 1e-4, label smoothing 0.02
- EMA decay: 0.9995 (CPU-resident shadow weights)
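The schedule and regularizers listed above can be written out explicitly. The pure-Python sketch below is not the training script, and it rests on several assumptions. Warmup is taken to be linear. The cosine floor (0) and total schedule length (50,000 steps) are hypothetical, because the card gives neither. Label smoothing follows the PyTorch convention of mixing the one-hot target with a uniform distribution. The z-loss uses the common form `z_coef * logsumexp(logits)**2`. All function names are illustrative.

```python
import math

SAMPLES_PER_STEP, SEQ_LEN = 384, 2048
TOKENS_PER_STEP = SAMPLES_PER_STEP * SEQ_LEN  # 786,432

PEAK_LR = 1e-4
WARMUP_STEPS = 1000
TOTAL_STEPS = 50_000  # HYPOTHETICAL: schedule length is not stated in the card
MIN_LR = 0.0          # HYPOTHETICAL: cosine floor is not stated in the card


def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to MIN_LR at TOTAL_STEPS."""
    if step < WARMUP_STEPS:
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    progress = min(1.0, (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS))
    return MIN_LR + 0.5 * (PEAK_LR - MIN_LR) * (1.0 + math.cos(math.pi * progress))


def ema_update(shadow: list[float], params: list[float], decay: float = 0.9995) -> list[float]:
    """One EMA step: shadow <- decay * shadow + (1 - decay) * param."""
    return [decay * s + (1.0 - decay) * p for s, p in zip(shadow, params)]


def token_loss(logits: list[float], target: int,
               smoothing: float = 0.02, z_coef: float = 1e-4) -> float:
    """Label-smoothed cross-entropy plus z-loss for a single token position."""
    m = max(logits)  # subtract the max for a numerically stable logsumexp
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    log_probs = [x - log_z for x in logits]
    nll = -log_probs[target]
    uniform_ce = -sum(log_probs) / len(logits)  # cross-entropy against uniform
    ce = (1.0 - smoothing) * nll + smoothing * uniform_ce
    return ce + z_coef * log_z ** 2  # z-loss keeps log Z close to 0


print(TOKENS_PER_STEP, lr_at(0), lr_at(WARMUP_STEPS), lr_at(TOTAL_STEPS))
```

The z-loss term penalizes drift in the softmax normalizer, which is one reason it is often paired with fp16 training. The coefficient shown matches the card, but where the original script applies it is not documented.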
Loading (custom code required)
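This model uses a custom architecture that is not compatible with the Hugging Face `AutoModel` classes. Loading it and running inference requires the `CoreModel` and `ExpertBank` classes from the original training script. Example:

```python
import torch
from transformers import AutoTokenizer

# CoreModel and ExpertBank are not part of transformers: define them
# (or import them) from the original training script first.
tok = AutoTokenizer.from_pretrained("Asilarknes/lsmoe-1b-v1", subfolder="tokenizer")

core = CoreModel(...)  # shared embedding + Bottom/Top blocks; hyperparameters in config.json
core.load_state_dict(torch.load("core.pth", map_location="cuda"))  # file downloaded from this repo
core.cuda().eval()

expert = ExpertBank(...)  # one expert; load the file matching the routed domain
expert.load_state_dict(torch.load("Web.pth", map_location="cuda"))
expert.cuda().eval()

ids = tok.encode("Hello, world", return_tensors="pt").cuda()
with torch.no_grad():
    logits = core.forward_full(ids, expert)
```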
Routing (keyword-based, no learned gate)
| Expert | Triggers |
|---|---|
| Code | def , class , import , algorithm, github, ... |
| Science | quantum, physics, biology, theorem, ... |
| Social | society, government, policy, community, ... |
| Books | chapter, novel, she said, he said, ... |
| Web (fallback) | Everything else |
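The sketch below shows how keyword routing could work. It is a minimal version that assumes case-insensitive substring matching, with the expert that has the most trigger hits winning and table order breaking ties. The card does not say how multiple matches are resolved, and each trigger list is truncated ("..."), so this will not reproduce the actual routing exactly. `TRIGGERS` and `route` are illustrative names.

```python
# Keyword router sketch. Trigger lists are copied from the routing table and are
# INCOMPLETE (the card elides the rest with "..."). The scoring rule (most hits
# wins, table order breaks ties) is an ASSUMPTION, not the documented behavior.
TRIGGERS = {
    "Code": ["def ", "class ", "import ", "algorithm", "github"],
    "Science": ["quantum", "physics", "biology", "theorem"],
    "Social": ["society", "government", "policy", "community"],
    "Books": ["chapter", "novel", "she said", "he said"],
}
FALLBACK = "Web"


def route(document: str) -> str:
    """Return the single expert name used for the whole document."""
    text = document.lower()
    scores = {
        expert: sum(text.count(keyword) for keyword in keywords)
        for expert, keywords in TRIGGERS.items()
    }
    best = max(scores, key=scores.get)  # first maximum in dict (table) order
    return best if scores[best] > 0 else FALLBACK


for doc in ["import os\ndef main():\n    pass",
            "Quantum effects in biology",
            "Weather forecast for Tuesday"]:
    print(route(doc), "<-", doc.splitlines()[0])
```

Because routing is per document, the returned name would be used once to choose which `<Expert>.pth` file to load, and that expert would then serve every token in the sequence.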
Status
Work-in-progress checkpoint, not production-ready. Quality is expected to improve with continued training.