Lumia 62M
A 62.8M-parameter reasoning language model, fine-tuned from Supra-50M-Reasoning on 35,944 curated reasoning samples.
Small enough to run on a phone. Smart enough to reason.
Model Details
| Attribute | Value |
|---|---|
| Architecture | LlamaForCausalLM |
| Parameters | 62.8M |
| Hidden size | 448 |
| Layers | 14 |
| Attention heads | 8 query heads, 8 KV heads (group size 1, i.e. standard multi-head attention) |
| Head dim | 56 |
| Context length | 4096 (YaRN extended, factor 4.0) |
| Vocab size | 32,000 base (added special tokens use IDs up to 32027) |
| Precision | bfloat16 (~125 MB) |
| License | Apache 2.0 |
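A few lines of arithmetic tie the table's numbers together. This is a sketch under two stated assumptions: the YaRN factor divides the extended context to give the pre-extension window, and "MB" means 10^6 bytes. It also shows that 8 query heads over 8 KV heads gives a group size of 1, which is ordinary multi-head attention.

```python
# Cross-checks on the Model Details table; all inputs are copied from it.
hidden_size = 448
num_attention_heads = 8
num_key_value_heads = 8
num_params = 62.8e6
bytes_per_param = 2          # bfloat16
context_length = 4096
yarn_factor = 4.0

head_dim = hidden_size // num_attention_heads              # width of each attention head
group_size = num_attention_heads // num_key_value_heads    # query heads per KV head (1 = plain MHA)
weights_mb = num_params * bytes_per_param / 1e6            # checkpoint size in 10^6-byte MB
# Assumption: the YaRN factor scales the pre-extension window up to context_length.
original_context = int(context_length / yarn_factor)

print(f"head_dim={head_dim} group_size={group_size} "
      f"weights~{weights_mb:.1f} MB original_context={original_context}")
```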
Training Configuration
| Hyperparameter | Value |
|---|---|
| Framework | TRL SFTTrainer + PEFT LoRA |
| LoRA rank | r=32, α=64 (all linear layers) |
| Precision | fp16, torch.compile enabled |
| Batch | 4 per GPU, gradient accumulation 1 |
| Effective batch | 8 (2× T4 DDP) |
| Learning rate | 2e-4 cosine, 5% warmup |
| Max seq length | 4096 |
| Epochs | 4 planned, 0.29 completed |
| Hardware | 2× Tesla T4 (16 GB each) |
| Training time | ~55 min |
| Framework versions | TRL 1.7.0, PyTorch 2.x |
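In PEFT's LoRA, the low-rank update is scaled by α/r before it is added to the frozen weight, so r=32, α=64 gives a factor of 2. The plain-Python toy below (not PEFT itself; tiny 2×2 shapes and rank-1 factors chosen for readability) shows that merge, which is presumably how inference-ready weights such as model.safetensors are produced from base weights plus adapter, and why merging the same adapter twice doubles its effect.

```python
# Toy illustration of a LoRA merge: W' = W + (alpha / r) * B @ A.
# Shapes are tiny (2x2 weight, rank-1 factors) for readability; only the
# alpha / r scaling comes from the table above (r=32, alpha=64).

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def merge_lora(w, lora_b, lora_a, alpha, r):
    """Add the scaled low-rank update B @ A to the frozen weight W."""
    scaling = alpha / r
    delta = matmul(lora_b, lora_a)
    return [[w[i][j] + scaling * delta[i][j] for j in range(len(w[0]))]
            for i in range(len(w))]

W = [[1.0, 0.0], [0.0, 1.0]]   # frozen base weight
B = [[0.5], [0.0]]             # (out_features x rank)
A = [[1.0, 2.0]]               # (rank x in_features)

scaling = 64 / 32
merged = merge_lora(W, B, A, alpha=64, r=32)
# Merging the same adapter a second time adds the update again.
merged_twice = merge_lora(merged, B, A, alpha=64, r=32)
print(scaling, merged, merged_twice)
```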
Training Results
| Metric | Value |
|---|---|
| Best eval loss | 7.8651 (step 1100) |
| Final train loss | 7.7178 |
| Total steps | 1,100 |
| Tokens processed | 35.7M |
| Dataset | 35,944 train / 734 eval |
| Samples/sec | ~3.93 |
Loss Curves
Loss decreases steadily across all 1,100 steps. Train loss drops from 10.47 → 7.72 (26.3% reduction) and eval loss from 10.43 → 7.8651 (24.6% reduction). No overfitting is observed: the train and eval curves track closely.
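The percentages above can be reproduced from the reported losses. The sketch below also converts the best eval loss to perplexity, assuming the logged loss is mean per-token cross-entropy in nats (the Trainer default), and prints ln(32,000), the loss of a uniform guess over the base vocabulary, as a reference point.

```python
import math

# Start/end values from the loss description and the Training Results table.
train_start, train_end = 10.47, 7.7178
eval_start, eval_best = 10.43, 7.8651
vocab_size = 32_000

def pct_reduction(start, end):
    """Relative drop in loss, in percent."""
    return 100 * (start - end) / start

train_drop = pct_reduction(train_start, train_end)
eval_drop = pct_reduction(eval_start, eval_best)

# Perplexity = exp(cross-entropy), assuming the loss is measured in nats.
eval_ppl = math.exp(eval_best)
# Cross-entropy of a uniform distribution over the vocabulary, for scale.
uniform_loss = math.log(vocab_size)

print(f"train -{train_drop:.1f}%  eval -{eval_drop:.1f}%")
print(f"eval perplexity ~{eval_ppl:.0f}; uniform-guess loss = {uniform_loss:.2f}")
```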
Learning Rate Schedule
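Cosine schedule with 5% linear warmup. The warmup is sized from the planned 4-epoch run (roughly 18K optimizer steps at an effective batch of 8), not from the 1,100 steps actually completed, so the LR ramps up for about 900 steps and peaks at 2e-4 around step 900; cosine decay had barely begun when training stopped. The gradual ramp keeps early updates to the freshly initialized LoRA adapters small.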
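A peak near step 900 is what a 5% warmup produces when it is sized from the full 4-epoch plan. The sketch below reconstructs the schedule from the tables, assuming Hugging Face Trainer-style behaviour (warmup = ceil(ratio × planned steps), then the half-cosine decay used by `get_cosine_schedule_with_warmup`); it is a reconstruction, not the logged values.

```python
import math

# Assumption: the scheduler spans the *planned* 4 epochs, Trainer-style.
train_samples = 35_944
effective_batch = 4 * 1 * 2          # per-GPU batch x grad accumulation x GPUs
planned_epochs = 4
warmup_ratio = 0.05
peak_lr = 2e-4

steps_per_epoch = math.ceil(train_samples / effective_batch)
planned_steps = planned_epochs * steps_per_epoch
warmup_steps = math.ceil(warmup_ratio * planned_steps)

def lr_at(step):
    """Linear warmup to peak_lr, then half-cosine decay to 0 over the remaining planned steps."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, planned_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

for step in (55, 450, warmup_steps, 1100, planned_steps):
    print(step, f"{lr_at(step):.2e}")
```

Under these assumptions the LR at step 1,100 is still within about 0.1% of the peak.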
Gradient Norm
The gradient norm spikes to about 5.4 around steps 400-450, a transient that is common while LoRA adapters move away from their initialization during warmup. It then settles in the 1.5-2.5 range for the rest of training.
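In Hugging Face Trainer, the logged gradient norm is typically the global L2 norm over all trainable gradients; with LoRA, only the adapter weights are trainable. A minimal plain-Python version of that quantity, with toy gradient values (assumed, not taken from this run):

```python
import math

def global_grad_norm(grads):
    """L2 norm of all gradient values taken together, as if concatenated into one vector.

    `grads` holds one flat list of floats per trainable tensor (here: toy values
    standing in for the LoRA A/B gradients).
    """
    return math.sqrt(sum(g * g for tensor in grads for g in tensor))

toy_grads = [[3.0, 0.0], [0.0, 4.0]]
norm = global_grad_norm(toy_grads)
print(norm)
```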
Loss Progression Table
| Step | Train Loss | Eval Loss | Δ Eval |
|---|---|---|---|
| 50 | 10.43 | 10.43 | n/a |
| 100 | 10.15 | 10.10 | -0.33 |
| 200 | 9.23 | 9.26 | -0.84 |
| 300 | 9.06 | 9.00 | -0.26 |
| 400 | 8.86 | 8.78 | -0.22 |
| 500 | 8.63 | 8.64 | -0.14 |
| 600 | 8.55 | 8.52 | -0.12 |
| 700 | 8.51 | 8.38 | -0.14 |
| 800 | 8.34 | 8.24 | -0.14 |
| 900 | 8.19 | 8.08 | -0.16 |
| 1000 | 8.01 | 7.96 | -0.12 |
| 1100 | 7.72 | 7.87 | -0.09 |
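Δ Eval is the change in eval loss since the previous logged step. Recomputing it from the table's own values is a quick consistency check:

```python
# Eval losses from the table above, keyed by step.
eval_loss = {50: 10.43, 100: 10.10, 200: 9.26, 300: 9.00, 400: 8.78, 500: 8.64,
             600: 8.52, 700: 8.38, 800: 8.24, 900: 8.08, 1000: 7.96, 1100: 7.87}

steps = sorted(eval_loss)
# Delta = eval loss at this step minus eval loss at the previous logged step.
delta = {cur: round(eval_loss[cur] - eval_loss[prev], 2)
         for prev, cur in zip(steps, steps[1:])}
print(delta)
```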
Quick Start
Install Dependencies
```bash
pip install -r requirements.txt
```
Interactive Chat
```bash
python generate.py
```
This starts an interactive chat session. Type your messages and get responses from Lumia 62M.
Single Prompt
```bash
python generate.py --prompt "Write a Python function to check if a number is prime"
```
Python API
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("samcheng0/lumia-62m")
tokenizer = AutoTokenizer.from_pretrained("samcheng0/lumia-62m")

# Hand-written prompt using the model's role tokens
prompt = """<|system|>
You are an expert programmer. Think step by step.
<|user|>
Write a Python function to check if a number is prime.
<|assistant|>"""

inputs = tokenizer(prompt, return_tensors="pt")
# Greedy decoding (do_sample=False), so temperature/top_p do not apply here
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)
# Decode only the newly generated tokens, dropping the prompt
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
```
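The hand-written prompt above can also be built programmatically. `build_prompt` is a hypothetical helper that reproduces exactly that layout (each tag on its own line, generation starting after `<|assistant|>`). The authoritative format is the chat template in tokenizer_config.json, applied by `tokenizer.apply_chat_template`, which may differ in details such as whitespace or end-of-turn tokens.

```python
def build_prompt(user, system=None):
    """Assemble a single-turn prompt in the <|system|>/<|user|>/<|assistant|> layout
    of the hand-written example; the model continues after <|assistant|>."""
    parts = []
    if system is not None:
        parts.append(f"<|system|>\n{system}")
    parts.append(f"<|user|>\n{user}")
    parts.append("<|assistant|>")
    return "\n".join(parts)

prompt = build_prompt(
    "Write a Python function to check if a number is prime.",
    system="You are an expert programmer. Think step by step.",
)
print(prompt)
```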
Evaluation
```bash
python eval.py                        # Run all benchmarks
python eval.py --category math        # Run specific category
python eval.py --verbose              # Show full responses
python eval.py --save results.json    # Save results to file
```
Load LoRA Adapter (Continued Training)
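```python
from transformers import AutoModelForCausalLM
from peft import PeftModel
base = AutoModelForCausalLM.from_pretrained("samcheng0/lumia-62m")
# model.safetensors holds merged weights; if this adapter is already merged in, stacking it doubles the update.
# The adapter lives in the repo's adapter/ subfolder; is_trainable=True unfreezes it for further training.
model = PeftModel.from_pretrained(base, "samcheng0/lumia-62m", subfolder="adapter", is_trainable=True)
```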
Chat Format
The model supports a chat template with special tokens:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("samcheng0/lumia-62m")
model = AutoModelForCausalLM.from_pretrained("samcheng0/lumia-62m")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 2+2?"},
]

# Apply the chat template from tokenizer_config.json; returns the formatted prompt string
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```
Supported Tokens
| Token | ID | Purpose |
|---|---|---|
| `<\|system\|>` | 32010 | System prompt |
| `<\|user\|>` | 32011 | User input |
| `<\|assistant\|>` | 32012 | Model response |
| `<think>` | 32008 | Start reasoning block |
| `</think>` | 32009 | End reasoning block |
| `[INST]` | 32013 | LLaMA-2 instruction start |
| `[/INST]` | 32014 | LLaMA-2 instruction end |
| `<\|code\|>` | 32023 | Code block marker |
| `<\|text\|>` | 32024 | Text block marker |
| `<\|math\|>` | 32025 | Math block marker |
| `<\|think\|>` | 32026 | Thinking marker |
| `<\|answer\|>` | 32027 | Answer marker |
Note: All 20 special tokens (12 are listed above) map to single token IDs, so the tokenizer encodes each one as exactly one ID instead of splitting it into sub-word pieces.
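If `<think>` and `</think>` are registered as special tokens, as the note above states, decoding with `skip_special_tokens=True` (as in the Python API example) drops the markers but keeps the reasoning text inline. To separate reasoning from the final answer, decode with `skip_special_tokens=False` and split on the markers. A minimal sketch, assuming the model emits a single `<think>...</think>` block before its answer (the card does not document the exact output layout; `split_reasoning` is a hypothetical helper):

```python
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_reasoning(text):
    """Split decoded output into (reasoning, answer).

    Assumes reasoning is wrapped in <think>...</think>; anything outside the block
    is treated as the answer. Returns ("", text) when no block is present.
    """
    match = THINK_RE.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer

reasoning, answer = split_reasoning("<think>2 + 2 means adding two and two.</think> 4")
print(repr(reasoning), repr(answer))
```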
Generation Parameters
| Parameter | Default | Description |
|---|---|---|
| `temperature` | 0.7 | Controls randomness (lower = more deterministic) |
| `top_p` | 0.9 | Nucleus sampling threshold |
| `max_new_tokens` | 512 | Maximum tokens to generate |
| `repetition_penalty` | 1.1 | Penalizes repeated tokens |
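In transformers, the repetition penalty adjusts the logits before temperature and top-p reshape the distribution a token is drawn from. The plain-Python sketch below applies that pipeline to a toy 4-token vocabulary; it illustrates the techniques and is not transformers' own implementation. `GEN_KWARGS` simply restates the table as `generate()` arguments, with `do_sample=True` added because greedy decoding ignores temperature and top_p.

```python
import math

# The table's defaults as keyword arguments for model.generate().
GEN_KWARGS = dict(do_sample=True, temperature=0.7, top_p=0.9,
                  max_new_tokens=512, repetition_penalty=1.1)

def apply_repetition_penalty(logits, seen_ids, penalty):
    """CTRL-style penalty: make already-generated tokens less likely."""
    out = list(logits)
    for i in set(seen_ids):
        out[i] = out[i] / penalty if out[i] > 0 else out[i] * penalty
    return out

def softmax_with_temperature(logits, temperature):
    """Temperature < 1 sharpens the distribution; > 1 flattens it."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, top_p):
    """Keep the most likely tokens until their cumulative mass reaches top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}

# Toy 4-token vocabulary; token 0 was already generated once.
logits = [2.0, 1.0, 0.5, -1.0]
penalized = apply_repetition_penalty(logits, seen_ids=[0],
                                     penalty=GEN_KWARGS["repetition_penalty"])
probs = softmax_with_temperature(penalized, GEN_KWARGS["temperature"])
nucleus = top_p_filter(probs, GEN_KWARGS["top_p"])
print(nucleus)
```

Here the least likely token falls outside the 0.9 nucleus and can never be sampled.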
Benchmarks
The model was evaluated on 20 test prompts across 5 categories:
| Category | Prompts | Description |
|---|---|---|
| Math | 4 | Arithmetic, algebra, calculus |
| Code | 4 | Python functions, complexity analysis |
| Reasoning | 4 | Logic puzzles, pattern recognition |
| General | 4 | Knowledge, facts, explanations |
| Indonesian | 4 | Translation, comprehension |
Run the full benchmark suite:
```bash
python eval.py --verbose
```
Dataset
Fine-tuned on samcheng0/lumia-reasoning-sft-v1: 35,944 train + 734 eval samples.
Data Sources (17 datasets)
| Source | Type | Samples |
|---|---|---|
| TeichAI/claude-4.5-opus-high-reasoning-250x | Reasoning traces | ~2.5K |
| TeichAI/Claude-Opus-4.6-Reasoning-887x | Reasoning traces | ~1.8K |
| nohurry/Opus-4.6-Reasoning-3000x-filtered | Reasoning traces | ~2.1K |
| angrygiraffe/claude-opus-4.6-4.7-reasoning-8.7k | Code reasoning | ~3.5K |
| Crownelius/Opus-4.6-Reasoning-3300x | Reasoning traces | ~3K |
| nvidia/OpenCodeReasoning | Code reasoning | 10K (sampled) |
| nvidia/OpenCodeReasoning-2 | Code reasoning | 8K |
| open-r1/Mixture-of-Thoughts | Mixed reasoning | ~5K |
| open-thoughts/OpenThoughts-114k | Reasoning | 8K (sampled) |
| teknium/OpenHermes-2.5 | General chat | 30K (sampled) |
| HuggingFaceH4/ultrachat_200k | Multi-turn chat | 15K (sampled) |
| cahya/alpaca-id-cleaned | Indonesian instruction | ~2K |
Filter Pipeline
Raw: ~202K lines → Filtered: ~36K (81.6% filtered out)
| Filter | Threshold |
|---|---|
| Min total chars | 3,000 |
| Min output chars | 1,500 |
| Output/input ratio | ≥ 1.2 |
| Structural score | ≥ 4 (`<think>`=+3, code block=+2, steps=+2) |
| Dedup | MD5 hash |
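A minimal sketch of the filter pipeline above, assuming "total chars" means prompt plus response, the ratio is response length over prompt length, and the first scored feature is a `<think>` block. The feature detectors and the MD5 key (prompt and response joined by a NUL character) are hypothetical stand-ins; the actual filtering script does not appear in the repo tree.

```python
import hashlib
import re

# Thresholds from the Filter Pipeline table.
MIN_TOTAL_CHARS = 3_000
MIN_OUTPUT_CHARS = 1_500
MIN_OUTPUT_INPUT_RATIO = 1.2
MIN_STRUCTURAL_SCORE = 4

CODE_FENCE = "`" * 3
# Hypothetical detector for numbered steps ("1. ..." or "Step 2 ...") at line start.
STEP_RE = re.compile(r"^\s*(?:Step\s+\d+|\d+\.)\s", re.MULTILINE)

def structural_score(output):
    """Score structural features with the weights from the table (detectors assumed)."""
    score = 0
    if "<think>" in output:
        score += 3
    if CODE_FENCE in output:
        score += 2
    if STEP_RE.search(output):
        score += 2
    return score

def keep_sample(prompt, output, seen_hashes):
    """Return True if the sample passes every filter; records its hash for dedup."""
    if len(prompt) + len(output) < MIN_TOTAL_CHARS:
        return False
    if len(output) < MIN_OUTPUT_CHARS:
        return False
    if len(output) / max(1, len(prompt)) < MIN_OUTPUT_INPUT_RATIO:
        return False
    if structural_score(output) < MIN_STRUCTURAL_SCORE:
        return False
    digest = hashlib.md5((prompt + "\x00" + output).encode("utf-8")).hexdigest()
    if digest in seen_hashes:
        return False
    seen_hashes.add(digest)
    return True

seen = set()
prompt = "q" * 1000
output = ("<think>" + "x" * 1000 + "</think>\n" + CODE_FENCE + "python\nprint(1)\n"
          + CODE_FENCE + "\n" + "y" * 1000)
print(keep_sample(prompt, output, seen))   # first copy is kept
print(keep_sample(prompt, output, seen))   # exact duplicate is dropped
```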
Repo Structure
```text
lumia-62m/
├── config.json                     # Model architecture
├── model.safetensors               # Merged weights (inference ready)
├── tokenizer.json                  # Tokenizer (with special tokens)
├── tokenizer_config.json           # Tokenizer settings + chat template
├── special_tokens_map.json         # Special tokens ID mapping
├── README.md                       # This file
├── requirements.txt                # Python dependencies
├── generate.py                     # Interactive inference script
├── eval.py                         # Evaluation benchmark
├── add_special_tokens.py           # Token management script
├── banner.svg                      # Header banner
├── loss_curve.svg                  # Training loss chart
├── lr_schedule.svg                 # Learning rate chart
├── grad_norm.svg                   # Gradient norm chart
└── adapter/                        # LoRA adapter + training state
    ├── adapter_model.safetensors   # LoRA weights (14.7 MB)
    ├── adapter_config.json         # PEFT config
    ├── optimizer.pt                # AdamW state (resume training)
    ├── scheduler.pt                # LR scheduler state
    ├── scaler.pt                   # Gradient scaler
    ├── trainer_state.json          # Full training metrics
    └── train.log                   # Training log
```
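adapter/trainer_state.json holds the metrics behind the charts. Assuming it follows the usual Hugging Face Trainer layout (a `log_history` list of per-step dicts, with `loss` on training entries and `eval_loss` on evaluation entries), a sketch like the following rebuilds a loss-by-step table. `loss_table` is a hypothetical helper, and the inline sample only reuses the final losses reported above.

```python
import json

def loss_table(trainer_state):
    """Collect train/eval loss per step from a Trainer-style `log_history` list."""
    rows = {}
    for entry in trainer_state.get("log_history", []):
        step = entry.get("step")
        if step is None:
            continue
        row = rows.setdefault(step, {})
        if "loss" in entry:
            row["train_loss"] = entry["loss"]
        if "eval_loss" in entry:
            row["eval_loss"] = entry["eval_loss"]
    return dict(sorted(rows.items()))

# Inline stand-in for the real file; with the actual file use:
#   with open("adapter/trainer_state.json") as f: state = json.load(f)
sample = json.loads("""
{"log_history": [
  {"step": 1100, "loss": 7.7178},
  {"step": 1100, "eval_loss": 7.8651}
]}
""")
print(loss_table(sample))
```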
Citation
```bibtex
@misc{lumia-62m,
  title={Lumia 62M: A Small Reasoning Language Model},
  author={samcheng0},
  year={2026},
  howpublished={\url{https://huggingface.co/samcheng0/lumia-62m}},
}
```
License
Apache 2.0