Instructions to use samcheng0/lumia-62m with libraries and local apps.

Libraries

How to use samcheng0/lumia-62m with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="samcheng0/lumia-62m")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("samcheng0/lumia-62m")
model = AutoModelForCausalLM.from_pretrained("samcheng0/lumia-62m")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use samcheng0/lumia-62m with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "samcheng0/lumia-62m"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "samcheng0/lumia-62m",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
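
The same chat request can be sent from Python with only the standard library. This is a minimal sketch: `build_chat_request` and `ask` are hypothetical helper names, and the default base URL assumes the server started above is listening on localhost port 8000 (pass port 30000 for the SGLang commands further down).

```python
import json
import urllib.request

MODEL_ID = "samcheng0/lumia-62m"


def build_chat_request(prompt, base_url="http://localhost:8000"):
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, base_url="http://localhost:8000"):
    """Send the request and return the reply text (needs a running server)."""
    with urllib.request.urlopen(build_chat_request(prompt, base_url)) as resp:
        body = json.load(resp)
    # OpenAI-style responses carry the reply under choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With a server running, `ask("What is the capital of France?")` would return the reply text. Nothing is sent until `ask` is called.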

Use Docker

docker model run hf.co/samcheng0/lumia-62m

SGLang

How to use samcheng0/lumia-62m with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "samcheng0/lumia-62m" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "samcheng0/lumia-62m",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "samcheng0/lumia-62m" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "samcheng0/lumia-62m",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use samcheng0/lumia-62m with Docker Model Runner:
```
docker model run hf.co/samcheng0/lumia-62m
```

Lumia 62M

A 62.8M parameter reasoning language model, fine-tuned from Supra-50M-Reasoning on 35,944 curated reasoning samples.

Small enough to run on a phone. Smart enough to reason.

Model Details

Attribute	Value
Architecture	LlamaForCausalLM
Parameters	62.8M
Hidden size	448
Layers	14
Attention heads	8 (8 KV heads, so GQA with one query head per KV head, equivalent to standard multi-head attention)
Head dim	56
Context length	4096 (YaRN extended, factor 4.0)
Vocab size	32,000 base, plus added special tokens (IDs 32008 and up, see Supported Tokens)
Precision	bfloat16 (~125 MB)
License	Apache 2.0
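
The head dimension and the memory figures follow from the table by simple arithmetic. The sketch below reproduces them and also estimates the KV cache size. It assumes the cache is stored in bfloat16 for a single sequence (batch size 1), which the table does not state.

```python
# Values copied from the Model Details table.
HIDDEN_SIZE = 448
NUM_LAYERS = 14
NUM_HEADS = 8
NUM_KV_HEADS = 8
PARAMS = 62.8e6
CONTEXT_LENGTH = 4096
BYTES_PER_VALUE = 2  # bfloat16 stores each value in 2 bytes

head_dim = HIDDEN_SIZE // NUM_HEADS
weights_mb = PARAMS * BYTES_PER_VALUE / 1e6

# ASSUMPTION: keys and values are cached in bfloat16, per layer and per KV head.
kv_bytes_per_token = 2 * NUM_LAYERS * NUM_KV_HEADS * head_dim * BYTES_PER_VALUE
kv_mb_full_context = kv_bytes_per_token * CONTEXT_LENGTH / 1e6

print(head_dim, round(weights_mb, 1), kv_bytes_per_token, round(kv_mb_full_context, 1))
```

Under these assumptions the head dimension is 56, the weights take about 125.6 MB, and a full 4096-token context adds roughly 102.8 MB of KV cache (25,088 bytes per token).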

Training Configuration

Hyperparameter	Value
Framework	TRL SFTTrainer + PEFT LoRA
LoRA rank	r=32, α=64 (all linear layers)
Precision	fp16, `torch.compile` enabled
Batch	4 per GPU, gradient accumulation 1
Effective batch	8 (2× T4 DDP)
Learning rate	2e-4 cosine, 5% warmup
Max seq length	4096
Epochs	4 planned, 0.29 completed
Hardware	2× Tesla T4 (16GB each)
Training time	~55 min
Framework versions	TRL 1.7.0, PyTorch 2.x
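
A few derived quantities can be checked from this table. The sketch below assumes standard LoRA scaling (α divided by r) and takes the 1,100-step total from Training Results. The token figure is an upper bound that assumes every sequence fills the 4096-token limit.

```python
# Values copied from the Training Configuration table.
PER_GPU_BATCH = 4
NUM_GPUS = 2
GRAD_ACCUM = 1
LORA_R, LORA_ALPHA = 32, 64
MAX_SEQ_LEN = 4096
TOTAL_STEPS = 1100  # from Training Results

effective_batch = PER_GPU_BATCH * NUM_GPUS * GRAD_ACCUM

# ASSUMPTION: standard LoRA scaling, where the adapter update is multiplied by alpha / r.
lora_scale = LORA_ALPHA / LORA_R

# Upper bound: every sequence in every step is exactly MAX_SEQ_LEN tokens long.
max_tokens_per_step = effective_batch * MAX_SEQ_LEN
max_tokens_total = max_tokens_per_step * TOTAL_STEPS

print(effective_batch, lora_scale, max_tokens_total)
```

The bound comes to about 36.0M tokens, so the reported 35.7M tokens processed is consistent with sequences that are nearly full.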

Training Results

Metric	Value
Best eval loss	7.8651 (step 1100)
Final train loss	7.7178
Total steps	1,100
Tokens processed	35.7M
Dataset	35,944 train / 734 eval
Samples/sec	~3.93

Loss Curves

Loss decreases steadily across the 1,100 completed steps. Train loss drops from 10.47 → 7.72 (26.3% reduction) and eval loss from 10.43 → 7.87 (24.6% reduction). Train and eval curves track closely, with no sign of overfitting, and eval loss was still falling at the final checkpoint.
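
The percentages above can be reproduced directly. The sketch also converts the best eval loss to perplexity, assuming the reported loss is mean cross-entropy in nats per token (the usual convention for Hugging Face trainers, but not stated here). `percent_reduction` is a hypothetical helper name.

```python
import math


def percent_reduction(start, end):
    """Relative drop from start to end, in percent."""
    return (start - end) / start * 100


train_drop = percent_reduction(10.47, 7.72)
eval_drop = percent_reduction(10.43, 7.8651)  # unrounded best eval loss

# ASSUMPTION: loss is cross-entropy in nats per token, so perplexity = exp(loss).
eval_perplexity = math.exp(7.8651)

print(round(train_drop, 1), round(eval_drop, 1), round(eval_perplexity))
```

Using the unrounded eval loss of 7.8651 from Training Results gives the 24.6% figure. The rounded 7.87 would give 24.5%.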

Learning Rate Schedule

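Cosine schedule with 5% warmup. The step-900 peak is consistent with warmup being sized against the planned 4-epoch run rather than the 1,100 steps actually completed: 35,944 samples at an effective batch of 8 is about 4,493 steps per epoch, or roughly 17,970 planned steps, and 5% of that is about 900. Peak LR 2e-4 is reached around step 900, after which cosine decay begins, so most of the completed run took place during warmup. The gradual ramp keeps early LoRA updates small while the adapters move away from their initial values.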
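
A peak at step 900 matches a 5% warmup computed over the planned 4-epoch run. The sketch below works through that arithmetic and evaluates the schedule at any step. It assumes one sample per sequence (no packing), linear warmup, and decay to zero at the end of the planned run. `lr_at` is a hypothetical helper, not a function from the training code.

```python
import math

TRAIN_SAMPLES = 35_944
EFFECTIVE_BATCH = 8
PLANNED_EPOCHS = 4
WARMUP_RATIO = 0.05
PEAK_LR = 2e-4

# ASSUMPTION: one sample per sequence, so steps per epoch = samples / batch.
steps_per_epoch = math.ceil(TRAIN_SAMPLES / EFFECTIVE_BATCH)
planned_steps = steps_per_epoch * PLANNED_EPOCHS
warmup_steps = math.ceil(planned_steps * WARMUP_RATIO)


def lr_at(step):
    """Linear warmup to PEAK_LR, then cosine decay to zero at planned_steps."""
    if step < warmup_steps:
        return PEAK_LR * step / warmup_steps
    progress = (step - warmup_steps) / (planned_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print(steps_per_epoch, planned_steps, warmup_steps)
```

Under these assumptions warmup lasts 899 steps, and at step 1,100 the learning rate is still within a fraction of a percent of its peak.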

Gradient Norm

Grad norm spikes to 5.4 around steps 400-450, which is typical during LoRA warmup while the adapters find their direction, then settles into the 1.5-2.5 range for the remainder of training.

Loss Progression Table

Step	Train Loss	Eval Loss	Δ Eval
50	10.43	10.43	—
100	10.15	10.10	-0.33
200	9.23	9.26	-0.84
300	9.06	9.00	-0.26
400	8.86	8.78	-0.22
500	8.63	8.64	-0.14
600	8.55	8.52	-0.12
700	8.51	8.38	-0.14
800	8.34	8.24	-0.14
900	8.19	8.08	-0.16
1000	8.01	7.96	-0.12
1100	7.72	7.87	-0.09
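
The Δ Eval column is the change in eval loss from the previous logged row. A short sketch that recomputes it from the Eval Loss column (the variable names are illustrative):

```python
# Eval loss by step, copied from the table above.
EVAL_LOSS = {
    50: 10.43, 100: 10.10, 200: 9.26, 300: 9.00, 400: 8.78, 500: 8.64,
    600: 8.52, 700: 8.38, 800: 8.24, 900: 8.08, 1000: 7.96, 1100: 7.87,
}

steps = sorted(EVAL_LOSS)

# Difference between each logged row and the one before it, rounded to 2 places.
deltas = {
    cur: round(EVAL_LOSS[cur] - EVAL_LOSS[prev], 2)
    for prev, cur in zip(steps, steps[1:])
}
total_change = round(EVAL_LOSS[steps[-1]] - EVAL_LOSS[steps[0]], 2)

for step, delta in deltas.items():
    print(step, delta)
```

Every delta is negative, so eval loss improved at each logged checkpoint. Note that the first interval spans 50 steps and the rest span 100.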

Quick Start

Install Dependencies

pip install -r requirements.txt

Interactive Chat

python generate.py

This starts an interactive chat session. Type your messages and get responses from Lumia 62M.

Single Prompt

python generate.py --prompt "Write a Python function to check if a number is prime"

Python API

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("samcheng0/lumia-62m")
tokenizer = AutoTokenizer.from_pretrained("samcheng0/lumia-62m")

prompt = """<|system|>
You are an expert programmer. Think step by step.
<|user|>
Write a Python function to check if a number is prime.
<|assistant|>"""

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
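
The raw prompt string above can also be assembled from a message list. This is a minimal sketch that mirrors the layout of that string (role tag, newline, content). `build_prompt` is a hypothetical helper, and the tokenizer's own chat template, shown under Chat Format, is the authoritative source and may differ in whitespace or end-of-turn tokens.

```python
ROLE_TAGS = {
    "system": "<|system|>",
    "user": "<|user|>",
    "assistant": "<|assistant|>",
}


def build_prompt(messages):
    """Render each message as its role tag, a newline, and the content,
    then end with an open assistant turn for the model to complete."""
    parts = [f"{ROLE_TAGS[m['role']]}\n{m['content']}" for m in messages]
    parts.append(ROLE_TAGS["assistant"])
    return "\n".join(parts)


prompt = build_prompt([
    {"role": "system", "content": "You are an expert programmer. Think step by step."},
    {"role": "user", "content": "Write a Python function to check if a number is prime."},
])
print(prompt)
```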

Evaluation

python eval.py                    # Run all benchmarks
python eval.py --category math    # Run specific category
python eval.py --verbose          # Show full responses
python eval.py --save results.json  # Save results to file

Load LoRA Adapter (Continued Training)

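from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("samcheng0/lumia-62m")
# The adapter lives in the repo's adapter/ folder, so pass it as a subfolder.
# is_trainable=True keeps the LoRA weights unfrozen for continued training.
model = PeftModel.from_pretrained(base, "samcheng0/lumia-62m", subfolder="adapter", is_trainable=True)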

Chat Format

The model supports a chat template with special tokens:

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("samcheng0/lumia-62m")
model = AutoModelForCausalLM.from_pretrained("samcheng0/lumia-62m")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 2+2?"},
]

# Apply chat template
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

Supported Tokens

Token	ID	Purpose
`<|system|>`	32010	System prompt
`<|user|>`	32011	User input
`<|assistant|>`	32012	Model response
`<think>`	32008	Start reasoning block
`</think>`	32009	End reasoning block
`[INST]`	32013	LLaMA-2 instruction start
`[/INST]`	32014	LLaMA-2 instruction end
`<|code|>`	32023	Code block marker
`<|text|>`	32024	Text block marker
`<|math|>`	32025	Math block marker
`<|think|>`	32026	Thinking marker
`<|answer|>`	32027	Answer marker

Note: The table lists 12 of the 20 special tokens. Each special token maps to a single token ID, so the tokenizer encodes and decodes it without splitting it into pieces.
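
Since `<think>` and `</think>` delimit the reasoning block, generated text can be split into reasoning and final answer. The sketch below assumes the model emits at most one complete `<think>...</think>` block. `split_reasoning` is a hypothetical helper, not part of this repo.

```python
import re

# Non-greedy match so only the first complete reasoning block is captured.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text):
    """Return (reasoning, answer); reasoning is None when no complete block is found."""
    match = THINK_BLOCK.search(text)
    if match is None:
        return None, text.strip()
    reasoning = match.group(1).strip()
    # Everything outside the reasoning block is treated as the answer.
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer


reasoning, answer = split_reasoning("<think>2 + 2 = 4</think>The answer is 4.")
print(reasoning)
print(answer)
```

If the tags are registered as special tokens, decoding with `skip_special_tokens=True` strips them, so decode with `skip_special_tokens=False` before splitting.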

Generation Parameters

Parameter	Default	Description
`temperature`	0.7	Controls randomness (lower = more deterministic)
`top_p`	0.9	Nucleus sampling threshold
`max_new_tokens`	512	Maximum tokens to generate
`repetition_penalty`	1.1	Penalizes repeated tokens
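
These defaults can be collected into keyword arguments for `model.generate`. A minimal sketch: `GENERATION_KWARGS` and `generation_kwargs` are hypothetical names, and `do_sample=True` is an addition that is not in the table. Transformers only applies `temperature` and `top_p` when sampling is enabled.

```python
# Defaults copied from the Generation Parameters table.
GENERATION_KWARGS = {
    "do_sample": True,  # ASSUMPTION: not in the table; needed for temperature/top_p to apply
    "temperature": 0.7,
    "top_p": 0.9,
    "max_new_tokens": 512,
    "repetition_penalty": 1.1,
}


def generation_kwargs(**overrides):
    """Return a copy of the defaults with any per-call overrides applied."""
    return {**GENERATION_KWARGS, **overrides}


print(generation_kwargs(max_new_tokens=128))
```

With a loaded model and tokenized `inputs`, the call would look like `model.generate(**inputs, **generation_kwargs(max_new_tokens=128))`.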

Benchmarks

The model was evaluated on 20 test prompts across 5 categories:

Category	Prompts	Description
Math	4	Arithmetic, algebra, calculus
Code	4	Python functions, complexity analysis
Reasoning	4	Logic puzzles, pattern recognition
General	4	Knowledge, facts, explanations
Indonesian	4	Translation, comprehension

Run the full benchmark suite:

python eval.py --verbose

Dataset

Fine-tuned on samcheng0/lumia-reasoning-sft-v1 — 35,944 train + 734 eval samples.

Data Sources (17 datasets; 12 listed below)

Source	Type	Samples
TeichAI/claude-4.5-opus-high-reasoning-250x	Reasoning traces	~2.5K
TeichAI/Claude-Opus-4.6-Reasoning-887x	Reasoning traces	~1.8K
nohurry/Opus-4.6-Reasoning-3000x-filtered	Reasoning traces	~2.1K
angrygiraffe/claude-opus-4.6-4.7-reasoning-8.7k	Code reasoning	~3.5K
Crownelius/Opus-4.6-Reasoning-3300x	Reasoning traces	~3K
nvidia/OpenCodeReasoning	Code reasoning	10K (sampled)
nvidia/OpenCodeReasoning-2	Code reasoning	8K
open-r1/Mixture-of-Thoughts	Mixed reasoning	~5K
open-thoughts/OpenThoughts-114k	Reasoning	8K (sampled)
teknium/OpenHermes-2.5	General chat	30K (sampled)
HuggingFaceH4/ultrachat_200k	Multi-turn chat	15K (sampled)
cahya/alpaca-id-cleaned	Indonesian instruction	~2K

Filter Pipeline

Raw: ~202K lines → Filtered: ~36K (81.6% filtered out)

Filter	Threshold
Min total chars	3,000
Min output chars	1,500
Output/input ratio	≥ 1.2
Structural score	≥ 4 (=+3, code block=+2, steps=+2)
Dedup	MD5 hash
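
The thresholds above can be sketched as a filter function. This is an illustration under several assumptions, not the pipeline's actual code: lengths are measured in characters, the ratio is output length over prompt length, the unnamed +3 feature in the table is taken to be the presence of a `<think>` tag, "steps" is detected with a simple regex, and the MD5 hash covers prompt and output together. All function names are hypothetical.

```python
import hashlib
import re

MIN_TOTAL_CHARS = 3000
MIN_OUTPUT_CHARS = 1500
MIN_RATIO = 1.2
MIN_STRUCT_SCORE = 4
FENCE = "`" * 3  # Markdown code fence marker


def structural_score(output):
    score = 0
    if "<think>" in output:  # ASSUMPTION: stands in for the table's unnamed +3 feature
        score += 3
    if FENCE in output:  # code block present
        score += 2
    # ASSUMPTION: a "step" is a line starting with "Step N" or a list number.
    if re.search(r"(?im)^\s*(step\s+\d+|\d+[.)])", output):
        score += 2
    return score


def passes_filters(prompt, output):
    total = len(prompt) + len(output)
    return (
        total >= MIN_TOTAL_CHARS
        and len(output) >= MIN_OUTPUT_CHARS
        and len(output) >= MIN_RATIO * len(prompt)  # avoids dividing by zero
        and structural_score(output) >= MIN_STRUCT_SCORE
    )


def filter_and_dedup(samples):
    """Keep (prompt, output) pairs that pass the filters, dropping exact duplicates."""
    seen, kept = set(), []
    for prompt, output in samples:
        if not passes_filters(prompt, output):
            continue
        digest = hashlib.md5((prompt + "\x00" + output).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append((prompt, output))
    return kept
```

With a score threshold of 4, a sample needs at least two structural features, since the largest single feature is worth 3.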

Repo Structure

lumia-62m/
├── config.json                # Model architecture
├── model.safetensors          # Merged weights (inference ready)
├── tokenizer.json             # Tokenizer (with special tokens)
├── tokenizer_config.json      # Tokenizer settings + chat template
├── special_tokens_map.json    # Special tokens ID mapping
├── README.md                  # This file
├── requirements.txt           # Python dependencies
├── generate.py                # Interactive inference script
├── eval.py                    # Evaluation benchmark
├── add_special_tokens.py      # Token management script
├── banner.svg                 # Header banner
├── loss_curve.svg             # Training loss chart
├── lr_schedule.svg            # Learning rate chart
├── grad_norm.svg              # Gradient norm chart
└── adapter/                   # LoRA adapter + training state
    ├── adapter_model.safetensors   # LoRA weights (14.7 MB)
    ├── adapter_config.json         # PEFT config
    ├── optimizer.pt                # AdamW state (resume training)
    ├── scheduler.pt                # LR scheduler state
    ├── scaler.pt                   # Gradient scaler
    ├── trainer_state.json          # Full training metrics
    └── train.log                   # Training log

Citation

@misc{lumia-62m,
  title={Lumia 62M: A Small Reasoning Language Model},
  author={samcheng0},
  year={2026},
  howpublished={\url{https://huggingface.co/samcheng0/lumia-62m}},
}

License

Apache 2.0

Safetensors

Model size: 62.9M params
Tensor type: F16