Instructions to use StentorLabs/Stentor3-20M with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use StentorLabs/Stentor3-20M with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="StentorLabs/Stentor3-20M")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Print the result; outside a notebook a bare expression is silently discarded
print(pipe(messages))

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("StentorLabs/Stentor3-20M")
model = AutoModelForCausalLM.from_pretrained("StentorLabs/Stentor3-20M")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
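# Decode only the newly generated tokens (skip the prompt) and drop special tokens
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))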

Local Apps

vLLM

How to use StentorLabs/Stentor3-20M with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "StentorLabs/Stentor3-20M"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "StentorLabs/Stentor3-20M",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
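
The same request can also be sent from Python. The following is a minimal sketch using only the standard library, assuming the server started above is listening on localhost:8000 and returns the usual OpenAI-style response shape. The helper names build_chat_request and chat are hypothetical and not part of vLLM.

```python
import json
import urllib.request


def build_chat_request(prompt, model="StentorLabs/Stentor3-20M",
                       base_url="http://localhost:8000"):
    """Build (but do not send) a POST request for /v1/chat/completions.

    base_url is an assumption: it must match the host and port of your server.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the request and return the assistant's reply text.

    Assumes an OpenAI-compatible response: choices[0].message.content.
    """
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Example (requires a running server):
# print(chat("What is the capital of France?"))
```

For a server on another port, such as the SGLang examples on port 30000, pass base_url="http://localhost:30000".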

Use Docker

docker model run hf.co/StentorLabs/Stentor3-20M

SGLang

How to use StentorLabs/Stentor3-20M with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "StentorLabs/Stentor3-20M" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "StentorLabs/Stentor3-20M",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "StentorLabs/Stentor3-20M" \
        --host 0.0.0.0 \
        --port 30000
# Call the server with the same curl request as in the pip-based SGLang setup above (port 30000).

Docker Model Runner
How to use StentorLabs/Stentor3-20M with Docker Model Runner:
```
docker model run hf.co/StentorLabs/Stentor3-20M
```

Appreciation & Inquiry to StentorLabs

by GODELEV - opened 1 day ago


To: Kai Izumoto (@StentorLabs)

Dear Kai,

I have been following your work at StentorLabs and am deeply impressed by your ability to train strong, efficient base models like Stentor3 entirely on free-tier Kaggle compute. Maximizing TPU quotas and T4 GPUs to build competitive models on a zero-dollar budget is a massive inspiration to the open-source community.

Your dedication proves that impactful AI development doesn't require a massive corporate budget.

I would love to briefly ask: what are your plans for the future of StentorLabs? Are you planning to refine these hyper-efficient sub-100M architectures further, or are there new training experiments you are looking forward to?

Thank you for your incredible work, transparency, and contribution to open-source AI!

Warm regards,
Akshit

StentorLabs

Owner about 19 hours ago

Dear Akshit,

Thank you for the kind words and support. My goal with StentorLabs is to show that capable open-source language models can be built with extremely limited resources. Over the next few years, I plan to continue developing both the Stentor family (primarily in the 10M–99M parameter range) and the Portimbria family (100M+), with a strong focus on improving efficiency rather than simply increasing parameter count. One of the biggest changes in my thinking recently is that I have largely reversed my previous position on model architecture. Earlier generations leaned toward more balanced or slightly wider designs, but after studying recent small-model research and comparing some of the strongest models in the space, I have become convinced that depth is far more important than I originally thought. As a result, future generations will move toward significantly deeper architectures. I am also investigating hybrid state-space architectures and related approaches that could make long-context training much more compute-efficient.

Looking further ahead, I hope to substantially close the performance gap between very small models and larger alternatives. While my current models are still a work in progress, my ambition is to eventually build models in the 50M-parameter range that can compete with models several times larger. I intend to keep my work open through detailed documentation, model cards, public datasets, and open weights, while maintaining my own training infrastructure and codebase. Although I do not currently publish my training code, I try to document enough of the design decisions and methodology that others can learn from the work and build upon the ideas themselves. More than anything, I hope StentorLabs can contribute useful ideas to the open-source small-language-model community and help demonstrate what independent researchers can accomplish with creativity, persistence, and efficient use of compute.

Warm regards,

Kai Izumoto
Founder, StentorLabs
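
Editorial note on the depth-versus-width point in the reply above: a rough way to see the trade-off is to count parameters. The sketch below uses a common back-of-the-envelope estimate for a decoder-only transformer (about 4·d² attention weights plus 2·r·d² MLP weights per layer, with tied embeddings and norms and biases ignored). The two configurations and the vocabulary size are hypothetical illustrations, not StentorLabs' actual architectures.

```python
def approx_params(n_layers, d_model, vocab_size, mlp_ratio=4):
    """Rough parameter count for a decoder-only transformer.

    Assumptions (not taken from any Stentor model card):
    - attention uses four d_model x d_model matrices (Q, K, V, output)
    - the MLP uses two matrices of shape d_model x (mlp_ratio * d_model)
    - input and output embeddings are tied; norms and biases are ignored
    """
    attention = 4 * d_model * d_model
    mlp = 2 * mlp_ratio * d_model * d_model
    embedding = vocab_size * d_model
    return embedding + n_layers * (attention + mlp)


# Hypothetical configurations with identical transformer-block budgets:
# halving the width allows four times as many layers.
wide_shallow = approx_params(n_layers=6, d_model=384, vocab_size=8192)
deep_narrow = approx_params(n_layers=24, d_model=192, vocab_size=8192)

print(f"wide/shallow: {wide_shallow:,} parameters")
print(f"deep/narrow:  {deep_narrow:,} parameters")
```

Under these assumptions, both configurations spend the same number of parameters on transformer blocks, and the narrower model also spends fewer on embeddings. The estimate says nothing about which shape trains better; that is the empirical question the reply addresses.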
