Instructions for using rodin-llm/rodin-1b-instruct with libraries, inference servers, and local apps.

Libraries

How to use rodin-llm/rodin-1b-instruct with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="rodin-llm/rodin-1b-instruct")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("rodin-llm/rodin-1b-instruct")
model = AutoModelForCausalLM.from_pretrained("rodin-llm/rodin-1b-instruct")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

llama-cpp-python

How to use rodin-llm/rodin-1b-instruct with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="rodin-llm/rodin-1b-instruct",
	filename="rodin-1b-instruct-Q4_K_M.gguf",
)

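response = llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)
# The result is an OpenAI-style dict; the reply text is in choices[0]
print(response["choices"][0]["message"]["content"])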


llama.cpp

How to use rodin-llm/rodin-1b-instruct with llama.cpp:

Install (macOS, Linux)

curl -LsSf https://llama.app/install.sh | sh
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf rodin-llm/rodin-1b-instruct:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf rodin-llm/rodin-1b-instruct:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf rodin-llm/rodin-1b-instruct:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf rodin-llm/rodin-1b-instruct:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf rodin-llm/rodin-1b-instruct:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf rodin-llm/rodin-1b-instruct:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf rodin-llm/rodin-1b-instruct:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf rodin-llm/rodin-1b-instruct:Q4_K_M

Use Docker

docker model run hf.co/rodin-llm/rodin-1b-instruct:Q4_K_M


vLLM

How to use rodin-llm/rodin-1b-instruct with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "rodin-llm/rodin-1b-instruct"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "rodin-llm/rodin-1b-instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
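The same OpenAI-compatible call can be made from Python without extra dependencies. This is a minimal sketch using only the standard library; `build_chat_request` and `chat` are hypothetical helper names, and it assumes the server returns the usual OpenAI-style JSON body (reply under `choices[0].message.content`).

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, user_message: str) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url: str, model: str, user_message: str, timeout: float = 60.0) -> str:
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(base_url, model, user_message)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # OpenAI-style responses carry the reply under choices[0].message.content
    return body["choices"][0]["message"]["content"]
```

With the vLLM server above running, chat("http://localhost:8000", "rodin-llm/rodin-1b-instruct", "Quelle est la capitale de la France ?") returns the reply; for the SGLang server, use port 30000 instead.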


SGLang

How to use rodin-llm/rodin-1b-instruct with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "rodin-llm/rodin-1b-instruct" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "rodin-llm/rodin-1b-instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "rodin-llm/rodin-1b-instruct" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "rodin-llm/rodin-1b-instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Ollama
How to use rodin-llm/rodin-1b-instruct with Ollama:
```
ollama run hf.co/rodin-llm/rodin-1b-instruct:Q4_K_M
```

Unsloth Studio

How to use rodin-llm/rodin-1b-instruct with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for rodin-llm/rodin-1b-instruct to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for rodin-llm/rodin-1b-instruct to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for rodin-llm/rodin-1b-instruct to start chatting

Docker Model Runner
How to use rodin-llm/rodin-1b-instruct with Docker Model Runner:
```
docker model run hf.co/rodin-llm/rodin-1b-instruct:Q4_K_M
```

Lemonade

How to use rodin-llm/rodin-1b-instruct with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull rodin-llm/rodin-1b-instruct:Q4_K_M

Run and chat with the model

lemonade run user.rodin-1b-instruct-Q4_K_M

List all available models

lemonade list

RODIN-1B-Instruct

A French conversational model trained from scratch, solo, on rented and consumer-grade hardware.


This is the instruction-tuned, conversational model. For the base pretrained model, see rodin-llm/rodin-1b.

💻 Full source code: github.com/rodin-llm/rodin. It covers the complete pipeline: data, tokenizer, pretraining, SFT, export, and spot-instance orchestration.


Overview

RODIN-1B-Instruct is the instruction-tuned, conversational variant of rodin-1b — a 1.24-billion-parameter, French-only causal language model trained from scratch by a single person. RODIN stands for Research Open Deep Intelligence Natively-french.

It was produced by full supervised fine-tuning (SFT) of the base model on French ChatML examples, and speaks French fluently in a conversational register. GGUF quantizations (F16, Q8_0, Q4_K_M) are provided for llama.cpp, Ollama, and LM Studio, and also run on CPU-only hardware.
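To gauge whether a given GGUF file fits in RAM, a rough weight-size estimate follows from the parameter count and the ggml block layouts (Q8_0 stores 32 int8 values plus one fp16 scale in 34 bytes; Q4_K stores 256 weights in 144 bytes). This is a back-of-the-envelope sketch, not a measurement: it assumes every tensor uses the listed format, whereas Q4_K_M keeps some tensors at higher precision, so the real Q4_K_M file is somewhat larger than the Q4_K figure. The KV-cache line uses the 22 layers and hidden size 2048 from the model description table and assumes no grouped-query attention, which the card does not mention.

```python
PARAMS = 1_238_000_000  # 1.238 B parameters, from the model description table

# (bytes per block, weights per block) for each ggml storage format
FORMATS = {
    "F16": (2, 1),       # 16 bits per weight
    "Q8_0": (34, 32),    # 32 int8 values + one fp16 scale = 8.5 bits/weight
    "Q4_K": (144, 256),  # 256-weight super-block = 4.5 bits/weight
}


def weight_bytes(fmt: str, params: int = PARAMS) -> int:
    """Approximate weight storage if every tensor used format `fmt`."""
    block_bytes, block_weights = FORMATS[fmt]
    return params * block_bytes // block_weights


def kv_cache_bytes(n_layers: int = 22, hidden: int = 2048,
                   context: int = 2048, bytes_per_value: int = 2) -> int:
    """Keys and values (x2) for every layer and position at fp16.

    Assumes all 16 attention heads are also KV heads (no GQA),
    so the KV width per layer equals the hidden size.
    """
    return 2 * n_layers * context * hidden * bytes_per_value


for fmt in FORMATS:
    print(f"{fmt:5s} ~{weight_bytes(fmt) / 1e9:.2f} GB")
print(f"KV cache (fp16, full 2048-token context) ~{kv_cache_bytes() / 2**20:.0f} MiB")
```

Under these assumptions it prints about 2.48 GB (F16), 1.32 GB (Q8_0), and 0.70 GB (Q4_K), plus 352 MiB of KV cache, which is why even the F16 file is practical on CPU-only machines.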

Why this model exists

The goal was never to compete with large, well-funded French models on raw benchmark scores. For scale: comparable French open-source efforts were trained on 3,000 billion tokens using hundreds of H100 GPUs on national supercomputers. RODIN was trained on 32 billion tokens, by one person, on a rented spot B200 instance plus a single RTX 3090 for local iteration and SFT.

The value of RODIN is pedagogical and demonstrative: it shows what one motivated individual can build from scratch, end to end, with a small budget — documented honestly, limitations included.

Chat format

RODIN-1B-Instruct uses the ChatML format:

<|im_start|>user
{message}<|im_end|>
<|im_start|>assistant
{response}<|im_end|>

Stop token: <|im_end|>. The chat template is bundled in tokenizer_config.json, so apply_chat_template and runtimes like Ollama/LM Studio handle it automatically.
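For runtimes that take raw prompts instead of message lists, the layout above can be rendered by hand. This is a minimal sketch assuming the bundled template uses exactly the whitespace shown (a newline after the role and after each <|im_end|>); render_chatml and cut_at_stop are hypothetical helpers, and tokenizer.apply_chat_template remains the authoritative source because it reads the real template from tokenizer_config.json.

```python
IM_START, IM_END = "<|im_start|>", "<|im_end|>"


def render_chatml(messages: list[dict], add_generation_prompt: bool = True) -> str:
    """Render [{'role': ..., 'content': ...}] into the ChatML layout."""
    parts = [f"{IM_START}{m['role']}\n{m['content']}{IM_END}\n" for m in messages]
    if add_generation_prompt:
        # Open an assistant turn so the model writes the reply next
        parts.append(f"{IM_START}assistant\n")
    return "".join(parts)


def cut_at_stop(generated: str) -> str:
    """Keep only the text before the first <|im_end|> stop token."""
    return generated.split(IM_END, 1)[0]
```

For example, render_chatml([{"role": "user", "content": "Bonjour"}]) ends with an open "<|im_start|>assistant" line, and cut_at_stop discards anything the model emits after <|im_end|> if the runtime does not stop on it.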

Model description

Property	Value
Parameters	1.238 B
Architecture	LLaMA-style (RoPE, RMSNorm, SwiGLU, causal SDPA attention)
Hidden size / Layers / Heads	2048 / 22 / 16
FFN intermediate size	5461
Vocabulary	64,000 (custom SentencePiece BPE, French)
Context length	2048 tokens
Chat format	ChatML
Training dtype	bfloat16
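The 1.238 B figure can be checked from the table. This sketch counts weights for a LLaMA-style block (q/k/v/o projections with no biases, a three-matrix SwiGLU MLP, two RMSNorm vectors per layer, plus a final norm); the dictionary keys are named after the corresponding Hugging Face LlamaConfig fields, and the layout details are assumptions rather than something the card spells out. Only the tied-embedding count reproduces 1.238 B, which suggests, but does not confirm, that input and output embeddings are shared. Note also that 2048 / 16 gives a head size of 128, and 5461 is 8/3 × 2048 rounded down.

```python
config = {
    "vocab_size": 64_000,
    "hidden_size": 2048,
    "num_hidden_layers": 22,
    "num_attention_heads": 16,
    "intermediate_size": 5461,
}


def count_params(cfg: dict, tie_word_embeddings: bool = True) -> int:
    d = cfg["hidden_size"]
    ffn = cfg["intermediate_size"]
    attention = 4 * d * d  # q, k, v, o projections (no biases, no GQA assumed)
    mlp = 3 * d * ffn      # SwiGLU: gate, up and down projections
    norms = 2 * d          # two RMSNorm weight vectors per layer
    per_layer = attention + mlp + norms
    embeddings = cfg["vocab_size"] * d
    lm_head = 0 if tie_word_embeddings else embeddings
    final_norm = d
    return cfg["num_hidden_layers"] * per_layer + embeddings + lm_head + final_norm


print(count_params(config))                             # 1238415360
print(count_params(config, tie_word_embeddings=False))  # 1369487360
```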

SFT data

Full fine-tune on 5,000 ChatML examples generated locally with a Qwen model, covering varied French registers (narrative, summarization, rewriting, simple factual, etc.) plus a small politeness set. Math and legal tasks were deliberately excluded — fine-tuning a small model on content it gets wrong would only teach it to hallucinate with authority.

The underlying pretraining data is documented on the base model card.

Evaluation — FrenchBench (base vs instruct)

Evaluated with EleutherAI's lm-evaluation-harness on the french_bench task group, 3-shot, on the full test sets; the instruct model was evaluated with --apply_chat_template. OrangeSum was excluded because of an incompatibility with the datasets library.

Task	Metric	Base	Instruct
Vocabulary	acc	0.773	0.756
Grammar	acc	0.765	0.756
Reading comprehension	acc	0.606	0.549
BoolQA	acc	0.573	0.562
Topic-based NLI	acc_norm	0.498	0.378
HellaSwag	acc_norm	0.424	0.429
ARC challenge	acc	0.220	0.257
ARC challenge	acc_norm	0.265	0.309
XNLI	acc	0.333	0.323
Trivia	f1	0.245	0.142
Trivia	is_included	0.190	0.213

The SFT effect: a key observation. On Trivia, f1 drops (0.245 → 0.142) while is_included rises (0.190 → 0.213). This is not a regression: the base model answered tersely ("Paris"), whereas after SFT the model wraps its answers in full sentences ("The capital is Paris."), so the correct answer is still present (is_included ↑) but no longer matches the reference token for token. The SFT made the model conversational, exactly as intended. ARC challenge also improves with SFT (0.265 → 0.309 acc_norm), and core linguistic competence is largely preserved (grammar and vocabulary each drop by less than 0.02). The clearest costs are on topic-based NLI (0.498 → 0.378) and reading comprehension (0.606 → 0.549).
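The f1 versus is_included gap is easy to reproduce on the Paris example. This is an illustrative sketch of a SQuAD-style token F1 and a whole-word inclusion check; it is not the lm-evaluation-harness code for the french_bench trivia task, whose normalization may differ.

```python
import string
from collections import Counter

# Map every ASCII punctuation character to a space
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def normalize(text: str) -> list[str]:
    """Lowercase, turn punctuation into spaces, split on whitespace."""
    return text.lower().translate(_PUNCT_TO_SPACE).split()


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall (bag-of-words overlap)."""
    pred, ref = normalize(prediction), normalize(reference)
    common = sum((Counter(pred) & Counter(ref)).values())
    if common == 0:
        return 0.0
    precision, recall = common / len(pred), common / len(ref)
    return 2 * precision * recall / (precision + recall)


def is_included(prediction: str, reference: str) -> bool:
    """True if the normalized reference appears as whole words in the prediction."""
    pred = " " + " ".join(normalize(prediction)) + " "
    ref = " " + " ".join(normalize(reference)) + " "
    return ref in pred
```

A terse "Paris" scores f1 = 1.0, while "La capitale est Paris." scores only 0.4 (one matching token out of four predicted), yet both pass is_included: the conversational answer is penalized by f1 but not by inclusion.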

Intended use & limitations

Intended use. French conversation, summarization, rewriting, simple factual Q&A, education and experimentation. A demonstrator, not a production system.

Limitations.

Size. At 1.24B parameters, world knowledge and reasoning are limited; it hallucinates on precise facts. Do not rely on it for factual accuracy.
No safety tuning. The SFT contained no refusal or safety data. The model has not been trained to refuse harmful, biased, or inappropriate requests, and may produce such content. It is not suitable for unsupervised or public-facing deployment.
Language and context. French only, with a context window limited to 2048 tokens.
Math, code, legal: deliberately excluded from SFT; expect weak performance.
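Because of the 2048-token window, long conversations must be trimmed before each request. This is a minimal sketch of one simple policy (drop the oldest turns first, always keep the latest message); fit_to_context is a hypothetical helper, and overhead_per_message is an assumed allowance for the ChatML markers and role name around each turn, to be measured with the real tokenizer.

```python
from typing import Callable


def fit_to_context(
    messages: list[dict],
    count_tokens: Callable[[str], int],
    max_context: int = 2048,
    max_new_tokens: int = 200,
    overhead_per_message: int = 5,
) -> list[dict]:
    """Drop the oldest turns until the prompt plus the generation budget fits."""
    budget = max_context - max_new_tokens

    def prompt_size(msgs: list[dict]) -> int:
        # overhead_per_message approximates <|im_start|>role ... <|im_end|> per turn
        return sum(count_tokens(m["content"]) + overhead_per_message for m in msgs)

    kept = list(messages)
    while len(kept) > 1 and prompt_size(kept) > budget:
        kept.pop(0)  # discard the oldest turn first
    if prompt_size(kept) > budget:
        raise ValueError("the latest message alone exceeds the context budget")
    return kept
```

With Transformers, a suitable counter is lambda s: len(tok.encode(s, add_special_tokens=False)). A system prompt, if used, would need to be pinned separately so it is never dropped.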

Usage

Transformers

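from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "rodin-llm/rodin-1b-instruct"
device = "cuda" if torch.cuda.is_available() else "cpu"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16).to(device)

# "Explain photosynthesis to me in two sentences."
messages = [{"role": "user", "content": "Explique-moi en deux phrases ce qu'est la photosynthèse."}]
# return_dict=True returns input_ids together with the attention_mask
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_dict=True, return_tensors="pt"
).to(device)
# do_sample=True is needed for temperature and top_p to take effect
out = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.7, top_p=0.9)
# Decode only the newly generated tokens, skipping the echoed prompt
print(tok.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))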

Ollama (GGUF)

ollama run hf.co/rodin-llm/rodin-1b-instruct:Q4_K_M

Training procedure (summary)

Base pretraining: 478,000 steps, ~32B French tokens, rented spot B200 (bfloat16). See base model.
SFT: full fine-tune on 5,000 ChatML examples, single RTX 3090, loss masked on assistant responses only.
Export: custom RodinLM → HuggingFace LlamaForCausalLM → GGUF (F16, Q8_0, Q4_K_M).
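The card states that the SFT loss was masked so only assistant responses contribute, without showing how. A common way to do this is sketched below; the (role, token_ids) segmentation and the build_labels helper are hypothetical, and the actual RODIN training code on GitHub may differ. Positions labeled -100 are skipped because that is the default ignore_index of torch.nn.CrossEntropyLoss; Hugging Face causal-LM models shift labels internally, so labels stay aligned with input_ids here. Including the closing <|im_end|> in the assistant segment is an assumption that teaches the model when to stop.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def build_labels(segments: list[tuple[str, list[int]]]) -> tuple[list[int], list[int]]:
    """segments: (role, token_ids) pairs in conversation order.

    Returns (input_ids, labels): assistant tokens keep their ids as labels,
    every other position gets IGNORE_INDEX so it contributes no loss.
    """
    input_ids: list[int] = []
    labels: list[int] = []
    for role, ids in segments:
        input_ids.extend(ids)
        if role == "assistant":
            labels.extend(ids)
        else:
            labels.extend([IGNORE_INDEX] * len(ids))
    return input_ids, labels
```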

License

Released under the Apache 2.0 license.

Acknowledgements & transparency

Carried out by one person, with AI assistance openly acknowledged throughout. Thanks to EleutherAI (evaluation), the HPLT and Pleias teams (data), Wikimedia, and the llama.cpp / Ollama / LM Studio projects.

@misc{rodin1binstruct2026,
  title  = {RODIN-1B-Instruct: A French Conversational Model Trained From Scratch},
  author = {RODIN},
  year   = {2026},
  howpublished = {\url{https://huggingface.co/rodin-llm/rodin-1b-instruct}}
}

