RODIN-1B-Instruct
A French conversational model trained from scratch, solo, on consumer-grade hardware.
Un modèle conversationnel français entraîné de zéro, en solo, sur du matériel grand public.
This is the instruction-tuned, conversational model. For the base pretrained model, see rodin-llm/rodin-1b.
Ceci est le modèle conversationnel (instruction-tuné). Pour le modèle de base, voir rodin-llm/rodin-1b.
💻 Full source code / Code source complet : github.com/rodin-llm/rodin The complete pipeline — data, tokenizer, pretraining, SFT, export, spot orchestration. Le pipeline complet — données, tokenizer, pré-entraînement, SFT, export, orchestration spot.
🇬🇧 English
Overview
RODIN-1B-Instruct is the instruction-tuned, conversational variant of rodin-1b — a 1.24-billion-parameter, French-only causal language model trained from scratch by a single person. RODIN stands for Research Open Deep Intelligence Natively-french.
It was produced by full supervised fine-tuning (SFT) of the base model on French ChatML examples, and speaks French fluently in a conversational register. GGUF quantizations (F16, Q8_0, Q4_K_M) are provided for llama.cpp, Ollama and LM Studio, including CPU-only hardware.
Why this model exists
The goal was never to compete with large, well-funded French models on raw benchmark scores. For scale: comparable French open-source efforts were trained on 3,000 billion tokens using hundreds of H100 GPUs on national supercomputers. RODIN was trained on 32 billion tokens, by one person, on a rented spot B200 instance plus a single RTX 3090 for local iteration and SFT.
The value of RODIN is pedagogical and demonstrative: it shows what one motivated individual can build from scratch, end to end, with a small budget — documented honestly, limitations included.
Chat format
RODIN-1B-Instruct uses the ChatML format:
<|im_start|>user
{message}<|im_end|>
<|im_start|>assistant
{response}<|im_end|>
Stop token: <|im_end|>. The chat template is bundled in tokenizer_config.json, so apply_chat_template and runtimes like Ollama/LM Studio handle it automatically.
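For runtimes that do not read the bundled template, the layout above can be reproduced by hand. The following is a minimal sketch: `format_chatml` is a hypothetical helper, and the exact newline placement is assumed from the layout shown above, so the template in tokenizer_config.json remains the reference.

```python
IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def format_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML prompt string."""
    parts = []
    for message in messages:
        parts.append(f"{IM_START}{message['role']}\n{message['content']}{IM_END}\n")
    if add_generation_prompt:
        # Open the assistant turn so the model writes the reply next.
        parts.append(f"{IM_START}assistant\n")
    return "".join(parts)


prompt = format_chatml([{"role": "user", "content": "Bonjour !"}])
print(prompt)
```

When sending such a prompt to a raw completion endpoint, `<|im_end|>` should also be passed as the stop sequence.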
Model description
| Property | Value |
|---|---|
| Parameters | 1.238 B |
| Architecture | LLaMA-style (RoPE, RMSNorm, SwiGLU, causal SDPA attention) |
| Hidden size / Layers / Heads | 2048 / 22 / 16 |
| FFN intermediate size | 5461 |
| Vocabulary | 64,000 (custom SentencePiece BPE, French) |
| Context length | 2048 tokens |
| Chat format | ChatML |
| Training dtype | bfloat16 |
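The parameter count can be cross-checked from the table. This sketch assumes details the table does not state: tied input/output embeddings, no bias terms, and full multi-head attention (no grouped-query sharing of K/V heads). Under those assumptions the arithmetic lands on the reported 1.238 B.

```python
hidden, layers, ffn, vocab = 2048, 22, 5461, 64_000

# Assumption: Q, K, V and output projections are all hidden x hidden (no GQA, no biases).
attention = 4 * hidden * hidden
# SwiGLU uses three matrices: gate, up and down projections.
mlp = 3 * hidden * ffn
# Two RMSNorm weight vectors per layer (before attention and before the MLP).
norms = 2 * hidden

per_layer = attention + mlp + norms
# Assumption: the output head shares its weights with the token embedding (tied).
embedding = vocab * hidden
final_norm = hidden

total = layers * per_layer + embedding + final_norm
print(f"{total:,} parameters ({total / 1e9:.3f} B)")
```

This gives 1,238,415,360 parameters. An untied output head would add another 131,072,000, which would no longer match the table, so tied embeddings are the more plausible reading.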
SFT data
Full fine-tune on 5,000 ChatML examples generated locally with a Qwen model, covering varied French registers (narrative, summarization, rewriting, simple factual, etc.) plus a small politeness set. Math and legal tasks were deliberately excluded — fine-tuning a small model on content it gets wrong would only teach it to hallucinate with authority.
The underlying pretraining data is documented on the base model card.
Evaluation — FrenchBench (base vs instruct)
Evaluated with EleutherAI's lm-evaluation-harness on the french_bench tasks, 3-shot, on the full test sets. The instruct model was evaluated with --apply_chat_template. OrangeSum was excluded due to a datasets incompatibility.
| Task | Metric | Base | Instruct |
|---|---|---|---|
| Vocabulary | acc | 0.773 | 0.756 |
| Grammar | acc | 0.765 | 0.756 |
| Reading comprehension | acc | 0.606 | 0.549 |
| BoolQA | acc | 0.573 | 0.562 |
| Topic-based NLI | acc_norm | 0.498 | 0.378 |
| HellaSwag | acc_norm | 0.424 | 0.429 |
| ARC challenge | acc | 0.220 | 0.257 |
| ARC challenge | acc_norm | 0.265 | 0.309 |
| XNLI | acc | 0.333 | 0.323 |
| Trivia | f1 | 0.245 | 0.142 |
| Trivia | is_included | 0.190 | 0.213 |
The SFT effect: a key observation. On Trivia, exact match and f1 drop (f1: 0.245 → 0.142) while is_included rises (0.190 → 0.213). This is not a regression in knowledge: the base model answered bluntly ("Paris"); after SFT, the model wraps answers in full sentences ("The capital is Paris."), so the correct answer is still present (is_included ↑) but no longer an exact string match. The SFT made the model conversational, exactly as intended. Note also that ARC challenge improves with SFT (0.265 → 0.309 acc_norm), while core linguistic competence is largely preserved (vocabulary 0.773 → 0.756, grammar 0.765 → 0.756).
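The metric behaviour described above can be reproduced on the two example answers. The functions below are simplified stand-ins written for illustration, not the scoring code that lm-evaluation-harness actually runs, and the normalization (lowercasing, stripping punctuation) is an assumption.

```python
import re
from collections import Counter


def normalize(text):
    """Lowercase, drop punctuation and split into words."""
    return re.sub(r"[^\w\s]", "", text.lower()).split()


def exact_match(prediction, reference):
    return normalize(prediction) == normalize(reference)


def is_included(prediction, reference):
    """True when the normalized reference is a substring of the normalized prediction."""
    return " ".join(normalize(reference)) in " ".join(normalize(prediction))


def token_f1(prediction, reference):
    pred, ref = normalize(prediction), normalize(reference)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


reference = "Paris"
for answer in ["Paris", "La capitale est Paris."]:
    print(answer, exact_match(answer, reference), is_included(answer, reference),
          round(token_f1(answer, reference), 2))
```

The sentence-style answer keeps is_included true while exact match fails and F1 falls to 0.4, which is the same direction as the Trivia rows in the table.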
Intended use & limitations
Intended use. French conversation, summarization, rewriting, simple factual Q&A, education and experimentation. A demonstrator, not a production system.
Limitations.
- Size. At 1.24B parameters, world knowledge and reasoning are limited; it hallucinates on precise facts. Do not rely on it for factual accuracy.
- No safety tuning. The SFT contained no refusal or safety data. The model has not been trained to refuse harmful, biased, or inappropriate requests, and may produce such content. It is not suitable for unsupervised or public-facing deployment.
- French only, context limited to 2048 tokens.
- Math, code, legal: deliberately excluded from SFT; expect weak performance.
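Because the context is capped at 2048 tokens, long conversations need trimming before generation. One possible approach, sketched below, drops the oldest turns first. `trim_history`, `count_tokens` and the 256-token reply reserve are illustrative choices, not part of the model's tooling.

```python
def trim_history(messages, count_tokens, max_context=2048, reserve_for_reply=256):
    """Drop the oldest messages until the rest fit the prompt budget.

    The most recent message is always kept, even if it alone exceeds the budget.
    """
    budget = max_context - reserve_for_reply
    kept = list(messages)
    while len(kept) > 1 and sum(count_tokens(m["content"]) for m in kept) > budget:
        kept.pop(0)  # discard the oldest turn first
    return kept


def rough_count(text):
    # Stand-in tokenizer: one token per whitespace-separated word.
    return len(text.split())


history = [
    {"role": "user", "content": "mot " * 1500},
    {"role": "assistant", "content": "mot " * 500},
    {"role": "user", "content": "Résume notre échange."},
]
trimmed = trim_history(history, rough_count)
print(len(trimmed))
```

In practice `count_tokens` would wrap the model's own tokenizer and account for the ChatML markers, which the word-based stand-in ignores.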
Usage
Transformers
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "rodin-llm/rodin-1b-instruct"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16).cuda()

messages = [{"role": "user", "content": "Explique-moi en deux phrases ce qu'est la photosynthèse."}]
# return_dict=True returns input_ids together with the attention mask
inputs = tok.apply_chat_template(messages, add_generation_prompt=True, return_dict=True, return_tensors="pt").to(model.device)
# do_sample=True is required for temperature and top_p to take effect
out = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.7, top_p=0.9)
# Decode only the newly generated tokens, not the prompt
print(tok.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
Ollama (GGUF)
ollama run hf.co/rodin-llm/rodin-1b-instruct:Q4_K_M
Training procedure (summary)
- Base pretraining: 478,000 steps, ~32B French tokens, rented spot B200 (bfloat16). See base model.
- SFT: full fine-tune on 5,000 ChatML examples, single RTX 3090, loss masked on assistant responses only.
- Export: custom RodinLM → Hugging Face LlamaForCausalLM → GGUF (F16, Q8_0, Q4_K_M).
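Masking the loss on assistant responses only is commonly implemented by setting the label of every non-assistant token to -100, the default ignore_index of PyTorch's cross-entropy loss. The sketch below illustrates that convention on made-up token ids. `build_labels` is a hypothetical helper and may differ from the project's actual SFT code.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def build_labels(segments):
    """Flatten (token_ids, is_assistant) segments into input_ids and masked labels."""
    input_ids, labels = [], []
    for token_ids, is_assistant in segments:
        input_ids.extend(token_ids)
        if is_assistant:
            labels.extend(token_ids)  # these tokens contribute to the loss
        else:
            labels.extend([IGNORE_INDEX] * len(token_ids))  # prompt tokens are skipped
    return input_ids, labels


# Made-up ids: a user turn followed by the assistant reply.
segments = [([1, 15, 27, 2], False), ([1, 42, 43, 2], True)]
input_ids, labels = build_labels(segments)
print(input_ids)
print(labels)
```

The model still sees the full conversation as input; only the gradient signal is restricted to the assistant's tokens.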
License
Released under the Apache 2.0 license.
Acknowledgements & transparency
Carried out by one person, with AI assistance openly acknowledged throughout. Thanks to EleutherAI (evaluation), the HPLT and Pleias teams (data), Wikimedia, and the llama.cpp / Ollama / LM Studio projects.
@misc{rodin1binstruct2026,
title = {RODIN-1B-Instruct: A French Conversational Model Trained From Scratch},
author = {RODIN},
year = {2026},
howpublished = {\url{https://huggingface.co/rodin-llm/rodin-1b-instruct}}
}
🇫🇷 Français
Présentation
RODIN-1B-Instruct est la variante conversationnelle (instruction-tunée) de rodin-1b — un modèle de langage causal uniquement francophone, de 1,24 milliard de paramètres, entraîné de zéro par une seule personne. RODIN signifie Research Open Deep Intelligence Natively-french.
Il a été produit par fine-tuning supervisé complet (SFT) du modèle de base sur des exemples ChatML français, et parle un français fluide dans un registre conversationnel. Des quantisations GGUF (F16, Q8_0, Q4_K_M) sont fournies pour llama.cpp, Ollama et LM Studio, y compris sur matériel CPU uniquement.
Pourquoi ce modèle existe
L'objectif n'a jamais été de rivaliser en score brut avec les gros modèles français bien financés. Pour situer : des projets français open source comparables ont été entraînés sur 3 000 milliards de tokens avec des centaines de GPU H100 sur des supercalculateurs nationaux. RODIN a été entraîné sur 32 milliards de tokens, par une seule personne, sur une instance B200 spot louée et une seule RTX 3090 pour l'itération locale et le SFT.
La valeur de RODIN est pédagogique et démonstrative : il montre ce qu'une personne motivée peut construire de zéro, de bout en bout, avec un petit budget — documenté honnêtement, limites comprises.
Format de chat
RODIN-1B-Instruct utilise le format ChatML :
<|im_start|>user
{message}<|im_end|>
<|im_start|>assistant
{réponse}<|im_end|>
Token d'arrêt : <|im_end|>. Le chat template est embarqué dans tokenizer_config.json, donc apply_chat_template et les runtimes comme Ollama/LM Studio le gèrent automatiquement.
Description du modèle
| Propriété | Valeur |
|---|---|
| Paramètres | 1,238 milliard |
| Architecture | Style LLaMA (RoPE, RMSNorm, SwiGLU, attention causale SDPA) |
| Dimension / Couches / Têtes | 2048 / 22 / 16 |
| Dimension FFN | 5461 |
| Vocabulaire | 64 000 (BPE SentencePiece maison, français) |
| Longueur de contexte | 2048 tokens |
| Format de chat | ChatML |
| Précision d'entraînement | bfloat16 |
Données de SFT
Full fine-tune sur 5 000 exemples ChatML générés localement avec un modèle Qwen, couvrant des registres français variés (narratif, résumé, réécriture, factuel simple, etc.) plus un petit jeu de politesse. Les maths et le juridique ont été délibérément exclus — fine-tuner un petit modèle sur du contenu qu'il rate ne ferait que lui apprendre à halluciner avec autorité.
Les données de pré-entraînement sous-jacentes sont documentées sur la carte du modèle de base.
Évaluation — FrenchBench (base vs instruct)
Évalué avec lm-evaluation-harness d'EleutherAI, french_bench, 3-shot, jeux de test complets, instruct évalué avec --apply_chat_template (OrangeSum exclu pour incompatibilité datasets).
| Tâche | Métrique | Base | Instruct |
|---|---|---|---|
| Vocabulaire | acc | 0,773 | 0,756 |
| Grammaire | acc | 0,765 | 0,756 |
| Compréhension écrite | acc | 0,606 | 0,549 |
| BoolQA | acc | 0,573 | 0,562 |
| NLI thématique | acc_norm | 0,498 | 0,378 |
| HellaSwag | acc_norm | 0,424 | 0,429 |
| ARC challenge | acc | 0,220 | 0,257 |
| ARC challenge | acc_norm | 0,265 | 0,309 |
| XNLI | acc | 0,333 | 0,323 |
| Trivia | f1 | 0,245 | 0,142 |
| Trivia | is_included | 0,190 | 0,213 |
L'effet du SFT — une observation clé. Sur Trivia, exact/f1 baissent alors que is_included monte (0,19 → 0,21). Ce n'est pas une régression : le modèle de base répondait sèchement (« Paris ») ; après SFT, il enrobe ses réponses dans des phrases complètes (« La capitale est Paris. »), donc la bonne réponse reste présente (is_included ↑) mais ce n'est plus une correspondance exacte. Le SFT a rendu le modèle conversationnel, exactement comme prévu. À noter aussi : ARC challenge s'améliore avec le SFT (0,27 → 0,31 acc_norm), tandis que la compétence linguistique de base (grammaire, vocabulaire) est préservée.
Usage prévu & limites
Usage prévu. Conversation française, résumé, réécriture, questions-réponses factuelles simples, éducation et expérimentation. Un démonstrateur, pas un système de production.
Limites.
- Taille. À 1,24B paramètres, connaissances et raisonnement limités ; hallucine sur les faits précis. Ne vous y fiez pas pour l'exactitude factuelle.
- Aucun safety tuning. Le SFT ne contenait aucune donnée de refus ou de sécurité. Le modèle n'a pas été entraîné à refuser des requêtes nuisibles, biaisées ou inappropriées, et peut produire de tels contenus. Il n'est pas adapté à un déploiement public ou non supervisé.
- Français uniquement, contexte limité à 2048 tokens.
- Maths, code, juridique : délibérément exclus du SFT ; performances faibles attendues.
Utilisation
Transformers
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "rodin-llm/rodin-1b-instruct"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16).cuda()

messages = [{"role": "user", "content": "Explique-moi en deux phrases ce qu'est la photosynthèse."}]
# return_dict=True returns input_ids together with the attention mask
inputs = tok.apply_chat_template(messages, add_generation_prompt=True, return_dict=True, return_tensors="pt").to(model.device)
# do_sample=True is required for temperature and top_p to take effect
out = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.7, top_p=0.9)
# Decode only the newly generated tokens, not the prompt
print(tok.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
Ollama (GGUF)
ollama run hf.co/rodin-llm/rodin-1b-instruct:Q4_K_M
Procédure d'entraînement (résumé)
- Pré-entraînement de base : 478 000 steps, ~32B tokens français, B200 spot louée (bfloat16). Voir le modèle de base.
- SFT : full fine-tune sur 5 000 exemples ChatML, une seule RTX 3090, loss masquée sur les réponses de l'assistant uniquement.
- Export : RodinLM maison → LlamaForCausalLM Hugging Face → GGUF (F16, Q8_0, Q4_K_M).
Licence
Publié sous licence Apache 2.0.
Remerciements & transparence
Mené par une seule personne, avec une assistance IA assumée et transparente tout du long. Merci à EleutherAI (évaluation), aux équipes HPLT et Pleias (données), à Wikimedia, et aux projets llama.cpp / Ollama / LM Studio.
@misc{rodin1binstruct2026,
title = {RODIN-1B-Instruct: A French Conversational Model Trained From Scratch},
author = {RODIN},
year = {2026},
howpublished = {\url{https://huggingface.co/rodin-llm/rodin-1b-instruct}}
}
Datasets used to train rodin-llm/rodin-1b-instruct
PleIAs/common_corpus
statmt/cc100
Evaluation results
- Grammar (acc) on FrenchBench (self-reported): 0.756
- ARC challenge (acc_norm) on FrenchBench (self-reported): 0.309

