⚡ Olmo 3.1 7B Think
A drop-in upgrade for Olmo 3 7B Think — one extra epoch of RLVR, no recipe changes.
Stronger instruction following and safety, reasoning held steady. Also the compute-matched control for OlmoLogic.
📝 Blog • 💻 Training Code • 📊 Eval Code • 🧠 OlmoLogic 7B Think
TL;DR
Olmo 3.1 7B Think is a continued-RLVR extension of allenai/Olmo-3-7B-Think. We take the official one-epoch Olmo 3 7B Think checkpoint and train it for roughly one additional epoch (1,850 RLVR steps) on the original Olmo-3 RLVR mixture (allenai/Dolci-Think-RL-7B) — no recipe changes, no new data.
The result is a drop-in upgrade for downstream use: stronger instruction following and safety, with reasoning holding steady.
⚖️ Olmo 3.1 was also built as the compute-matched control for our OlmoLogic experiments — same total step budget, but without the SLR logic data. See the blog post for the full story.
📊 What changed vs. Olmo 3 7B Think
| Benchmark (avg) | Olmo-3-7B-Think | Olmo 3.1 7B Think | Δ |
|---|---|---|---|
| Instruction Following | 64.9 | 71.5 | +6.6 🔥 |
| Safety | 70.7 | 74.5 | +3.8 |
| Reasoning | 75.8 | 76.7 | +0.9 |
| SLR-Bench | 15.1 | 15.7 | +0.6 |
| Logic | 59.1 | 59.1 | +0.0 |
| Math | 71.1 | 70.5 | −0.5 |
| Knowledge | 49.2 | 48.7 | −0.5 |
| Coding | 76.6 | 75.0 | −1.6 |
| Chat | 52.1 | 41.6 | −10.5 |
The trade-off: the main regression is on open-ended Chat (−10.5), a known cost of extensive RLVR optimization. Coding (−1.6) and Knowledge (−0.5) shift within noise. If you care more about reasoning and instruction following than about open-ended chat, this is a clean upgrade.
All numbers come from a single reproducible OLMES pipeline.
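The Δ column in the table is the difference between the two model columns. Both averages are shown rounded to one decimal, so a difference recomputed from the displayed values can disagree with the listed Δ by up to 0.1. For example, Math gives 70.5 − 71.1 = −0.6 from the rounded values, while −0.5 is listed. The minimal Python sketch below checks every row against that rounding tolerance. The values are copied by hand from the table, and the check is only a consistency test on the displayed numbers. It is not part of the OLMES pipeline.

```python
# Rows copied from the table: (benchmark, Olmo 3 7B Think, Olmo 3.1 7B Think, listed delta)
ROWS = [
    ("Instruction Following", 64.9, 71.5, 6.6),
    ("Safety", 70.7, 74.5, 3.8),
    ("Reasoning", 75.8, 76.7, 0.9),
    ("SLR-Bench", 15.1, 15.7, 0.6),
    ("Logic", 59.1, 59.1, 0.0),
    ("Math", 71.1, 70.5, -0.5),
    ("Knowledge", 49.2, 48.7, -0.5),
    ("Coding", 76.6, 75.0, -1.6),
    ("Chat", 52.1, 41.6, -10.5),
]

# Each average is rounded to 0.1, so a difference of rounded values can be off by 0.1.
TOLERANCE = 0.1 + 1e-9  # the small epsilon absorbs floating-point error


def check_deltas(rows):
    """Return (name, recomputed delta, listed delta) per row and the names outside tolerance."""
    report, mismatches = [], []
    for name, old, new, listed in rows:
        recomputed = round(new - old, 1)
        report.append((name, recomputed, listed))
        if abs(recomputed - listed) > TOLERANCE:
            mismatches.append(name)
    return report, mismatches


report, mismatches = check_deltas(ROWS)
for name, recomputed, listed in report:
    print(f"{name:<22} recomputed {recomputed:+.1f}  listed {listed:+.1f}")
print("rows outside rounding tolerance:", mismatches or "none")
```

Every row falls within the tolerance. A Δ that differs by 0.1 from the displayed columns is therefore consistent with computing it from the unrounded averages.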
⚙️ Training
- Base model: allenai/Olmo-3-7B-Think
- Algorithm: GRPO via Slurm-adapted open-instruct (DeepSpeed ZeRO-3)
- Data: allenai/Dolci-Think-RL-7B (the original Olmo-3 RLVR mix, unchanged)
- Added training: ~1 epoch / 1,850 steps (3,350 total, matching OlmoLogic)
- Settings: default Olmo-3 RLVR config — β = 0, constant LR 1e-6, global batch 512 (64 prompts × 8 rollouts), vLLM temperature 1.0
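The settings above fix the per-step batch at 64 prompts × 8 rollouts. The minimal Python sketch below works out the budget arithmetic implied by these numbers. It also shows the group-relative advantage that gives GRPO its name: each rollout's reward is normalized against the other rollouts for the same prompt.

The sketch uses the textbook GRPO normalization (group mean and standard deviation). open-instruct's actual implementation may differ in details such as the normalization or the loss aggregation. The binary rewards are random placeholders, not training data. With β = 0, the KL penalty against a reference policy is disabled, so only the clipped policy-gradient term weighted by these advantages remains.

```python
import numpy as np

# Numbers from the Training settings above
PROMPTS_PER_STEP, ROLLOUTS_PER_PROMPT = 64, 8
ADDED_STEPS, TOTAL_STEPS = 1_850, 3_350
GLOBAL_BATCH = PROMPTS_PER_STEP * ROLLOUTS_PER_PROMPT  # 512 sequences per step
START_STEP = TOTAL_STEPS - ADDED_STEPS  # steps implied for the starting checkpoint


def grpo_advantages(rewards: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Group-relative advantages for rewards of shape (prompts, rollouts_per_prompt).

    Textbook GRPO normalization; open-instruct's exact implementation may differ.
    """
    mean = rewards.mean(axis=1, keepdims=True)
    std = rewards.std(axis=1, keepdims=True)
    # A group where every rollout gets the same reward yields zero advantage (no signal).
    return (rewards - mean) / (std + eps)


# Placeholder binary verifier rewards (1 = verified correct, 0 = incorrect).
rng = np.random.default_rng(0)
rewards = rng.integers(0, 2, size=(PROMPTS_PER_STEP, ROLLOUTS_PER_PROMPT)).astype(float)
advantages = grpo_advantages(rewards)

print("sequences per step:", GLOBAL_BATCH)
print("steps implied for the starting checkpoint:", START_STEP)
print("nominal rollouts over the added steps:", ADDED_STEPS * GLOBAL_BATCH)
print("advantage tensor shape:", advantages.shape)
```

Because advantages are relative within a group, prompts that the policy always solves, or never solves, contribute nothing to the update. Sampling 8 rollouts per prompt at temperature 1.0 keeps enough variance within groups to produce a learning signal.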
🚀 Inference
vLLM
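```python
from vllm import LLM, SamplingParams

model_id = "LukasHug/Olmo-3.1-7B-Think"
llm = LLM(model=model_id)

# Sampling settings for long chain-of-thought generation
sampling_params = SamplingParams(
    temperature=0.6,
    top_p=0.95,
    max_tokens=32768,
)

# LLM.chat applies the model's chat template before generating
messages = [{"role": "user", "content": "Explain why the square root of 2 is irrational."}]
outputs = llm.chat(messages, sampling_params)
print(outputs[0].outputs[0].text)
```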
Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LukasHug/Olmo-3.1-7B-Think"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Explain why the square root of 2 is irrational."}]
# return_dict=True returns input_ids and attention_mask for generate()
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_dict=True, return_tensors="pt"
).to(model.device)

# do_sample=True is needed for temperature and top_p to take effect
out = model.generate(**inputs, max_new_tokens=32768, do_sample=True, temperature=0.6, top_p=0.95)
# Decode only the newly generated tokens
print(tok.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```
💡 This is a Think model with long chain-of-thought, so allow a generous generation budget (`max_tokens` in vLLM, `max_new_tokens` in Transformers) of 16k–32k tokens for hard tasks.
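If the budget is too small, generation can stop before the model reaches its final answer. The minimal post-processing sketch below separates the reasoning trace from the answer and flags truncated outputs. It assumes the model closes its reasoning with a `</think>` tag. Check this checkpoint's chat template and decoded output before relying on it. If the tag is a special token, decode with `skip_special_tokens=False` so it survives. `split_thinking` is a hypothetical helper, not part of vLLM or Transformers.

```python
def split_thinking(text: str, end_tag: str = "</think>") -> tuple[str, str]:
    """Hypothetical helper: split a decoded generation into (reasoning, answer).

    Assumes the reasoning trace ends with `end_tag`. If the tag is missing,
    e.g. because generation hit the token limit mid-thought, the whole text
    is returned as reasoning and the answer is empty.
    """
    head, found, tail = text.partition(end_tag)
    if not found:
        return text.strip(), ""
    # Drop an opening <think> tag if the decoded text contains one.
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()


reasoning, answer = split_thinking(
    "<think>Assume sqrt(2) = p/q in lowest terms...</think> So sqrt(2) is irrational."
)
truncated_reasoning, missing_answer = split_thinking(
    "Assume sqrt(2) = p/q in lowest terms, then p^2 = 2q^2 ..."
)
if not missing_answer:
    print("No final answer found: the budget may have run out mid-reasoning.")
```

An empty answer is a signal to raise `max_tokens` / `max_new_tokens` rather than to read the reasoning trace as the model's answer.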
✅ Takeaways
- Pure continued RLVR. Same recipe, same data, one more epoch — a clean upgrade path, not a new model family.
- Instruction following and safety improve; reasoning holds. The cost is concentrated in open-ended chat.
- A faithful control. Compute-matched to OlmoLogic, so the SLR ablation isolates the effect of logic data, not extra steps.
Model Details
- Developed by: Artificial Intelligence and Machine Learning Lab, Technical University of Darmstadt (TU Darmstadt)
- Model type: Transformer autoregressive LM with long chain-of-thought
- Language: English
- License: Apache 2.0
- Base model: allenai/Olmo-3-7B-Think
Sources
- Blog: https://huggingface.co/blog/LukasHug/olmo-logic
- Training code: https://github.com/lukashelff/open-instruct-slurm
- Eval code: https://github.com/lukashelff/olmes-slurm
- Logic-tuned sibling: OlmoLogic 7B Think
Citation
This work builds on the following two papers. If you use it, please cite both.
For SLR-Bench, please cite:
@inproceedings{helff2025slr,
title = {{SLR: Automated Synthesis for Scalable Logical Reasoning}},
author = {Helff, Lukas and Omar, Ahmad and Friedrich, Felix and W{\"u}st, Antonia
and Shindo, Hikaru and Woydt, Tim and Mitchell, Rupert
and Schramowski, Patrick and Stammer, Wolfgang and Kersting, Kristian},
booktitle = {Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)},
year = {2026},
url = {https://openreview.net/forum?id=omMnuTTEn7}
}
For the Reward Hacking paper, please cite:
@inproceedings{helff2026llms,
title = {{LLMs Gaming Verifiers: RLVR can Lead to Reward Hacking}},
author = {Lukas Helff and Quentin Delfosse and David Steinmann and Ruben H{\"a}rle
and Hikaru Shindo and Patrick Schramowski and Wolfgang Stammer
and Kristian Kersting and Felix Friedrich},
booktitle = {ICLR 2026 Workshop on Logical Reasoning of Large Language Models},
year = {2026},
url = {https://openreview.net/forum?id=4B3WfRNqe3}
}
Acknowledgments
Supported by DFKI and the hessian.AI Innovation Lab (BMFTR grant 16IS22091), the hessian.AISC Service Center (BMBF grant 01IS22091), and CERTAIN, with further support from TAILOR (EU Horizon 2020, GA 952215), the Hessian LOEWE program, NHR4CES, the BMWK project SOOFI (13IPC040G), the Cluster of Excellence "Reasonable AI" (DFG, EXC-3057), DFG SPP 2422, the AlephAlpha Collaboration Lab 1141, and OpenAI Research Credits.