How to use AIML-TUDA/Olmo-3.1-7B-Think with libraries, inference servers, and local apps.

Libraries

How to use AIML-TUDA/Olmo-3.1-7B-Think with Transformers:

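# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="AIML-TUDA/Olmo-3.1-7B-Think")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Think models emit a long reasoning trace before answering, so allow plenty of new tokens
out = pipe(messages, max_new_tokens=4096)
# With chat input, generated_text holds the whole conversation; the last message is the reply
print(out[0]["generated_text"][-1]["content"])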

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("AIML-TUDA/Olmo-3.1-7B-Think")
model = AutoModelForCausalLM.from_pretrained("AIML-TUDA/Olmo-3.1-7B-Think")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# Think models reason at length before answering; a small token budget cuts off the trace
outputs = model.generate(**inputs, max_new_tokens=4096)
# Decode only the newly generated tokens (everything after the prompt)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))

vLLM

How to use AIML-TUDA/Olmo-3.1-7B-Think with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "AIML-TUDA/Olmo-3.1-7B-Think"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "AIML-TUDA/Olmo-3.1-7B-Think",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
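The same request can be sent from Python. The sketch below uses only the standard library and targets the OpenAI-compatible `/v1/chat/completions` route used by the curl calls above. The sampling values (temperature 0.6, top_p 0.95) come from this card's Inference section. The 16,384-token budget follows the card's 16k–32k hint. The function names `build_chat_request` and `send_chat_request` are hypothetical helpers for illustration, not part of vLLM or SGLang.

```python
import json
import urllib.request


def build_chat_request(prompt, model="AIML-TUDA/Olmo-3.1-7B-Think",
                       max_tokens=16384, temperature=0.6, top_p=0.95):
    """Build an OpenAI-style chat-completions request body (hypothetical helper)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        # Think models need a large budget: the reasoning trace comes before the answer
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
    }


def send_chat_request(body, base_url="http://localhost:8000/v1", timeout=600):
    """POST the body to a running server and return the first choice's message text."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]
```

Usage: with `vllm serve` running, call `print(send_chat_request(build_chat_request("What is the capital of France?")))`. For the SGLang server, pass `base_url="http://localhost:30000/v1"`. The long timeout is deliberate, because a full reasoning trace can take minutes to generate.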

Use Docker

docker model run hf.co/AIML-TUDA/Olmo-3.1-7B-Think

SGLang

How to use AIML-TUDA/Olmo-3.1-7B-Think with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "AIML-TUDA/Olmo-3.1-7B-Think" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "AIML-TUDA/Olmo-3.1-7B-Think",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "AIML-TUDA/Olmo-3.1-7B-Think" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "AIML-TUDA/Olmo-3.1-7B-Think",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
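Loading a 7B checkpoint can take a while, so a curl sent right after `docker run` may fail with a refused connection. The sketch below polls the server until it answers, then lets you send requests. It assumes the server exposes the OpenAI-compatible `GET /v1/models` route, as vLLM and SGLang normally do. The helper names `models_endpoint_ready` and `wait_until_ready` are hypothetical. The clock and sleep functions are injectable so the polling logic can be tested without a real server.

```python
import time
import urllib.request


def models_endpoint_ready(base_url="http://localhost:30000/v1", timeout=5.0):
    """Return True if GET {base_url}/models answers with HTTP 200, else False."""
    try:
        with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError, refused connections, and socket timeouts are all OSError subclasses
        return False


def wait_until_ready(probe, timeout_s=900.0, interval_s=5.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Call probe() every interval_s seconds until it returns True or timeout_s elapses."""
    deadline = clock() + timeout_s
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

Usage: call `wait_until_ready(models_endpoint_ready)` before the first request and raise an error if it returns False. For the vLLM server on port 8000, pass `lambda: models_endpoint_ready("http://localhost:8000/v1")` as the probe.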

Docker Model Runner
How to use AIML-TUDA/Olmo-3.1-7B-Think with Docker Model Runner:
```
docker model run hf.co/AIML-TUDA/Olmo-3.1-7B-Think
```

⚡ Olmo 3.1 7B Think

A drop-in upgrade for Olmo 3 7B Think: one extra epoch of RLVR, no recipe changes.
Stronger instruction following and safety, with reasoning held steady. Also serves as the compute-matched control for OlmoLogic.

📝 Blog • 💻 Training Code • 📊 Eval Code • 🧠 OlmoLogic 7B Think

TL;DR

Olmo 3.1 7B Think is a continued-RLVR extension of allenai/Olmo-3-7B-Think. We take the official one-epoch Olmo 3 7B Think checkpoint and train it for roughly one additional epoch (1,850 RLVR steps) on the original Olmo-3 RLVR mixture (allenai/Dolci-Think-RL-7B) — no recipe changes, no new data.

The result is a drop-in upgrade for downstream use: stronger instruction following and safety, with reasoning roughly unchanged (+0.9 on average).

⚖️ Olmo 3.1 was also built as the compute-matched control for our OlmoLogic experiments — same total step budget, but without the SLR logic data. See the blog post for the full story.

📊 What changed vs. Olmo 3 7B Think

| Benchmark category (avg.) | Olmo-3-7B-Think | Olmo 3.1 7B Think | Δ |
|---|---|---|---|
| Instruction Following | 64.9 | 71.5 | +6.6 🔥 |
| Safety | 70.7 | 74.5 | +3.8 |
| Reasoning | 75.8 | 76.7 | +0.9 |
| SLR-Bench | 15.1 | 15.7 | +0.6 |
| Logic | 59.1 | 59.1 | +0.0 |
| Math | 71.1 | 70.5 | −0.5 |
| Knowledge | 49.2 | 48.7 | −0.5 |
| Coding | 76.6 | 75.0 | −1.6 |
| Chat | 52.1 | 41.6 | −10.5 |

The trade-off: the main regression is on open-ended Chat (−10.5), a known cost of extensive RLVR optimization. Coding (−1.6), math (−0.5), and knowledge (−0.5) shift within noise. If you prioritize reasoning and instruction following over open-ended chat, this is a clean upgrade.

All numbers come from a single reproducible OLMES pipeline.

⚙️ Training

Base model: allenai/Olmo-3-7B-Think
Algorithm: GRPO via Slurm-adapted open-instruct (DeepSpeed ZeRO-3)
Data: allenai/Dolci-Think-RL-7B (the original Olmo-3 RLVR mix, unchanged)
Additional training: ~1 epoch / 1,850 steps (3,350 steps in total, matching OlmoLogic)
Settings: default Olmo-3 RLVR config: β = 0 (no KL penalty), constant LR 1e-6, global batch 512 (64 prompts × 8 rollouts), rollout sampling temperature 1.0 in vLLM
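The settings above fix the shape of each training step. There are 64 prompts × 8 rollouts = 512 completions per step. The totals imply the starting checkpoint had 3,350 − 1,850 = 1,500 steps. GRPO scores each rollout relative to the other rollouts of the same prompt, and β = 0 means no KL term is added to the loss. The sketch below uses the textbook group-relative advantage: reward minus the group mean, divided by the group standard deviation. open-instruct's exact normalization may differ, and the function names are hypothetical.

```python
from statistics import mean, pstdev

# Values taken from the Training settings of this card
PROMPTS_PER_STEP = 64
ROLLOUTS_PER_PROMPT = 8
ADDED_STEPS = 1_850
TOTAL_STEPS = 3_350

global_batch = PROMPTS_PER_STEP * ROLLOUTS_PER_PROMPT  # completions per optimizer step
prior_steps = TOTAL_STEPS - ADDED_STEPS                 # steps before this continued run
added_completions = ADDED_STEPS * global_batch          # rollouts scored in the extra epoch


def group_relative_advantages(rewards, eps=1e-6):
    """Textbook GRPO advantage for one prompt's group of rollouts (hypothetical helper).

    Each reward is centered on the group mean and scaled by the group's population
    standard deviation; eps avoids division by zero when all rewards are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def step_advantages(rewards_per_prompt):
    """Normalize each prompt's group independently, as GRPO does within a batch."""
    return [group_relative_advantages(group) for group in rewards_per_prompt]
```

One consequence of this design is worth noting for verifiable (e.g. pass/fail) rewards. If all 8 rollouts of a prompt succeed or all fail, every advantage is zero, and that prompt contributes no learning signal in that step.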

🚀 Inference

vLLM

from vllm import LLM, SamplingParams

model_id = "AIML-TUDA/Olmo-3.1-7B-Think"
llm = LLM(model=model_id)

sampling_params = SamplingParams(
    temperature=0.6,
    top_p=0.95,
    max_tokens=32768,
)

# LLM.chat applies the model's chat template; LLM.generate would send the raw string
messages = [{"role": "user", "content": "Explain why the square root of 2 is irrational."}]
outputs = llm.chat(messages, sampling_params)
print(outputs[0].outputs[0].text)

Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AIML-TUDA/Olmo-3.1-7B-Think"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Explain why the square root of 2 is irrational."}]
# Apply the chat template and return input_ids and attention_mask together
inputs = tok.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
# do_sample=True is required for temperature and top_p to take effect
out = model.generate(**inputs, max_new_tokens=32768, do_sample=True, temperature=0.6, top_p=0.95)
# Decode only the newly generated tokens
print(tok.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))

💡 This is a Think model with long chain-of-thought. Allow a generous generation budget for hard tasks: 16k–32k tokens via max_tokens in vLLM or max_new_tokens in Transformers.
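With a 16k–32k budget, most of the output is usually the reasoning trace, so it helps to separate the trace from the final answer. The sketch below assumes the trace is delimited by `<think>` ... `</think>`. That is an assumption, not something this card states, so check the model's chat template before relying on it. `split_reasoning` is a hypothetical helper. If the closing tag is missing, the text is treated as an unfinished trace, which usually means the token budget ran out.

```python
def split_reasoning(text, start_tag="<think>", end_tag="</think>"):
    """Split a Think-model completion into (reasoning, answer) (hypothetical helper).

    ASSUMPTION: the reasoning trace ends with end_tag. If end_tag is absent, the
    whole text is returned as reasoning with an empty answer, since generation
    most likely stopped before the trace finished.
    """
    head, sep, tail = text.partition(end_tag)
    if not sep:
        return text.strip(), ""
    reasoning = head.strip()
    # The opening tag may already be part of the prompt, so strip it only if present
    if reasoning.startswith(start_tag):
        reasoning = reasoning[len(start_tag):].strip()
    return reasoning, tail.strip()
```

An empty answer from this helper is a practical signal to rerun the prompt with a larger max_tokens / max_new_tokens.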

✅ Takeaways

Pure continued RLVR. Same recipe, same data, one more epoch — a clean upgrade path, not a new model family.
Instruction following and safety improve; reasoning holds. The cost is concentrated in open-ended chat.
A faithful control. Compute-matched to OlmoLogic, so the SLR ablation isolates the effect of logic data, not extra steps.

Model Details

Developed by: Artificial Intelligence and Machine Learning Lab, Technical University of Darmstadt (TU Darmstadt)
Model type: Transformer-based autoregressive language model trained for long chain-of-thought reasoning
Language: English
License: Apache 2.0
Base model: allenai/Olmo-3-7B-Think

Sources

Blog: https://huggingface.co/blog/LukasHug/olmo-logic
Training code: https://github.com/lukashelff/open-instruct-slurm
Eval code: https://github.com/lukashelff/olmes-slurm
Logic-tuned sibling: OlmoLogic 7B Think

Citation

This work builds on the following two papers; if you use it, please cite them.

For SLR-Bench, please cite:

@inproceedings{helff2025slr,
  title     = {{SLR: Automated Synthesis for Scalable Logical Reasoning}},
  author    = {Helff, Lukas and Omar, Ahmad and Friedrich, Felix and W{\"u}st, Antonia
               and Shindo, Hikaru and Woydt, Tim and Mitchell, Rupert
               and Schramowski, Patrick and Stammer, Wolfgang and Kersting, Kristian},
  booktitle = {Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)},
  year      = {2026},
  url       = {https://openreview.net/forum?id=omMnuTTEn7}
}

For the Reward Hacking paper, please cite:

@inproceedings{helff2026llms,
  title     = {{LLMs Gaming Verifiers: RLVR can Lead to Reward Hacking}},
  author    = {Helff, Lukas and Delfosse, Quentin and Steinmann, David and H{\"a}rle, Ruben
               and Shindo, Hikaru and Schramowski, Patrick and Stammer, Wolfgang
               and Kersting, Kristian and Friedrich, Felix},
  booktitle = {ICLR 2026 Workshop on Logical Reasoning of Large Language Models},
  year      = {2026},
  url       = {https://openreview.net/forum?id=4B3WfRNqe3}
}

Acknowledgments

Supported by DFKI and the hessian.AI Innovation Lab (BMFTR grant 16IS22091), the hessian.AISC Service Center (BMBF grant 01IS22091), and CERTAIN, with further support from TAILOR (EU Horizon 2020, GA 952215), the Hessian LOEWE program, NHR4CES, the BMWK project SOOFI (13IPC040G), the Cluster of Excellence "Reasonable AI" (DFG, EXC-3057), DFG SPP 2422, the AlephAlpha Collaboration Lab 1141, and OpenAI Research Credits.