Instructions for using Spreadsheet-RL/Spreadsheet-RL-4B with libraries and local apps.

Libraries

How to use Spreadsheet-RL/Spreadsheet-RL-4B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Spreadsheet-RL/Spreadsheet-RL-4B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Spreadsheet-RL/Spreadsheet-RL-4B")
model = AutoModelForCausalLM.from_pretrained("Spreadsheet-RL/Spreadsheet-RL-4B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use Spreadsheet-RL/Spreadsheet-RL-4B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Spreadsheet-RL/Spreadsheet-RL-4B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Spreadsheet-RL/Spreadsheet-RL-4B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
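
The same request can be issued from Python. The sketch below uses only the standard library and assumes the server started above is listening on `http://localhost:8000`. The helper names `build_chat_request` and `chat` are illustrative and not part of vLLM, and `max_tokens=512` is an arbitrary cap chosen for the example.

```python
import json
import urllib.request

MODEL_ID = "Spreadsheet-RL/Spreadsheet-RL-4B"


def build_chat_request(prompt, base_url="http://localhost:8000", max_tokens=512):
    """Build (without sending) a POST request for /v1/chat/completions."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,  # illustrative cap, not a recommended value
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the request and return the assistant message text."""
    request = build_chat_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(chat("What is the capital of France?"))` should print the reply. Passing `base_url="http://localhost:30000"` targets a server on port 30000 instead.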

SGLang

How to use Spreadsheet-RL/Spreadsheet-RL-4B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Spreadsheet-RL/Spreadsheet-RL-4B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Spreadsheet-RL/Spreadsheet-RL-4B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Spreadsheet-RL/Spreadsheet-RL-4B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Spreadsheet-RL/Spreadsheet-RL-4B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use Spreadsheet-RL/Spreadsheet-RL-4B with Docker Model Runner:
```
docker model run hf.co/Spreadsheet-RL/Spreadsheet-RL-4B
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

Spreadsheet-RL-4B

Project Page | Paper | Dataset | Code

Spreadsheet-RL-4B is the RL-trained 4B spreadsheet agent checkpoint from the paper "Spreadsheet-RL: Advancing Large Language Model Agents on Realistic Spreadsheet Tasks via Reinforcement Learning". It starts from Qwen/Qwen3-4B-Thinking-2507 and is post-trained with outcome-based reinforcement learning in Spreadsheet Gym, a multi-turn Microsoft Excel environment that provides spreadsheet-native tools, sandboxed code execution, and Excel-based recalculation rewards.

This checkpoint is intended to be used with the Spreadsheet-RL agent harness and tool environment. Loading it as a plain chat model can be useful for inspection, but it will not reproduce the paper results without Spreadsheet Gym, the tool set, and the reward/evaluation pipeline.
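
To make the role of the harness concrete, here is a minimal sketch of a multi-turn agent loop. It is not the Spreadsheet-RL harness: `run_episode`, the `generate` and `run_tool` callables, and the message format are all hypothetical. The two caps reuse the values listed under Training Configuration (20 assistant turns, tool responses limited to 8,192), and the tool-response cap is applied to characters here purely as a stand-in.

```python
MAX_ASSISTANT_TURNS = 20       # value from the Training Configuration section
MAX_TOOL_RESPONSE_LEN = 8192   # same section; applied to characters here as a stand-in


def run_episode(task, generate, run_tool):
    """Alternate model turns and tool calls until the model stops calling tools.

    `generate(messages)` must return {"content": str, "tool_call": dict or None}.
    `run_tool(tool_call)` must return the tool output as a string.
    Both are hypothetical callables supplied by the caller.
    """
    messages = [{"role": "user", "content": task}]
    for _ in range(MAX_ASSISTANT_TURNS):
        reply = generate(messages)
        messages.append({"role": "assistant", "content": reply["content"]})
        if reply["tool_call"] is None:  # no tool requested: the episode is finished
            break
        # Truncate long tool output before feeding it back to the model.
        observation = run_tool(reply["tool_call"])[:MAX_TOOL_RESPONSE_LEN]
        messages.append({"role": "tool", "content": observation})
    return messages
```

In the real setup the tool calls would act on a workbook inside Spreadsheet Gym and the final workbook would be scored by the reward service. Neither is modeled here.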

News

2026-05-23: Released the Spreadsheet-RL-4B model checkpoint on Hugging Face at Spreadsheet-RL/Spreadsheet-RL-4B.

Model Details

| Field | Value |
| --- | --- |
| Base model | `Qwen/Qwen3-4B-Thinking-2507` |
| Training method | GRPO with outcome-based rewards |
| Environment | Spreadsheet Gym with Microsoft Excel 365, spreadsheet-native tools, SandboxFusion code execution, and async Excel recalculation/reward service |
| Training data | Spreadsheet-RL training split: 5,928 filtered ExcelForum tasks |
| Evaluation | SpreadsheetBench and Domain-Spreadsheet |
| License | Apache-2.0, following the base model license |
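
For readers unfamiliar with GRPO, the sketch below shows two ingredients in plain Python: group-relative advantages computed over the rollouts of one prompt, and a low-variance KL penalty. It is an illustration, not the training code. It assumes that "low-var KL" in the training configuration refers to the k3 estimator (as in verl's `low_var_kl` option) and normalizes by the population standard deviation, a detail on which implementations differ. The function names are illustrative.

```python
import math
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each rollout's reward against the other rollouts of the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use the sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


def low_var_kl(logp, ref_logp):
    """k3 estimate of KL(policy || reference) for one sampled token; never negative."""
    log_ratio = ref_logp - logp
    return math.exp(log_ratio) - log_ratio - 1.0


# Outcome rewards for four rollouts of one prompt (the released run uses 16 per prompt).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

When every rollout in a group receives the same reward, all advantages are zero, so that prompt contributes no policy-gradient signal. During training the KL term would be scaled by the coefficient listed in the training configuration (0.001).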

Training Configuration

For full details, please see the paper. The released 4B run uses:

| Hyperparameter | Value |
| --- | --- |
| Algorithm | GRPO; KL-regularized against a frozen reference model |
| Training steps | 60 |
| Prompt/response limits | 4,096 / 27,648 tokens |
| Rollout sampling | temperature 0.6; top-p 0.95; top-k 20 |
| Batching | 64 prompts/step; 16 rollouts/prompt; 1,024 rollouts/step |
| Multi-turn caps | max assistant turns 20; max user turns 20; max tool-response length 8,192 |
| Optimizer | AdamW; learning rate 1e-6; weight decay 0.01; betas (0.9, 0.999); grad clip 1.0 |
| KL loss | low-var KL; coefficient 0.001 |
| Actor update batching | mini-batch 32; dynamic batch sizing enabled |
| Hardware | 1 node x 4 NVIDIA H100 GPUs |
| Training time | about 40 hours wall-clock for the 4B run |
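
The figures above imply a few derived quantities, worked out in the short calculation below. The inputs are taken from this model card; the derived numbers are back-of-the-envelope estimates. They assume the prompt and response limits are additive and that each step draws 64 prompts not seen before, neither of which the card states.

```python
PROMPTS_PER_STEP = 64
ROLLOUTS_PER_PROMPT = 16
TRAINING_STEPS = 60
TRAIN_TASKS = 5928            # size of the training split
MAX_PROMPT_TOKENS = 4096
MAX_RESPONSE_TOKENS = 27648
GPUS = 4
WALL_CLOCK_HOURS = 40         # "about 40 hours"

rollouts_per_step = PROMPTS_PER_STEP * ROLLOUTS_PER_PROMPT      # 1,024, matches the table
total_rollouts = rollouts_per_step * TRAINING_STEPS             # 61,440
prompts_sampled = PROMPTS_PER_STEP * TRAINING_STEPS             # 3,840
share_of_split = prompts_sampled / TRAIN_TASKS                  # roughly 0.65
max_sequence_tokens = MAX_PROMPT_TOKENS + MAX_RESPONSE_TOKENS   # 31,744
gpu_hours = GPUS * WALL_CLOCK_HOURS                             # about 160

print(total_rollouts, prompts_sampled, round(share_of_split, 2), max_sequence_tokens, gpu_hours)
```

Under these assumptions the run samples at most about 65% of the training split, that is, less than one full pass over the 5,928 tasks.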

Results

Spreadsheet-RL improves the same 4B base model through spreadsheet-native interaction design, comprehensive tool access, and RL post-training.

| Benchmark | Base | + Native Harness | + Full Tools | Spreadsheet-RL-4B |
| --- | --- | --- | --- | --- |
| SpreadsheetBench Pass@1 | 12.0 | 15.6 | 19.3 | 23.4 |

On Domain-Spreadsheet, Spreadsheet-RL improves overall Pass@1 from 8.4 to 17.2 over 1,660 evaluation rollouts.
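
The contribution of each stage can be read off with simple arithmetic. The snippet below only restates the reported numbers as differences and ratios, assuming the scores are percentages and that each SpreadsheetBench column builds on the previous one.

```python
stages = [
    ("Base", 12.0),
    ("+ Native Harness", 15.6),
    ("+ Full Tools", 19.3),
    ("Spreadsheet-RL-4B", 23.4),
]

# Gain in percentage points contributed by each successive stage.
stage_gains = {
    name: round(score - prev_score, 1)
    for (_, prev_score), (name, score) in zip(stages, stages[1:])
}
total_gain = round(stages[-1][1] - stages[0][1], 1)    # SpreadsheetBench, end to end
relative_gain = round(stages[-1][1] / stages[0][1], 2)

domain_gain = round(17.2 - 8.4, 1)                     # Domain-Spreadsheet
domain_relative = round(17.2 / 8.4, 2)

print(stage_gains, total_gain, relative_gain, domain_gain, domain_relative)
```

The three stages add 3.6, 3.7, and 4.1 points respectively, for 11.4 points in total, so the final score is a little under twice the base score on SpreadsheetBench and a little over twice on Domain-Spreadsheet.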

Usage

Install the standard Transformers stack (`transformers`, `torch`, and `accelerate`, which `device_map="auto"` requires) and load the checkpoint:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Spreadsheet-RL/Spreadsheet-RL-4B"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)
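
For quick inspection as a plain chat model, the sketch below collects generation settings that mirror the rollout sampling used in training and adds a helper that separates the reasoning trace from the final answer. It assumes the checkpoint keeps the base model's convention of closing its reasoning with `</think>`. `GENERATION_KWARGS`, `split_thinking`, and the `max_new_tokens` value are illustrative choices, not part of the released code.

```python
# Sampling settings copied from the rollout sampling used in training.
GENERATION_KWARGS = {
    "do_sample": True,
    "temperature": 0.6,
    "top_p": 0.95,
    "top_k": 20,
    "max_new_tokens": 1024,  # arbitrary illustrative cap, not a value from the model card
}

THINK_END = "</think>"  # assumed delimiter, inherited from the Qwen3 thinking base model


def split_thinking(decoded):
    """Return (reasoning, answer); reasoning is empty if the delimiter is absent."""
    reasoning, sep, answer = decoded.rpartition(THINK_END)
    if not sep:
        return "", decoded.strip()
    return reasoning.replace("<think>", "").strip(), answer.strip()
```

With `inputs` built by `tokenizer.apply_chat_template(...)`, a call such as `model.generate(**inputs, **GENERATION_KWARGS)` followed by `split_thinking(tokenizer.decode(..., skip_special_tokens=True))` would yield the two parts. Thinking models can produce long reasoning traces, so a small `max_new_tokens` may cut the output off before the delimiter appears.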

For task evaluation and agent rollouts, use the full Spreadsheet-RL codebase with the released dataset and Spreadsheet Gym:

hf download Spreadsheet-RL/Spreadsheet-RL --repo-type dataset --local-dir data
git clone https://github.com/Spreadsheet-RL/Spreadsheet-RL.git

The default training/evaluation harness is maintained in the code repository under `configs/`, `scripts/`, `reward/`, and `verl/`.

Citation

@misc{chi2026spreadsheetrl,
  title         = {Spreadsheet-RL: Advancing Large Language Model Agents on Realistic Spreadsheet Tasks via Reinforcement Learning},
  author        = {Banghao Chi and Yining Xie and Mingyuan Wu and Jingcheng Yang and Jize Jiang and Zhaoheng Li and Shengyi Qian and Minjia Zhang and Klara Nahrstedt and Rui Hou and Xiangjun Fan and Hanchao Yu},
  year          = {2026},
  eprint        = {2605.22642},
  archivePrefix = {arXiv},
  primaryClass  = {cs.AI},
  doi           = {10.48550/arXiv.2605.22642},
  url           = {https://arxiv.org/abs/2605.22642}
}

Safetensors: model size 4B params; tensor type BF16