Spreadsheet-RL-4B
Project Page | Paper | Dataset | Code
Spreadsheet-RL-4B is the RL-trained 4B spreadsheet agent checkpoint from Spreadsheet-RL: Advancing Large Language Model Agents on Realistic Spreadsheet Tasks via Reinforcement Learning. It starts from Qwen/Qwen3-4B-Thinking-2507 and is post-trained with outcome-based reinforcement learning in Spreadsheet Gym, a multi-turn Microsoft Excel environment with spreadsheet-native tools, sandboxed code execution, and Excel-based recalculation rewards.
This checkpoint is intended to be used with the Spreadsheet-RL agent harness and tool environment. Loading it as a plain chat model can be useful for inspection, but it will not reproduce the paper results without Spreadsheet Gym, the tool set, and the reward/evaluation pipeline.
News
- 2026-05-23: Released the Spreadsheet-RL-4B model checkpoint on Hugging Face at Spreadsheet-RL/Spreadsheet-RL-4B.
Model Details
| Field | Value |
|---|---|
| Base model | Qwen/Qwen3-4B-Thinking-2507 |
| Training method | GRPO with outcome-based rewards |
| Environment | Spreadsheet Gym with Microsoft Excel 365, spreadsheet-native tools, SandboxFusion code execution, and async Excel recalculation/reward service |
| Training data | Spreadsheet-RL training split: 5,928 filtered ExcelForum tasks |
| Evaluation | SpreadsheetBench and Domain-Spreadsheet |
| License | Apache-2.0, following the base model license |
Training Configuration
For full details, please see the paper. The released 4B run uses:
| Hyperparameter | Value |
|---|---|
| Algorithm | GRPO; KL-regularized against a frozen reference model |
| Training steps | 60 |
| Prompt/response limits | 4,096 / 27,648 tokens |
| Rollout sampling | temperature 0.6; top-p 0.95; top-k 20 |
| Batching | 64 prompts/step; 16 rollouts/prompt; 1,024 rollouts/step |
| Multi-turn caps | max assistant turns 20; max user turns 20; max tool-response length 8,192 |
| Optimizer | AdamW; learning rate 1e-6; weight decay 0.01; betas (0.9, 0.999); grad clip 1.0 |
| KL loss | low-var KL; coefficient 0.001 |
| Actor update batching | mini-batch 32; dynamic batch sizing enabled |
| Hardware | 1 node x 4 NVIDIA H100 GPUs |
| Training time | about 40 hours wall-clock for the 4B run |
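
The GRPO and low-variance KL entries in the table can be made concrete with a small sketch. It assumes the commonly published GRPO formulation (outcome rewards standardized within each prompt's group of rollouts) and assumes that "low-var KL" refers to the k3 estimator that verl exposes as `low_var_kl`. The function names, the `eps` constant, and the toy rewards are illustrative, and the paper's exact variant may differ.

```python
import math
from statistics import mean, pstdev

# Values taken from the Training Configuration table.
PROMPTS_PER_STEP = 64
ROLLOUTS_PER_PROMPT = 16
KL_COEF = 0.001


def group_advantages(rewards, eps=1e-6):
    """Standardize outcome rewards within one prompt's rollout group.

    Assumption: population standard deviation; implementations differ on
    this detail. If every rollout gets the same reward, all advantages are 0.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def low_var_kl(logp, ref_logp):
    """k3 estimate of KL(policy || reference) for one sampled token.

    Non-negative for any inputs, and exactly 0 when the two log-probs match.
    """
    delta = ref_logp - logp
    return math.exp(delta) - delta - 1.0


# 64 prompts x 16 rollouts = 1,024 rollouts per step, as stated in the table.
rollouts_per_step = PROMPTS_PER_STEP * ROLLOUTS_PER_PROMPT

# Toy group (hypothetical outcomes): 4 of the 16 rollouts solved the task.
rewards = [1.0] * 4 + [0.0] * (ROLLOUTS_PER_PROMPT - 4)
advantages = group_advantages(rewards)

# Per-token KL penalty term for a hypothetical pair of log-probabilities.
kl_penalty = KL_COEF * low_var_kl(logp=-1.0, ref_logp=-1.5)
```

Successful rollouts receive positive advantages and failed ones negative advantages, so a prompt contributes a learning signal only when its 16 rollouts disagree on the outcome.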
Results
Spreadsheet-RL improves the same 4B base model through spreadsheet-native interaction design, comprehensive tool access, and RL post-training.
| Benchmark | Base | + Native Harness | + Full Tools | Spreadsheet-RL-4B |
|---|---|---|---|---|
| SpreadsheetBench Pass@1 | 12.0 | 15.6 | 19.3 | 23.4 |
On Domain-Spreadsheet, Spreadsheet-RL improves overall Pass@1 from 8.4 to 17.2 over 1,660 evaluation rollouts.
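
As a reading aid for these numbers, the sketch below treats Pass@1 as the percentage of single-attempt rollouts that pass, and computes the stage-by-stage gains from the SpreadsheetBench row. This is an assumption about how the metric aggregates; the actual grading is done by the paper's Excel-based evaluation pipeline. The helper names and the four-element outcome list are hypothetical.

```python
def pass_at_1(outcomes):
    """Percentage of single-attempt rollouts that passed (assumed definition)."""
    if not outcomes:
        raise ValueError("need at least one rollout outcome")
    return 100.0 * sum(outcomes) / len(outcomes)


def stage_gains(scores):
    """Gain in percentage points between consecutive stages, in table order."""
    names = list(scores)
    return {
        f"{prev} -> {curr}": round(scores[curr] - scores[prev], 1)
        for prev, curr in zip(names, names[1:])
    }


# Reported SpreadsheetBench Pass@1 values, copied from the table.
spreadsheetbench = {
    "Base": 12.0,
    "+ Native Harness": 15.6,
    "+ Full Tools": 19.3,
    "Spreadsheet-RL-4B": 23.4,
}

gains = stage_gains(spreadsheetbench)
total_gain = round(
    spreadsheetbench["Spreadsheet-RL-4B"] - spreadsheetbench["Base"], 1
)

# Hypothetical outcomes for four rollouts, only to show the call shape.
example_score = pass_at_1([True, False, False, True])
```

Under this reading, each of the three stages adds roughly the same amount, between 3.6 and 4.1 points, for a total of 11.4 points over the base model.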
Usage
Install the standard Transformers stack and load the checkpoint:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Spreadsheet-RL/Spreadsheet-RL-4B"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)
```
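
For quick inspection as a plain chat model, a generation helper along the following lines may be useful. It assumes `model` and `tokenizer` were loaded as shown above, reuses the rollout sampling settings from the training configuration (temperature 0.6, top-p 0.95, top-k 20), and picks an arbitrary `max_new_tokens`. The helper name is illustrative, and output produced this way does not reproduce the paper results, which require Spreadsheet Gym and its tools.

```python
# Rollout sampling settings from the Training Configuration table.
SAMPLING = {"do_sample": True, "temperature": 0.6, "top_p": 0.95, "top_k": 20}


def inspect_reply(model, tokenizer, prompt, max_new_tokens=1024):
    """Generate one reply to a single user message, without any tools.

    `model` and `tokenizer` are the objects returned by `from_pretrained`.
    `max_new_tokens` is an arbitrary default, not a value from the paper.
    """
    messages = [{"role": "user", "content": prompt}]
    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens, **SAMPLING)
    # Keep only the newly generated tokens, dropping the echoed prompt.
    new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `inspect_reply(model, tokenizer, "Describe how you would sum column B for rows where column A is East.")` returns the decoded text. Because the base model is a thinking model, the reply may contain a long reasoning trace before any answer.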
For task evaluation and agent rollouts, use the full Spreadsheet-RL codebase with the released dataset and Spreadsheet Gym:
```shell
hf download Spreadsheet-RL/Spreadsheet-RL --repo-type dataset --local-dir data
git clone https://github.com/Spreadsheet-RL/Spreadsheet-RL.git
```
The default training/evaluation harness is maintained in the code repository under configs/, scripts/, reward/, and verl/.
Citation
```bibtex
@misc{chi2026spreadsheetrl,
  title         = {Spreadsheet-RL: Advancing Large Language Model Agents on Realistic Spreadsheet Tasks via Reinforcement Learning},
  author        = {Banghao Chi and Yining Xie and Mingyuan Wu and Jingcheng Yang and Jize Jiang and Zhaoheng Li and Shengyi Qian and Minjia Zhang and Klara Nahrstedt and Rui Hou and Xiangjun Fan and Hanchao Yu},
  year          = {2026},
  eprint        = {2605.22642},
  archivePrefix = {arXiv},
  primaryClass  = {cs.AI},
  doi           = {10.48550/arXiv.2605.22642},
  url           = {https://arxiv.org/abs/2605.22642}
}
```