Spreadsheet-RL-4B

Project Page | Paper | Dataset | Code

Spreadsheet-RL-4B is the RL-trained 4B spreadsheet agent checkpoint from Spreadsheet-RL: Advancing Large Language Model Agents on Realistic Spreadsheet Tasks via Reinforcement Learning. It starts from Qwen/Qwen3-4B-Thinking-2507 and is post-trained with outcome-based reinforcement learning in Spreadsheet Gym, a multi-turn Microsoft Excel environment with spreadsheet-native tools, sandboxed code execution, and Excel-based recalculation rewards.

This checkpoint is intended to be used with the Spreadsheet-RL agent harness and tool environment. Loading it as a plain chat model can be useful for inspection, but it will not reproduce the paper results without Spreadsheet Gym, the tool set, and the reward/evaluation pipeline.

News

Model Details

Field Value
Base model Qwen/Qwen3-4B-Thinking-2507
Training method GRPO with outcome-based rewards
Environment Spreadsheet Gym with Microsoft Excel 365, spreadsheet-native tools, SandboxFusion code execution, and async Excel recalculation/reward service
Training data Spreadsheet-RL training split: 5,928 filtered ExcelForum tasks
Evaluation SpreadsheetBench and Domain-Spreadsheet
License Apache-2.0, following the base model license

Training Configuration

For full details, please see the paper. The released 4B run uses:

Hyperparameter Value
Algorithm GRPO; KL-regularized against a frozen reference model
Training steps 60
Prompt/response limits 4,096 / 27,648 tokens
Rollout sampling temperature 0.6; top-p 0.95; top-k 20
Batching 64 prompts/step; 16 rollouts/prompt; 1,024 rollouts/step
Multi-turn caps max assistant turns 20; max user turns 20; max tool-response length 8,192
Optimizer AdamW; learning rate 1e-6; weight decay 0.01; betas (0.9, 0.999); grad clip 1.0
KL loss low-var KL; coefficient 0.001
Actor update batching mini-batch 32; dynamic batch sizing enabled
Hardware 1 node x 4 NVIDIA H100 GPUs
Training time about 40 hours wall-clock for the 4B run

Results

Spreadsheet-RL improves the same 4B base model through spreadsheet-native interaction design, comprehensive tool access, and RL post-training.

Benchmark Base + Native Harness + Full Tools Spreadsheet-RL-4B
SpreadsheetBench Pass@1 12.0 15.6 19.3 23.4

On Domain-Spreadsheet, Spreadsheet-RL improves overall Pass@1 from 8.4 to 17.2 over 1,660 evaluation rollouts.

Usage

Install the standard Transformers stack and load the checkpoint:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Spreadsheet-RL/Spreadsheet-RL-4B"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)

For task evaluation and agent rollouts, use the full Spreadsheet-RL codebase with the released dataset and Spreadsheet Gym:

hf download Spreadsheet-RL/Spreadsheet-RL --repo-type dataset --local-dir data
git clone https://github.com/Spreadsheet-RL/Spreadsheet-RL.git

The default training/evaluation harness is maintained in the code repository under configs/, scripts/, reward/, and verl/.

Citation

@misc{chi2026spreadsheetrl,
  title         = {Spreadsheet-RL: Advancing Large Language Model Agents on Realistic Spreadsheet Tasks via Reinforcement Learning},
  author        = {Banghao Chi and Yining Xie and Mingyuan Wu and Jingcheng Yang and Jize Jiang and Zhaoheng Li and Shengyi Qian and Minjia Zhang and Klara Nahrstedt and Rui Hou and Xiangjun Fan and Hanchao Yu},
  year          = {2026},
  eprint        = {2605.22642},
  archivePrefix = {arXiv},
  primaryClass  = {cs.AI},
  doi           = {10.48550/arXiv.2605.22642},
  url           = {https://arxiv.org/abs/2605.22642}
}
Downloads last month
26
Safetensors
Model size
4B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for Spreadsheet-RL/Spreadsheet-RL-4B

Finetuned
(235)
this model
Quantizations
1 model

Dataset used to train Spreadsheet-RL/Spreadsheet-RL-4B

Paper for Spreadsheet-RL/Spreadsheet-RL-4B