TMax 8B

TMax 8B is a model trained using DPPO on top of Qwen 3 8B for use as a terminal-agent.

This model is part of a collection of terminal agents in various sizes.

The main branch is the step 300 checkpoint as that performed best on tblite.

Evaluation Results

Model	TB Lite	TB 2.1
Qwen 3 8B	7.3 +/- 1.0	1.1 +/- 0.9
Tmax SFT 8B	11.5 +/- 0.1	6.0 +/- 1.4
Tmax 8B	17.7 +/- 1.9	5.2 +/- 2.3

For details on evaluation methodology please check our paper. In general, we used a podman (docker) backend with default timeouts and custom harness similar to mini-swe-agent.

Model Details

Model Description

Developed by: Ai2
Language(s) (NLP): English
License: Apache 2.0
Finetuned from model [optional]: Qwen 3 8B
Dataset: TMax-15k

Use

To use this model, we recommend serving with vllm (or your inference framework of choice) with:

uvx vllm==0.19.1 serve allenai/tmax-8b \
  --served-model-name tmax-8b \
  --enable-auto-tool-choice \
  --tool-call-parser qwen3_xml \
  --port 8008 \
  --max-model-len 40960 \
  --tensor-parallel-size 8 \
  --language_model_only

Make sure to set language_model_only as we removed the vision head during training.

For more details on evaluation, please see our codebase.

Hyperparameters

This model was trained using DPPO with the following hyperparameters:

base model: allenai/tmax-sft-8b
Dataset: tmax 15K
Max prompt tokens: 2048
Max per-turn tokens: 16384
Max overall tokens: 32768
Pack length: 34816
Per-device train batch size: 1
Unique prompts per rollout: 32
Samples per prompt rollout: 8
Async steps: 4
Max steps: 64
Learning rate: 1e-6
LR scheduler: constant
Total training steps: 500 steps
Sampling Temperature: 1.0
KL Beta: 0.0
Loss fn: DPPO
Divergence: binary TV
TV threshold: 0.1
Advantage normalization: centered (no division by stdev)
FP32 LM head: true

For more details on training, please see our codebase.

License

This model is licensed under Apache 2.0. It is intended for research and educational use in accordance with Ai2's Responsible Use Guidelines.

Citation

If you use our model or data, please cite our paper:

@misc{ivison2026tmaxsimplerecipeterminal,
      title={Tmax: A simple recipe for terminal agents}, 
      author={Hamish Ivison and Junjie Oscar Yin and Rulin Shao and Teng Xiao and Nathan Lambert and Hannaneh Hajishirzi},
      year={2026},
      eprint={2606.23321},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2606.23321}, 
}