---
title: README
emoji: 💻
colorFrom: indigo
colorTo: purple
sdk: static
pinned: false
---
Model/Data associated with Paper:

<h1 align="center"> Training Software Engineering Agents and Verifiers with SWE-Gym </h1>

<p align="center">
  <a href="https://www.jiayipan.com/" style="text-decoration: none;">Jiayi Pan<sup>*,1</sup></a>, 
  <a href="https://xwang.dev/" style="text-decoration: none;">Xingyao Wang<sup>*,2</sup></a>,
  <a href="https://www.phontron.com/" style="text-decoration: none;">Graham Neubig<sup>3</sup></a>,
  <a href="https://www.cs.toronto.edu/~ndjaitly/" style="text-decoration: none;">Navdeep Jaitly<sup>4</sup></a>,
  <a href="https://blender.cs.illinois.edu/hengji.html" style="text-decoration: none;">Heng Ji<sup>2</sup></a>,
  <a href="https://www.alanesuhr.com/" style="text-decoration: none;">Alane Suhr<sup>^,1</sup></a>,
  <a href="https://dreasysnail.github.io/" style="text-decoration: none;">Yizhe Zhang<sup>^,4</sup></a>
</p>

<p align="center">
  <sup>1</sup>UC Berkeley, <sup>2</sup>UIUC, <sup>3</sup>CMU, <sup>4</sup>Apple <br>
  <sub><sup>*</sup>Equal contribution, <sup>^</sup>Equal supervision</sub>
</p>

<p align="center">
<a href="https://github.com/SWE-Gym/SWE-Gym">💻 Code </a>
•
<a href="https://arxiv.org/abs/2412.21139">📃 Paper</a>
•
<a href="https://huggingface.co/SWE-Gym" >🤗 Data & Models</a>
</p>

We present **SWE-Gym**, the first environment for training real-world software engineering agents.
We use it to train strong LM agents that achieve state-of-the-art open results on SWE-Bench, with early, promising scaling characteristics as we increase training and inference-time compute.

<p align="center">
  <img src="https://github.com/SWE-Gym/SWE-Gym/raw/main/assets/images/teaser.jpg" width="100%" alt="teaser">
</p>


---

Progress in agents for software engineering has been limited by the lack of training environments that both include rigorous verification for reinforcement learning and cover the expansive tasks encountered in real-world repository-level engineering.

We introduce SWE-Gym: An Open Environment for Training Software Engineering Agents & Verifiers.
Our baselines achieve a new open state of the art, 32% on SWE-Bench Verified and 26% on SWE-Bench Lite, with promising scaling trends.

![SWE-Gym Scaling](https://github.com/SWE-Gym/SWE-Gym/raw/main/assets/images/scaling.jpg)
*SWE-Gym enables scalable improvements for software engineering agents at both training and inference time. Our current results are primarily bottlenecked by training and inference compute, rather than the size of our environment.*

## Reproducing Results

Please refer to our [GitHub repo](https://github.com/SWE-Gym/SWE-Gym) for more details. See [docs/OpenHands.md](https://github.com/SWE-Gym/SWE-Gym/tree/main/docs/OpenHands.md) and [docs/MoatlessTools.md](https://github.com/SWE-Gym/SWE-Gym/tree/main/docs/MoatlessTools.md) for instructions on reproducing our training and inference-time results with the OpenHands and MoatlessTools agents.

## 📚 Citation

```bibtex
@misc{pan2024trainingsoftwareengineeringagents,
      title={Training Software Engineering Agents and Verifiers with SWE-Gym}, 
      author={Jiayi Pan and Xingyao Wang and Graham Neubig and Navdeep Jaitly and Heng Ji and Alane Suhr and Yizhe Zhang},
      year={2024},
      eprint={2412.21139},
      archivePrefix={arXiv},
      primaryClass={cs.SE},
      url={https://arxiv.org/abs/2412.21139}, 
}
```