arxiv:2606.14061

LLM Agents Can See Code Repositories

Published on Jun 12

Submitted by Silin Chen on Jun 15

Shanghai Jiao Tong University


Authors:

Silin Chen ,

Abstract

Visual repository representations enhance LLM-based coding agents by improving structural understanding and reducing token consumption during issue resolution.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

Coding agents powered by large language models have demonstrated strong performance on software engineering tasks. Yet most agents consume repositories almost entirely as text, which differs from how human developers use visual structure such as folder hierarchies and dependency relationships to orient themselves in large codebases. With multimodal large language models (MLLMs), it is an open question whether agents can effectively benefit from visual representations of repositories. This paper presents the first systematic empirical study of visual repository representations for LLM-based agents on repository-level issue resolution. We evaluate four recent multimodal models. Our results show that a strictly vision-only setup degrades accuracy and increases token cost, because agents lack sufficient symbolic detail and compensate with repeated visual queries. In contrast, integrating visual graphs of repository structure as a supplementary modality alongside standard text interfaces helps agents understand structure more efficiently: input token consumption decreases by up to 26% while issue-resolution accuracy is maintained or improved. Visualization is most useful during fault localization and when the agent autonomously controls exploration depth. These findings point to a practical hybrid text-and-vision design for next-generation coding agents.
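To make "visual graphs of repository structure" concrete, here is a minimal sketch of one way such a graph could be derived: parse each module, record which in-repository modules it imports, and emit Graphviz DOT text that can then be rendered to an image. The abstract does not describe how the authors build their graphs, so this is an illustration under stated assumptions, not their pipeline. The function names (`import_edges`, `to_dot`) and the toy repository are hypothetical.

```python
import ast


def import_edges(sources):
    """Map each module name to the sorted in-repo modules it imports.

    `sources` is a dict of module name -> Python source text
    (a hypothetical input format chosen for this sketch).
    """
    edges = {}
    for name, text in sources.items():
        targets = set()
        for node in ast.walk(ast.parse(text)):
            if isinstance(node, ast.Import):
                targets.update(alias.name for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module:
                targets.add(node.module)
        # Keep only edges that point at other modules inside the repository.
        edges[name] = sorted(t for t in targets if t in sources and t != name)
    return edges


def to_dot(edges):
    """Render the edge map as Graphviz DOT text."""
    lines = ["digraph repo {"]
    for src in sorted(edges):
        lines.append(f'  "{src}";')
        for dst in edges[src]:
            lines.append(f'  "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)


# Toy three-module repository, invented for illustration only.
toy_repo = {
    "app": "import db\nfrom utils import slugify\nimport os\n",
    "db": "import utils\n",
    "utils": "import re\n",
}
dot_text = to_dot(import_edges(toy_repo))
print(dot_text)
```

The resulting DOT text can be rendered with the Graphviz `dot` tool (for example `dot -Tpng`) and the image passed to a multimodal model alongside the usual text interface. A realistic version would also need nodes for classes and functions, which this sketch omits.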


Community

Silin-Chen (paper author, paper submitter), about 20 hours ago

SeeRepo lets a coding agent look at a repo more like a developer would: not just reading files one by one, but seeing how files, classes, functions, and dependencies connect. It gives the agent a visual map alongside the code, so it can find the right place to inspect or fix much faster. If humans can see structure to understand a codebase, why shouldn’t agents see it too?


