Awesome Loop Engineering


A curated, implementation-oriented list of resources for Loop Engineering: the layer above prompt, context, and harness engineering for designing recurring AI-agent systems.

Prompt engineering improves what you ask the model. Context engineering improves what the model can see. Harness engineering improves the tools, permissions, sandboxes, and checks around one agent run. Loop Engineering sits above all three: it is the emerging AI and coding-agent practice of moving from manually prompting agents turn by turn to designing loops that do the prompting, supervision, verification, state updates, and re-triggering for you.

A loop discovers work, hands it to one or more agents, checks the result, records state, decides what should happen next, and runs again on a cadence or until a verifiable goal is reached.

This repository is about the new AI-agent meaning of Loop Engineering. It is not about software event loops, control theory, growth loops, generic workflow automation, or non-AI feedback systems.

Quick orientation for first-time visitors:

What it is: Loop Engineering is the practice of designing recurring AI-agent and coding-agent systems—how work is discovered, delegated, verified, retried, and escalated over time, not just for a single run.
Why it matters now: As coding agents move from one-off prompts to background automation, the design challenge shifts from "what do I ask?" to "how does the system keep working reliably?" This list exists because no other collection focuses on that layer.
Who this is for: builders of AI agents, coding agents, and orchestration systems; reliability and eval engineers; teams adding recurring agent loops to production infrastructure.
Where to start: Canonical Definition, Loop Contract, Start Here, then Pattern Library.

Why This Repo Exists
Mental Model
How To Use This List
Reading Paths
Choose Your Loop
Canonical Definition
Concept Guides
Maintainer Picks
Repository Highlights
Resource Type Legend
Start Here
Scope Boundary
The Loop Contract
Loop Design Checklist
Loop Maturity Model
Core Loop Primitives
Official Runtime Guides
Research Foundations
Agent Workflow Patterns
Coding-Agent Loop Systems
Verification And Feedback Gates
Securing Unattended Loops
State, Memory, And Context Persistence
Orchestration And Multi-Agent Delegation
Benchmarks And Evaluation
Operations Playbooks
Templates And Patterns
Examples And Schema
Community Gallery
Discovery And Distribution
Roadmap And Discussion
Pattern Library
Critiques, Risks, And Limitations
Adjacent Awesome Lists
Citation

Why This Repo Exists

Loop Engineering is becoming a distinct craft because the leverage point is moving from better single prompts, richer context, and stronger harnesses to recurring systems that decide when and how agents should run. The best agent workflows now combine goals, state, work isolation, tool permissions, feedback gates, retries, escalation, and receipts. This list exists to make that craft easier to learn, compare, and practice without mixing it with unrelated loop concepts or generic AI-agent hype.

Mental Model

Prompt engineering asks: what should I say to the model?

Context engineering asks: what state and knowledge should the model see?

Harness engineering asks: what tools, permissions, tests, sandboxes, and feedback should surround the agent?

Loop engineering asks: what recurring system should discover work, delegate to agents, verify results, persist state, decide next actions, and re-run when the human is no longer in the inner loop?

Prompt, context, and harness engineering make one agent run better. Loop Engineering makes agent work repeatable, observable, and governable over time.


Loop shape:

Objective
  -> Trigger / cadence
  -> Discover / intake work
  -> Delegate to agents
  -> Act in an isolated workspace
  -> Verify with tests, evals, traces, or reviewers
       -> if failed: feed back the evidence and retry
       -> if passed: persist state and decide what happens next
  -> Repeat, report, open a PR, or escalate to a human

Loop Engineering lifecycle: Intake, Delegate, Act, Verify, Persist, Decide. The Decide step retries by feeding evidence back, escalates to a human, or exits when the goal is met.
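
The loop shape above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: `run_loop`, `LoopResult`, and the `discover`, `act`, and `verify` callables are hypothetical names introduced here, and a real loop would call an agent runtime where this sketch calls plain functions.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class LoopResult:
    outcome: str  # "done", "escalated", or "no_work"
    attempts: int = 0
    receipts: List[str] = field(default_factory=list)


def run_loop(
    discover: Callable[[], Optional[str]],
    act: Callable[[str, Optional[str]], str],
    verify: Callable[[str], Optional[str]],
    max_retries: int = 3,
) -> LoopResult:
    """Run one pass: intake, act, verify, record a receipt, decide."""
    task = discover()  # intake: returns None when there is no work
    if task is None:
        return LoopResult(outcome="no_work")

    result = LoopResult(outcome="escalated")
    evidence: Optional[str] = None
    for attempt in range(1, max_retries + 1):
        result.attempts = attempt
        output = act(task, evidence)  # the agent sees the previous failure evidence
        evidence = verify(output)     # None means the check passed
        result.receipts.append(f"attempt {attempt}: {evidence or 'passed'}")
        if evidence is None:
            result.outcome = "done"
            return result
    # Retry budget exhausted: hand the task and receipts to a human.
    return result
```

The verifier is a separate callable from the actor, so the acting agent never decides on its own that the work is done, and the retry cap guarantees the loop ends in either "done" or "escalated".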

How To Use This List

Start with the first-read resources and the Loop Contract if the term is new. For implementation work, move through core primitives, runtime guides, templates, and patterns. For reliability work, focus on verification gates, state persistence, critiques, and limitations. Contributions should prefer primary sources, official docs, papers, and implementation-heavy write-ups.

Reading Paths

Choose a path based on your intent.

Learn the concept: canonical definition, mental model, comparison guide, and the Loop Contract.
Implement a loop: core primitives, official runtime guides, the pattern library, and examples.
Improve reliability or evals: verification gates, benchmarks, critiques, and limitations.
Contribute: the community gallery, templates, and contribution guide.

Choose Your Loop

Start from the problem you have, not the pattern you want. Find the pattern name below, then open its full write-up in the Pattern Library section, or compare every pattern in the pattern matrix, which also links each one by symptom.

When you say...	Reach for the loop
"My PR is stuck"	PR babysitter
"CI keeps failing"	CI repair loop
"The docs may be stale"	Docs drift collector
"A deploy needs monitoring"	Deploy verifier
"Feedback is noisy"	Feedback clusterer
"Dependency updates pile up"	Dependency triage loop
"Agent evals regressed"	Evaluation regression loop
"Sensitive changes need review"	Security review loop
"Agent spend is rising"	Cost-control loop
"I need recurring bug discovery"	Bug hunting loop
"A change needs sign-off"	Enterprise approval loop
"An incident just paged"	Incident response loop
"A dataset keeps drifting"	Data-quality loop
"Release notes are a chore"	Release-note loop
"Model choice is ad hoc"	Model-routing loop

Not sure which runtime should run it? See the runtime selection guide.

Canonical Definition

Loop Engineering is the AI and coding-agent practice of designing recurring systems that discover work, delegate it to agents, verify results, persist state, decide next actions, and run again on a cadence, event, or until a verifiable goal is reached.

Concept Guides

These repository-native guides define the concept, boundaries, and practical artifacts without relying on vendor-specific terminology.

🧾 Template Canonical Definition - Short definition, positioning, minimal loop test, and citation note.
🧾 Template Loop Engineering Manifesto - Concise statement of the concept, commitments, non-goals, and success standard.
🧾 Template Loop Engineering Taxonomy - Classification by trigger, intake, verification, state model, topology, and operating domain.
⚠️ Critique Loop Engineering Anti-Patterns - Common failure modes such as prompt loops with no contract, infinite retries, model self-approval, hidden state, and unsafe autonomy.
🧾 Template Comparison Guide - Distinguishes Loop Engineering from prompt engineering, context engineering, harness engineering, workflow automation, agent workflows, and evaluation loops.
🧾 Template Sourced Signals And Quotes - Short sourced signals from linked public materials that anchor the emerging concept.
🧾 Template Outreach Kit - Conservative messages for inviting corrections, sources, and real-world loop patterns.

Maintainer Picks

🧾 Template Maintainer Picks - Shortlist of concept, practice, reliability, and reusable artifact resources.

Repository Highlights

Beyond the curated list, this repository ships its own artifacts: an operational pattern library, a schema-validated loop contract for every pattern, a runnable reference loop, a community gallery, eight language entry points, a standalone landing page, and an active discussion thread for real or anonymized Loop Engineering patterns.

Resource Type Legend

📄 Paper: academic paper, preprint, or technical report.
📝 Blog: essay, field note, article, or practitioner write-up.
📚 Docs: official product, API, SDK, or platform documentation.
🧰 Tool: repository, framework, SDK, runtime, or implementation.
🧪 Benchmark: benchmark, eval suite, leaderboard, or evaluation dataset.
🔁 Pattern: real-world loop pattern, operational playbook, or reusable workflow.
🧾 Template: template, checklist, schema, repository guide, or contribution artifact.
🧭 List: adjacent awesome list, ecosystem map, or curated collection.
⚠️ Critique: risk analysis, limitation, caveat, or skeptical take.

Start Here

Direct resources about the new AI/coding-agent meaning of Loop Engineering.

📝 Blog Loop Engineering - Addy Osmani's framing of loop engineering as the layer above manually prompting coding agents, with concrete primitives across Codex and Claude Code.
📝 Blog Loop Engineering - Substack version of the same essay; useful for the original discussion trail and quotations from Peter Steinberger and Boris Cherny.
📝 Blog Loop Engineering - Concise explanation of the shift from prompting agents to designing loops that discover work, delegate, verify, persist, and continue.
📝 Blog Loop Engineering: The Guide for AI Agents - Practical guide that breaks the pattern into automations, worktrees, skills, connectors, subagents, and state.
🔁 Pattern Codex Loops: What Boris Cherny Gets Right About Managing Agent Work - Engineering note on recurring agent loops for PR babysitting, CI repair, deploy verification, and feedback clustering.
📝 Blog I Now Just Write Loops To Prompt Claude Code: Claude Code Creator Boris Cherny - Coverage of Boris Cherny's "my job is to write loops" workflow.
📝 Blog My Lord! AI Programming Undergoes Another Major Shift - Broad coverage of the Boris Cherny and Peter Steinberger discussion, including the distinction between cold-start scripts and persistent agent loops.
📝 Blog Peter Steinberger on designing loops - The June 2026 post - "you shouldn't be prompting coding agents anymore, you should be designing loops that prompt your agents" - that catalyzed the current discussion.
📝 Blog The Anthropic leader who built Claude Code ditched prompting - now he writes loops - The New Stack's report on Boris Cherny's shift from prompting to loop writing and what it changes about developer workflow.
📝 Blog Stop Prompting. Design the Loop. - Practical breakdown of loop building blocks - automations, worktrees, skills, connectors, subagents - plus external memory and verification through oracles such as tests and builds.
📝 Blog Boris Cherny: five tips for running Opus autonomously for hours or days - The Claude Code creator's compact loop recipe: auto-mode permissions, dynamic workflows, /goal or /loop, the cloud runner, and end-to-end self-verification.

Scope Boundary

In scope	Out of scope
AI/coding-agent loops that coordinate prompts, context, harnesses, verification, and state over repeated agent runs	Software event loops, UI/game loops, or control theory loops
Scheduled, goal-driven, or event-triggered agent work	Generic cron jobs with no agentic reasoning or verification
Agent loops with durable state, worktrees, checkpoints, traces, or progress files	One-off prompt examples with no loop, state, or feedback signal
Verification loops using tests, CI, evals, reviewers, or deterministic gates	Pure AI news, generic product pages, or marketing copy
Multi-agent maker/checker/delegation patterns	Broad agent lists without specific loop-design relevance

The Loop Contract

A useful loop has a contract. If one of the parts below is missing, the loop usually becomes either a manual prompt habit or an unsafe background automation. Prompt, context, and harness choices are ingredients; the loop contract is the operating layer that connects them over time.


Part	Design question	Common artifact
Objective	What should the loop optimize for?	Goal, issue, PRD, runbook
Trigger	When does the loop run?	Schedule, webhook, `/loop`, `/goal`, automation
Discover / Intake	How does the loop find work?	GitHub queries, Linear filters, CI failures, feedback stream
Workspace	Where can the agent act safely?	Worktree, sandbox, branch, container
Context	What durable knowledge should it load?	`AGENTS.md`, `CLAUDE.md`, `SKILL.md`, docs
Delegation	Which agent does which job?	Explorer, implementer, reviewer, judge
Verification	What says "yes" or "no"?	Tests, typecheck, lint, evals, trace graders
State	What survives the next run?	Progress file, database checkpoint, trace, issue comment
Budget	When should it stop spending?	Max turns, max retries, token budget, time box
Escalation	When does a human take over?	PR, issue, Slack alert, triage inbox
Exit	How does the loop know it is done?	Acceptance criteria, passing checks, no work found

Good loop documentation should make the contract visible. A reader should be able to tell what triggers the loop, what state it reads, what it is allowed to change, how it verifies progress, and when it stops.
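
One way to make the contract visible is to write it down as data and check it for gaps before the loop is allowed to run. The sketch below is illustrative only: `LoopContract` and `missing_parts` are hypothetical names, the field names mirror the table above, and it is not the schema this repository ships.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class LoopContract:
    # One free-text field per contract part from the table above.
    objective: str = ""
    trigger: str = ""
    intake: str = ""
    workspace: str = ""
    context: str = ""
    delegation: str = ""
    verification: str = ""
    state: str = ""
    budget: str = ""
    escalation: str = ""
    exit_condition: str = ""


def missing_parts(contract: LoopContract) -> List[str]:
    """Return the names of contract parts that are still empty."""
    return [
        f.name for f in fields(contract)
        if not getattr(contract, f.name).strip()
    ]


draft = LoopContract(objective="Keep CI green on main", trigger="CI failure webhook")
print(missing_parts(draft))
```

A non-empty result is a reasonable signal to keep the loop in manual or read-only mode until the remaining parts are filled in.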

Loop Design Checklist

Check	Question
Name one objective	Does the loop optimize for a specific outcome instead of a vague goal such as "improve the repo"?
Define the intake	Where does work enter: PR comments, CI failures, issues, logs, eval failures, feedback, or schedule?
Isolate execution	Does the agent act in a worktree, sandbox, branch, container, or read-only mode?
Write the feedback signal first	Do tests, typechecks, lint, evals, policy checks, or trace graders exist before retries begin?
Persist state outside the model	Does progress survive in files, issue comments, checkpoints, traces, or a database?
Separate maker and checker	Does something other than the acting agent decide whether the work is done?
Put a budget on autonomy	Are runtime, turns, retries, token spend, and concurrent workers capped?
Design escalation	Is it clear when the loop should open a PR, file an issue, ask a human, or stop?
Keep receipts	Are commands, evidence, changed files, and stop reasons recorded?
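
The "put a budget on autonomy" check can be sketched as a small guard that the loop consults before every turn. The `Budget` class and its field names are assumptions made for illustration; real runtimes expose their own limits and counters.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Budget:
    max_turns: int
    max_tokens: int
    max_seconds: float
    turns: int = 0
    tokens: int = 0
    started: float = field(default_factory=time.monotonic)

    def charge(self, tokens: int) -> None:
        """Record one completed agent turn and the tokens it used."""
        self.turns += 1
        self.tokens += tokens

    def stop_reason(self) -> Optional[str]:
        """Return why the loop must stop, or None if it may continue."""
        if self.turns >= self.max_turns:
            return "max_turns"
        if self.tokens >= self.max_tokens:
            return "token_budget"
        if time.monotonic() - self.started >= self.max_seconds:
            return "time_box"
        return None
```

Returning a named stop reason rather than a boolean also serves the "keep receipts" check, because the reason can be written straight into the run record.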

Loop Maturity Model

Level	Name	Description
0	Manual prompting	A human reads state and writes the next prompt.
1	Scripted retry	A shell/script loop feeds errors back to an agent.
2	Scheduled loop	The agent runs on a cadence and reports findings.
3	Stateful loop	Progress survives across sessions through files, issues, checkpoints, or traces.
4	Self-verifying loop	Deterministic checks or evaluator agents gate completion.
5	Multi-agent loop	Specialized agents split discovery, implementation, review, and judgment.
6	Production-supervised loop	Observability, budgets, approvals, rollback, and human escalation are first-class.

Most teams should climb this model slowly. A reliable Level 3 loop with clear state and deterministic checks is usually more valuable than a flashy Level 5 loop with vague goals.
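
As a rough illustration of what Level 3 means in practice, the sketch below keeps loop state in a JSON progress file so the next run can pick up where the last one stopped. The file layout and the `load_state` and `save_state` helpers are hypothetical; issue comments, checkpoints, or a database would serve the same purpose.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any, Dict


def load_state(path: Path) -> Dict[str, Any]:
    """Read prior progress, or start fresh if no run has happened yet."""
    if not path.exists():
        return {"run": 0, "done": [], "last_stop_reason": None}
    return json.loads(path.read_text(encoding="utf-8"))


def save_state(path: Path, state: Dict[str, Any]) -> None:
    """Write to a temp file and rename, so a crash cannot leave a half-written file."""
    fd, tmp = tempfile.mkstemp(dir=str(path.parent), suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(state, f, indent=2)
    os.replace(tmp, path)
```

Each run would call `load_state` first, skip anything already listed under `done`, and call `save_state` after every verified step rather than only at the end.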

Core Loop Primitives

These are the building blocks that make a loop more than a repeated prompt.

📚 Docs Automations - Codex app - Codex background automations for recurring tasks, triage inboxes, skills, and worktree isolation.
📚 Docs Follow a goal - Codex use cases - Official guidance for durable objectives with stopping conditions, validation commands, checkpoints, and progress logs.
📚 Docs Worktrees - Codex app - Codex worktree model for isolated parallel tasks and handoffs between local and background workspaces.
📚 Docs Prompting - Codex - Explains the Codex loop, threads, context, and /goal mode.
📚 Docs Customization - Codex - Maps AGENTS.md, memories, skills, MCP, and subagents into a coherent customization stack.
📚 Docs Agent Skills - Codex - Official skill format for reusable workflows, scripts, MCP dependencies, invocation policy, and plugin packaging.
📚 Docs Plugins - Codex - Bundles skills, app integrations, and MCP servers into reusable loop capabilities.
📚 Docs Slash commands in Codex CLI - CLI commands for switching agent threads, browsing skills, inspecting MCP tools, and using subagent workflows.
🔁 Pattern Autonomous Loops - Claude Code pattern using task files, stop hooks, restart behavior, hard limits, and a kill switch.
📚 Docs Claude Code Glossary - Defines the agentic loop, hooks, subagents, skills, MCP, and related primitives in Claude Code terminology.
📚 Docs Keep Claude working toward a goal - /goal runs turn after turn until a completion condition is met by a verifier.
📚 Docs Run prompts on a schedule - /loop, scheduled tasks, reminders, monitor tools, and session-scoped recurring prompts.
📚 Docs Automate work with routines - Claude Code routines: persistent cloud automations triggered by schedules, API calls, or GitHub events, with connectors, scoped environments, and branch-push limits.
📚 Docs Desktop scheduled tasks - Local recurring runs on your own machine, with the persistence, file-access, permission, worktree, and missed-run trade-offs that distinguish them from /loop and cloud routines.
📚 Docs Run parallel sessions with worktrees - Worktree isolation for parallel sessions and subagents so concurrent edits do not collide.
📚 Docs Automate actions with hooks - Claude Code hooks guide for deterministic lifecycle control around model actions.
📚 Docs Hooks reference - Event-level reference for session, turn, tool-call, and subagent hooks.
📚 Docs Common workflows - Claude Code - Practical workflows for worktrees, subagents, CI, batch processing, planning, and resuming prior work.
📚 Docs Manage multiple agents with agent view - Dashboard for dispatching, monitoring, and attaching to background agent sessions.
📚 Docs Run agents in parallel - Compares agent view, subagents, agent teams, worktrees, tasks, and workflows for parallel work.
📚 Docs Orchestrate subagents at scale with dynamic workflows - Moves loop state and branching into workflow scripts so large tasks do not overload the conversation context.
📚 Docs Create plugins - Packaging model-invoked skills, agents, hooks, MCP servers, monitors, and settings as shareable loop components.
📚 Docs Model Context Protocol - Standard protocol for exposing tools and data sources to agent loops.
📚 Docs Allowing GitHub Copilot CLI to work autonomously - Copilot CLI autopilot mode plus /every and /after scheduling, turning the CLI into an unattended loop that runs steps until a task is complete.

Official Runtime Guides

Primary-source docs from agent runtime vendors and framework builders.

📚 Docs Run long horizon tasks with Codex - OpenAI's runbook for plan-edit-test-observe-repair-document-repeat work, including specs, plans, status logs, and validation gates.
📚 Docs Best practices - Codex - Official best practices for context, AGENTS.md, MCP, skills, subagents, and automations.
📚 Docs Agents SDK - OpenAI guide for agent orchestration, tool execution, approvals, state, guardrails, and observability.
📚 Docs Agents - OpenAI Agents SDK - SDK primitives for agents, tools, handoffs, guardrails, and runner-managed loops.
📚 Docs Running agents - OpenAI guide to turns, state, approvals, sessions, and continuation in the SDK runtime loop.
📚 Docs Integrations and observability - OpenAI guide to MCP wiring and traces as the basis for debugging and evaluation loops.
📚 Docs Sandbox Agents - Splits the harness control plane from the sandbox execution plane for long-running file and command work.
📚 Docs Guardrails and human review - Approval and validation boundaries for sensitive agent actions.
📚 Docs Building agents with the Claude Agent SDK - Claude SDK overview for tool-using agents, subagents, state, permissions, and streaming.
📚 Docs How the agent loop works - Official walkthrough of the inner agent loop that outer recurring loops build on.
📚 Docs Extend Claude with skills - Claude Code skill system for reusable loop instructions and assets.
📚 Docs Create custom subagents - Claude Code custom subagents with isolated context, model choice, and tool permissions.
📚 Docs GitHub Agentic Workflows - Repository automation that runs coding agents in GitHub Actions on events or schedules with guardrails.
📝 Blog GitHub Agentic Workflows technical preview - Changelog announcement for Markdown-defined agentic workflows in GitHub Actions.
📚 Docs Continuous AI - GitHub Next's umbrella framing for CI/CD-style AI automation across the software lifecycle, the category that agentic workflows demonstrate.
📝 Blog Automate repository tasks with GitHub Agentic Workflows - Official walkthrough of writing Markdown-defined agentic workflows with guardrails for triage, QA, and docs chores.
📝 Blog Continuous AI in practice: What developers can automate today with agentic CI - Concrete agentic-CI automations available today, with recurring patterns for triage, review, and documentation upkeep.
📚 Docs About GitHub Copilot coding agent - GitHub's autonomous coding agent: assign an issue, the agent works in an isolated Actions-powered workspace, and a reviewable pull request comes back.
📝 Blog GitHub Copilot: Meet the new coding agent - Launch overview of the issue-to-PR delegation loop, including iteration on review feedback.
📚 Docs Jules - Google's asynchronous coding agent that plans, executes tasks in isolated cloud VMs, and returns reviewable diffs.
📚 Docs Cursor cloud agents - Remote agents that work asynchronously in isolated environments and hand results back for review.
📚 Docs Devin Docs - Documentation for a long-running autonomous software engineer with sessions, playbooks, knowledge, and review boundaries.
📚 Docs Writing effective tools for AI agents - Anthropic's guidance on evaluating and improving tool specs using agentic loops and realistic tasks.
📚 Docs Introducing advanced tool use on the Claude Developer Platform - Tool search, programmatic tool calling, and tool-use examples for scaling large tool libraries without flooding context.
📚 Docs Effective harnesses for long-running agents - Anthropic's guidance for agents that work across many context windows: durable progress artifacts, environment setup, and self-verification.
📚 Docs Claude Code best practices - Widely cited workflow guidance that underlies many recurring Claude Code loops.

Research Foundations

Loop Engineering is new as a practice name, but it builds on years of agent-loop, feedback, planning, and self-correction research.

📄 Paper ReAct: Synergizing Reasoning and Acting in Language Models - Foundational reason-act-observe loop for tool-using language agents.
📄 Paper Reflexion: Language Agents with Verbal Reinforcement Learning - Converts environment feedback into written reflections stored in memory for future attempts.
📄 Paper Self-Refine: Iterative Refinement with Self-Feedback - Generate-feedback-refine loop where a model improves outputs over repeated passes.
📄 Paper CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing - Uses tools to ground critique and correction rather than relying only on introspection.
📄 Paper Tree of Thoughts - Search over multiple reasoning branches; relevant when loop design needs exploration before committing.
📄 Paper Graph of Thoughts - Generalizes thought structures beyond chains and trees, useful for complex loop planning and aggregation.
📄 Paper Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models - Combines search, action, and environment feedback for language agents.
📄 Paper Voyager: An Open-Ended Embodied Agent with Large Language Models - Demonstrates lifelong skill acquisition through iterative exploration, feedback, and a skill library.
📄 Paper Generative Agents: Interactive Simulacra of Human Behavior - Introduces reflection and memory mechanisms for long-running agent behavior.
📄 Paper Measuring AI Ability to Complete Long Software Tasks - METR's task-length time horizon metric; grounds why loop budgets, checkpoints, and escalation matter as autonomous work gets longer.
📝 Blog Measuring AI Ability to Complete Long Tasks - Accessible summary of the 50% task-completion time horizon and its doubling trend.
📄 Paper Reflection-Driven Control for Trustworthy Code Agents - Elevates reflection from an external pass to an internal control loop that monitors the agent's decision path during generation and constrains risky steps with low overhead.
📄 Paper PARC: An Autonomous Self-Reflective Coding Agent for Robust Execution of Long-Horizon Tasks - Hierarchical plan-execute-assess loops that detect and correct strategic errors during multi-hour autonomous runs.
📄 Paper When the Specification Emerges: Benchmarking Faithfulness Loss in Long-Horizon Coding Agents - Measures how agents drift from intent when specifications arrive incrementally across a long loop, and proposes a mitigation that recovers most of the loss.
🧰 Tool Reflexion code - Reference implementation and experiments for verbal reinforcement loops.

Agent Workflow Patterns

These resources are included when they help design the higher-level loop around agents, not merely because they describe agents in general.

📚 Docs Building Effective Agents - Anthropic's canonical guide to workflows and agents, including evaluator-optimizer and orchestrator-workers patterns.
📝 Blog How we built our multi-agent research system - Detailed orchestrator-worker system with planning, memory, subagents, citation passes, and iterative research loops.
📄 Paper Building Effective AI Agents: Architecture Patterns and Implementation Frameworks - PDF overview of agent architecture patterns, including generator-evaluator loops.
📝 Blog AI Agent Architectures - System-design overview of ReAct, reflection, planning, tool use, memory, and control strategies.
📝 Blog What Are Agentic Workflows? - Accessible taxonomy of planning, tool use, reflection, and memory patterns.
📝 Blog Agent Planning & Reflection Patterns - Visual explanation of plan-execute, observe, reflect, retry, and stop patterns.
📝 Blog Agentic Design Patterns - Practical overview of ReAct, reflection, tool use, planning, and how to combine them in real-world agents.
🔁 Pattern 12 Factor Agents - Operating principles for production agents, including explicit prompts, state ownership, and pause-resume behavior.
🔁 Pattern Durable Execution for Agentic Workflows - Explains checkpointing, event-sourced journals, replay, and recovery for long-running agent workflows.

Coding-Agent Loop Systems

🧰 Tool SWE-agent - Agent-computer interface and autonomous software engineering agent for repository tasks.
📄 Paper SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering - Paper behind SWE-agent and its interface design.
🧰 Tool mini-SWE-agent - Minimal coding agent that is useful for understanding the core loop without a large framework.
🧰 Tool OpenHands - Open platform for AI software developers as generalist agents.
📄 Paper OpenHands: An Open Platform for AI Software Developers as Generalist Agents - Paper describing OpenHands, CodeActAgent, benchmarks, and generalist agent evaluation.
🧰 Tool Agentless - Workflow-based approach for software issue resolution using localization, repair, and patch validation.
📄 Paper Agentless: Demystifying LLM-based Software Engineering Agents - Useful contrast case: strong results through structured workflow rather than a fully open-ended agent.
🧰 Tool AutoCodeRover - Autonomous program improvement system for issue localization, patch generation, and validation.
📄 Paper AutoCodeRover: Autonomous Program Improvement - Paper on autonomous code repair loops over real repositories.
🔁 Pattern Ralph - Geoffrey Huntley's original Ralph technique: run one agent in a bare loop with fresh context per iteration and the filesystem plus specs as memory.
🔁 Pattern everything is a ralph loop - Follow-up essay arguing the loop, not the agent, is the durable engineering unit: one task per iteration, deterministic context, and verification inside the loop.
🧰 Tool how-to-ralph-wiggum - Reference repository documenting the Ralph Wiggum technique end to end, from the bare loop script to guardrails and conventions.
📝 Blog A Brief History of Ralph - Traces how the bare-loop technique spread from a provocation to a production practice among early adopters.
🔁 Pattern Ralph Copilot - Language-agnostic Ralph loop implementation using fresh context, filesystem memory, PRD.md, and PROGRESS.md.
🔁 Pattern Compound Engineering - Every's named plan-work-review-compound loop, where each run feeds lessons back into AGENTS.md-style memory so the next loop is easier; the self-improving counterpart to Ralph.
🧰 Tool Gas Town - Steve Yegge's multi-agent orchestrator that runs 20-30 parallel coding agents with coordinator, worker, and merge-queue roles; the structured-orchestration end of the spectrum that Ralph anchors with bare iteration.
🧰 Tool Amp - Agentic coding tool built around threads, subagents, and an opinionated harness, with an owner's manual that documents loop-style operating practices.
🧰 Tool karl - Autonomous multi-agent development loop with planner, reviewer, architect, tester, developer, deployment, and retry phases.
🔁 Pattern joelclaw agent-loop skill - Durable Planner-Implementor-Reviewer-Judge coding loops via Inngest events and progress files.
🧭 List SWE-bench reading list - Maintained map of software engineering agent systems and related papers.
📄 Paper TraceCoder: A Trace-Driven Multi-Agent Framework for Automated Debugging of LLM-Generated Code - ICSE'26 observe-analyze-repair loop with instrumentation, analysis, and repair agents, a history-learning mechanism, and a rollback to the last good state; iteration alone drives most of the gain.

Verification And Feedback Gates

These resources include harness and observability mechanisms that loops compose into exit gates, receipts, and retry signals.

📝 Blog Why Agentic Systems Must Produce Deterministic Outputs to Scale - Argues for deterministic boundaries, contracts, and execution gates around probabilistic agent reasoning.
🔁 Pattern Stop Babysitting Your Coding Agent. Give It Backpressure. - Explains how to turn tests, linters, builds, traces, and other signals into feedback loops for coding agents.
🔁 Pattern How to Build a Self-Verification Loop in Claude Code - Uses hooks to enforce syntax, intent, and regression checks before an agent can finish.
📝 Blog How to build a better agent harness with traces and evals - Trace-evaluate-debug-refine loop for improving agent behavior from real runs.
📝 Blog Better Harness: A Recipe for Harness Hill-Climbing with Evals - LangChain's recipe for using evals as the learning signal for harness improvement.
📝 Blog Improving Deep Agents with harness engineering - Practical discussion of self-verification, traces, middleware, and loop detection for coding agents.
📚 Docs OpenAI agent evals - Evaluation guidance for moving from traces to repeatable grading of agent workflows.
🧰 Tool Promptfoo OpenAI Agents provider - Testing and assertions for multi-turn agent workflows, tools, state, handoffs, sandboxes, and traces.
🧰 Tool Inspect AI - UK AISI evaluation framework with solvers, scorers, sandboxing, tool use, MCP, and log viewing.
📚 Docs OpenTelemetry Semantic Conventions for Generative AI Systems - Portable tracing conventions for model calls, tool calls, and agent workflows.
🧰 Tool AgentOps - Monitoring, replay, cost tracking, benchmarking, and tracing for agent sessions.
🧰 Tool Langfuse - Open-source LLM engineering platform with tracing, evaluations, and metrics that loops can read back as feedback signals.
🧰 Tool LangSmith - Tracing, evaluation, and monitoring platform for inspecting and grading agent runs across iterations.
🧰 Tool Arize Phoenix - Open-source AI observability for tracing, evaluating, and debugging agent behavior from real runs.
🧰 Tool Braintrust - Evaluation and observability platform with experiments, datasets, and CI integration for gating agent changes.
🧰 Tool Weave - Weights & Biases toolkit for tracing, evaluating, and monitoring agent applications over time.
📄 Paper Agentic Verification of Software Systems - Pairs a coding agent with a theorem prover (AutoRocq) in a generate-and-validate loop, turning formal proof into the exit gate for trusted automatic programming.

Securing Unattended Loops

A loop that runs while nobody watches needs stronger boundaries than an interactive session. These resources cover the main risks: untrusted intake content, over-broad permissions, and unsandboxed execution.

⚠️ Critique The lethal trifecta for AI agents - Simon Willison's rule of thumb: private data, untrusted content, and an exfiltration channel must never meet inside one unattended agent.
⚠️ Critique Prompt injection series - Ongoing series on the core unsolved vulnerability for loops whose intake includes content written by strangers.
📚 Docs Agentic AI - Threats and Mitigations - OWASP threat model for agentic systems, useful when reviewing intake, memory, tool, and delegation boundaries.
🧰 Tool sandbox-runtime - Anthropic's OS-level filesystem and network sandboxing for arbitrary processes without requiring a container.
🧰 Tool E2B - Open-source isolated cloud sandboxes for running untrusted, AI-generated code inside agent loops.
📚 Docs Modal Sandboxes - Secure sandboxed execution for agent-driven code with resource limits and network controls.
🧰 Tool Daytona - Infrastructure for running AI-generated code in fast, isolated sandboxes.
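
The lethal-trifecta rule above lends itself to a mechanical pre-flight check before a loop is scheduled to run unattended. The following is a minimal sketch, not taken from any listed tool: `LoopCapabilities`, `trifecta_legs`, and `is_safe_unattended` are hypothetical names, and how each flag gets set for a real loop is left as an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoopCapabilities:
    """Hypothetical capability summary for one unattended loop."""
    reads_private_data: bool         # e.g. secrets, internal repos, customer records
    ingests_untrusted_content: bool  # e.g. public issues, web pages, inbound email
    can_exfiltrate: bool             # e.g. outbound network, posting comments


def trifecta_legs(caps: LoopCapabilities) -> list[str]:
    """Return which of the three risky capabilities this loop has."""
    legs = []
    if caps.reads_private_data:
        legs.append("private data")
    if caps.ingests_untrusted_content:
        legs.append("untrusted content")
    if caps.can_exfiltrate:
        legs.append("exfiltration channel")
    return legs


def is_safe_unattended(caps: LoopCapabilities) -> bool:
    """A loop passes only if at least one leg of the trifecta is absent."""
    return len(trifecta_legs(caps)) < 3
```

A scheduler could refuse to register any loop for which `is_safe_unattended` returns `False` and route it to a human instead. The check only covers the trifecta; permission scoping and sandboxing still need the tools listed above.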

State, Memory, And Context Persistence

This section focuses on durable loop state and cross-run context. For context-window design as its own lower layer, see the adjacent Context Engineering lists.

📚 Docs Effective Context Engineering for AI Agents - Anthropic guide to context as managed runtime state rather than a prompt dump.
📝 Blog Agent Harnesses: the Infrastructure Layer Your LLM Agent Actually Needs - Covers execution loops, state, checkpointing, observers, and replayability.
📝 Blog The Agent Loop Is the New OS - Frames the agent loop as an OS-like boundary with context as RAM and tools as I/O.
📝 Blog Harness engineering for coding agent users - Martin Fowler article on feedforward, feedback, and outer harnesses for coding agents.
📝 Blog Context Engineering - Simon Willison's framing of context engineering, useful for distinguishing context state from loop orchestration.
📝 Blog Agentic Coding in 2026 - Sourcegraph on supplying deterministic, large-codebase context and code intelligence so recurring agent runs reuse durable repository state instead of rediscovering it each time.
📝 Blog Agentic AI State Management with ScyllaDB and LangGraph - Durable agent state with checkpointers, write-ahead logs, and time-travel branching.
🧰 Tool Mem0 - Open-source memory layer for retaining user, session, and agent state across repeated agent sessions.
🧰 Tool Letta - Stateful agent framework from the MemGPT line with persistent, self-editing memory across runs.
🧰 Tool Zep - Temporal knowledge graph memory that tracks how facts about users and systems change across sessions.
🧰 Tool LangMem - SDK for extracting, consolidating, and retrieving long-term agent memory between loop runs.
🧰 Tool Beads - Git-plus-SQLite issue and memory store that agents read and write with a bd CLI, giving recurring loops durable task state and progress that survives context resets.
📄 Paper ARC: Active and Reflection-driven Context Management for Long-Horizon Agents - Treats context as a managed runtime artifact, reorganizing the working context when degradation or context rot is detected across a long run.
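
As a rough illustration of durable loop state that survives context resets, the sketch below keeps cross-run state in a JSON file and writes it atomically. It does not reproduce the storage model of any tool listed above; the file layout and the `load_state`, `save_state`, and `record_run` helpers are assumptions made for the example.

```python
import json
import os
import tempfile
from pathlib import Path


def load_state(path: Path) -> dict:
    """Return the persisted loop state, or a fresh state on the first run."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {"run_count": 0, "notes": []}


def save_state(path: Path, state: dict) -> None:
    """Write to a temp file, then rename, so a crash cannot leave a half-written file."""
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(state, handle, indent=2)
    os.replace(tmp_name, path)  # replaces the old file in one step


def record_run(path: Path, note: str) -> dict:
    """Load state, append this run's note, persist, and return the new state."""
    state = load_state(path)
    state["run_count"] += 1
    state["notes"].append(note)
    save_state(path, state)
    return state
```

Each loop run would call `record_run` at the end, and the next run would start from `load_state` rather than from whatever the agent remembers.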

Orchestration And Multi-Agent Delegation

🧰 Tool AutoGen - Multi-agent programming framework for conversations, tool use, and orchestration; active development has moved to the Microsoft Agent Framework.
🧰 Tool Microsoft Agent Framework - Microsoft's successor to AutoGen and Semantic Kernel for building and orchestrating multi-agent workflows in Python and .NET.
🧰 Tool LangGraph - Graph-based framework for controllable agent workflows, persistence, and human-in-the-loop steps.
🧰 Tool CrewAI - Framework for multi-agent workflows organized around roles, tasks, and crews.
📚 Docs LlamaIndex Workflows - Event-driven workflow abstraction for agentic applications.
📚 Docs OpenAI Agents SDK handoffs - First-class delegation between specialized agents.
📚 Docs Agent Protocol - API protocol for agent interaction, useful for separating loop managers from agent runtimes.
🧰 Tool AgentKit - TypeScript toolkit for durable, event-driven agents on workflow infrastructure.
🧰 Tool deepagents - LangChain project for deeper, longer-running agents with middleware and harness patterns.
📚 Docs Temporal for AI - Durable execution for long-running agent workflows: crash-proof state, automatic retries, and human-in-the-loop signals.
🧰 Tool Restate - Durable execution runtime for building resilient, stateful agents and workflows that survive failures mid-loop.
🧰 Tool DBOS - Lightweight Postgres-backed durable execution library for crash-proof agent workflows, queues, and scheduled triggers.
🧰 Tool Composio Agent Orchestrator - Orchestrates parallel coding agents in isolated worktrees that plan tasks, fix CI failures, respond to reviews, and manage their own PR lifecycle.

Benchmarks And Evaluation

🧪 Benchmark SWE-bench - Benchmark for resolving real GitHub issues through code editing and tests.
📄 Paper SWE-bench: Can Language Models Resolve Real-World GitHub Issues? - Original SWE-bench paper.
📄 Paper SWE-bench Goes Live - Dynamic benchmark designed to reduce overfitting to static issue sets.
🧪 Benchmark Terminal-Bench - Benchmark for agents operating in terminal environments.
🧰 Tool Terminal-Bench repository - Open-source benchmark and harness for hard terminal tasks.
📄 Paper AgentBench - Multi-environment benchmark for evaluating LLMs as agents.
📄 Paper WebArena - Realistic web environment for autonomous agents.
📄 Paper OSWorld - Benchmark for multimodal agents operating full computer environments.
📄 Paper ToolBench - Tool-use benchmark and dataset for tool-augmented agents.
📄 Paper GAIA - Benchmark for general AI assistants requiring reasoning, tool use, and multi-step work.
📄 Paper Tau-bench - Benchmark for tool-agent-user interactions in realistic domains.
📄 Paper VisualWebArena - Visually grounded web-agent benchmark extending WebArena.
📄 Paper AppWorld - Benchmark of interactive app tasks with state-based and execution-based evaluation.
📄 Paper Vending-Bench - Benchmark for long-term coherence of autonomous agents; documents how small errors compound over very long loop horizons.
🧪 Benchmark Vending-Bench leaderboard - Live long-horizon coherence results from Andon Labs.
📄 Paper SWE-EVO: Benchmarking Coding Agents in Long-Horizon Software Evolution Scenarios - Release-note-derived evolution tasks where agents score far below isolated-issue benchmarks, quantifying the long-horizon gap loops must manage.
📄 Paper EvoSkills: Self-Evolving Agent Skills via Co-Evolutionary Verification - A skill generator and a co-evolving surrogate verifier improve multi-file skill packages over iterations, evaluated on the SkillsBench benchmark of structured skill bundles.
📄 Paper SlopCodeBench: Benchmarking How Coding Agents Degrade Over Long-Horizon Iterative Tasks - Quantifies structural erosion and verbosity creep across iteration checkpoints in native harnesses like Claude Code and Codex, evidence for why loops need verification and budgets.
📄 Paper LongCLI-Bench: A Preliminary Benchmark for Long-horizon Agentic Programming in Command-Line Interfaces - Long-horizon CLI tasks where most runs stall below 30% completion, mapping where unattended loops break down.

Operations Playbooks

📝 Blog Agentic Engineering: The Agent Loop - Minimal mental model for the loop underlying agent operation.
📝 Blog The agent loop: ReAct, plan-and-execute, reflection - Practical walkthrough of the base loop and common variants.
📝 Blog How to Build an Agent - Thorsten Ball's demystification of the inner agent loop: a model, a loop, and enough tokens.
📝 Blog Agentic Coding Recommendations - Armin Ronacher's field notes on which practices hold up when agents do most of the work.
📝 Blog Coding Agents 101: The Art of Actually Getting Things Done - Practical delegation guidance from the Devin team on scoping tasks agents can actually finish.
📝 Blog How Anthropic teams use Claude Code - Cross-team field report of real recurring agent workflows in engineering, security, and data science.
📝 Blog How Boris Uses Claude Code - Unofficial but concrete compilation of Boris Cherny's autonomous setups: parallel worktrees, auto mode, /loop, /schedule, dynamic workflows, and /goal completion conditions.
📝 Blog Agent of the Day: Copilot Agent PR Analysis - Official walkthrough of a daily scheduled agentic workflow that ingests PR data, analyzes it, and publishes findings to a Discussion: a concrete example of a recurring loop with trigger, intake, analysis, and output.

Templates And Patterns

Reusable patterns that contributors can turn into future examples, templates, or playbooks.

🧾 Template Resource entry template - Format for adding a single resource with evidence quality and category fit.
🧾 Template Loop pattern template - Template for documenting an operational loop such as PR babysitting, CI repair, or feedback clustering.
🧾 Template Loop contract schema - Machine-readable schema for portable loop specs.
🧾 Template Loop contract preview script - Dependency-free demo that validates and renders a loop contract JSON file.
🧾 Template Translation guide - How to add or maintain a language translation without drifting from the canonical English list.
🧾 Template Pattern library index - Practical loop patterns with triggers, state, verification gates, budgets, and escalation paths.

Additional loop patterns worth documenting include PR babysitting, CI repair, feedback clustering, deploy verification, and docs drift collection.

Examples And Schema

Concrete examples make the loop contract easier to adapt to real repositories.

🔁 Pattern Example loop specs - Human-readable walkthroughs for PR babysitting, CI repair, and docs drift collection.
🧾 Template Loop contract library - Schema-validated loop contracts for every pattern-library loop, from PR babysitting to model routing.
🧾 Template Runnable test-repair loop - Dependency-light reference loop script with a verification gate, retry budget, durable progress log, repeat-failure detection, and escalation exit.
🧾 Template Runnable loop guide - Maps the script line by line to the Loop Contract and shows how to drive it with Claude Code, Codex CLI, or any agent CLI.

Preview an example locally:

python3 scripts/preview_loop_contract.py examples/pr-babysitter-loop.json
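
The control flow named in the runnable test-repair loop entry above (verification gate, retry budget, progress log, repeat-failure detection, escalation exit) can be sketched in a few lines. This is not the repository's script: `run_repair_loop` and its parameters are hypothetical, the log is an in-memory list standing in for a durable one, and "repeat failure" is assumed to mean that verification returns the same failure text twice in a row.

```python
from typing import Callable


def run_repair_loop(
    attempt_fix: Callable[[str], None],
    verify: Callable[[], tuple[bool, str]],
    max_attempts: int = 3,
) -> dict:
    """Attempt fixes until verification passes, the budget runs out,
    or the same failure comes back unchanged."""
    log: list[dict] = []        # stand-in for a durable progress log
    passed, failure = verify()  # verification gate before any work
    attempts = 0
    while not passed:
        if attempts >= max_attempts:
            return {"status": "escalate", "reason": "budget exhausted", "log": log}
        attempt_fix(failure)
        attempts += 1
        passed, new_failure = verify()
        log.append({"attempt": attempts, "passed": passed, "failure": new_failure})
        if not passed and new_failure == failure:
            # The fix changed nothing observable, so stop instead of burning budget.
            return {"status": "escalate", "reason": "repeat failure", "log": log}
        failure = new_failure
    return {"status": "done", "attempts": attempts, "log": log}
```

In a real setup `attempt_fix` would presumably invoke an agent CLI with the failure text and `verify` would run the test command; both are left as injected callables here so the loop logic can be exercised without either.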

Community Gallery

The gallery is for real-world or realistic loop examples contributed by the community.

Running a real loop? Share it, real or anonymized, in the patterns discussion linked under Roadmap And Discussion below. Use the minimum useful case study and anonymization checklists so others can learn from it safely.

🧾 Template Loop gallery guide - Quality bar for contributed loop examples with receipts and lessons learned.
🧾 Template Loop gallery template - Markdown template for sharing a loop's trigger, intake, state, verification, escalation, and safety notes.
🔁 Pattern PR babysitter reference loop - Reference gallery entry for keeping a pull request moving.
🔁 Pattern CI repair reference loop - Reference gallery entry for turning failing CI into a verified patch or escalation.
🔁 Pattern Docs drift reference loop - Reference gallery entry for recurring docs/code consistency checks.

Discovery And Distribution

This repository includes a lightweight GitHub Pages landing page for search and social previews:

🧾 Template Landing page - SEO-friendly entry point for the repository.
🧭 List Hugging Face mirror - Synced copy of this repository on the Hugging Face Hub for discovery within the AI/ML community.
🧾 Template Landing page source - Source for the static landing page.
🧾 Template Sitemap - Crawl hints for the landing page and core repository pages.
🧾 Template Robots file - Allows indexing and points crawlers to the sitemap.

For launch copy and backlink strategy, use the distribution checklist.

Roadmap And Discussion

🧾 Template Roadmap - Near-term work, pattern priorities, gallery goals, and open questions.
🧾 Template Launch article - Shareable explanation of the concept and repository.
🧾 Template Discussion guide - Suggested discussion categories, starter prompts, and moderation standards.
🔁 Pattern Show your Loop Engineering patterns - Community discussion for real or anonymized loop examples.

Pattern Library

Practical loop patterns translate the abstract contract into runnable operating models. Each pattern documents the trigger, discover/intake step, agents, workspace, state, verification gates, retry budget, escalation path, and loop instruction.

🔁 Pattern PR babysitter - Repeatedly checks review comments, CI, merge conflicts, stale threads, and readiness to merge.
🔁 Pattern CI repair loop - Reproduces failing checks, patches narrowly, reruns evidence, and escalates when failures are outside scope.
🔁 Pattern Docs drift collector - Finds mismatches between docs and code, proposes small patches, and verifies examples.
🔁 Pattern Deploy verifier - Watches rollout signals, compares them with release expectations, and stops on anomalies.
🔁 Pattern Feedback clusterer - Periodically groups GitHub, Linear, Slack, support, or social feedback into actionable themes.
🔁 Pattern Dependency triage loop - Classifies dependency updates, applies safe groups, verifies them, and escalates risky upgrades.
🔁 Pattern Evaluation regression loop - Investigates degraded agent evals with baseline traces, targeted reruns, and repair proposals.
🔁 Pattern Security review loop - Reviews sensitive diffs with evidence-backed findings, safe permissions, and human approval boundaries.
🔁 Pattern Cost-control loop - Monitors agent workflow spend, identifies waste, proposes scoped savings, and preserves quality gates.
🔁 Pattern Bug hunting loop - Discovers, reproduces, minimizes, and reports bugs with concrete evidence.
🔁 Pattern Enterprise approval loop - Drives a permissioned change through required gates and approvers with a full audit trail.
🔁 Pattern Incident response loop - Triages an alert into an owned, evidence-backed incident with a postmortem seed.
🔁 Pattern Data-quality loop - Validates each dataset refresh against quality rules and quarantines bad versions.
🔁 Pattern Release-note loop - Drafts release notes from merged commits, issues, and PRs with linked evidence.
🔁 Pattern Model-routing loop - Routes tasks across models on measured quality, latency, privacy, and cost.
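
To make the pattern fields listed at the top of this section concrete, here is a small completeness check over a pattern written as a plain dictionary. The key names are illustrative guesses, not the repository's loop contract schema, and the example values for the PR babysitter are invented for the sketch.

```python
# Field names mirror the prose list above; the exact keys are an assumption.
REQUIRED_FIELDS = (
    "trigger", "intake", "agents", "workspace", "state",
    "verification_gates", "retry_budget", "escalation_path", "loop_instruction",
)


def missing_fields(pattern: dict) -> list[str]:
    """Return required fields that are absent or empty (a budget of 0 counts as set)."""
    empty_values = (None, "", [], {})
    return [name for name in REQUIRED_FIELDS if pattern.get(name) in empty_values]


# Illustrative values only, loosely based on the PR babysitter description.
pr_babysitter = {
    "trigger": "every 30 minutes while the PR is open",
    "intake": "review comments, CI status, merge conflicts, stale threads",
    "agents": ["coding-agent"],
    "workspace": "isolated git worktree",
    "state": "per-PR progress file",
    "verification_gates": ["CI green", "no unresolved review threads"],
    "retry_budget": 3,
    "escalation_path": "ping the PR author",
    "loop_instruction": "Address new feedback, then re-check readiness to merge.",
}
```

Calling `missing_fields(pr_babysitter)` returns an empty list, while a draft that only names a trigger gets back the eight fields still to be written.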

Critiques, Risks, And Limitations

⚠️ Critique Most Developers Do Not Need Agent Loops Yet - Useful caution against adopting loops before the task, signal, and economics justify them.
⚠️ Critique Engineering Agentic Systems for Reliability - Cautions that agentic systems fail at boundaries when permissions, verification, traceability, and escalation are weak.
⚠️ Critique Self-Correcting Agents: Reflexion, CRITIC, and ReAct Loops Compared - Compares self-correction patterns and their cost/failure tradeoffs.
⚠️ Critique How to Build an AI Agent Harness: A 2026 Complete Guide - Broad guide with useful warnings on data readiness, permissions, context management, and evaluation.
⚠️ Critique Harness Engineering vs Prompt Engineering vs Context Engineering Explained - Adjacent framing that helps avoid confusing loop engineering with the surrounding harness discipline.

Adjacent Awesome Lists

🧭 List Awesome Harness Engineering - Comprehensive list for the agent harness layer that Loop Engineering builds on.
🧭 List Awesome Harness Engineering - High-signal harness list with strong categories for context, guardrails, specs, evals, runtimes, and benchmarks.
🧭 List Awesome Agent Harness - Curated tools and resources for environments, constraints, and feedback around coding agents.
🧭 List Awesome Context Engineering - Survey-style list for context engineering across LLMs and agents.
🧭 List Awesome Prompt Engineering - Classic adjacent list for prompt techniques and prompting resources.
🧭 List Awesome LLM Agents - General list of LLM agent papers, frameworks, and applications.
🧭 List Awesome AI Agents - Broad AI agent ecosystem map.
🧭 List Awesome CLI Coding Agents - Directory of terminal-native coding agents, parallel runners, autonomous loops, and the harnesses that orchestrate them.
🧭 List Awesome Self-Evolving Agents - Survey-style list of agents that improve themselves over repeated runs, an adjacent angle on long-running loops with memory and verification.
🧭 List Awesome AI Agent Papers - Curated 2026 research collection across agent engineering, memory, evaluation, workflows, and autonomous systems, a paper-level feeder for loop-design foundations.

Contributing

Contributions are welcome. Please read CONTRIBUTING.md before opening a pull request.

This repository uses a strict curation standard to keep the list focused, verifiable, and useful for builders. Maintainers can use the maintenance guide for link checks, identity checks, and periodic refreshes.

For community expectations and support channels, see CODE_OF_CONDUCT.md, SUPPORT.md, and SECURITY.md.

Fast path for adding a resource:

Check that it is about AI/coding-agent Loop Engineering or a direct foundation for it.
Search the README to avoid duplicates.
Pick the most specific category.
Add one entry using this format:

- 📄 **Paper** [Title](https://example.com) - One sentence explaining the resource's contribution to Loop Engineering.

Open a pull request and explain the category fit, source type, and why builders should care.
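
Before opening the pull request, the entry format above can be checked with a short script. This is a sketch rather than the repository's own linter: `check_entry` is a hypothetical helper, the set of kinds is collected from the labels visible in this list, and the pattern does not handle URLs that contain parentheses.

```python
import re

# "- <icon> **<Kind>** [Title](url) - Summary"
ENTRY_RE = re.compile(
    r"^- (?P<icon>\S+) \*\*(?P<kind>[A-Za-z]+)\*\* "
    r"\[(?P<title>[^\]]+)\]\((?P<url>https?://[^\s)]+)\) - "
    r"(?P<summary>.+)$"
)

# Labels seen in this list; treat the set as an assumption, not a spec.
KNOWN_KINDS = {
    "Paper", "Blog", "Docs", "Tool", "Pattern",
    "Template", "Benchmark", "Critique", "List",
}


def check_entry(line: str) -> list[str]:
    """Return a list of problems with one README entry; empty means it looks fine."""
    match = ENTRY_RE.match(line.strip())
    if match is None:
        return ["line does not match '- <icon> **<Kind>** [Title](url) - Summary'"]
    problems = []
    if match["kind"] not in KNOWN_KINDS:
        problems.append(f"unknown kind: {match['kind']}")
    if not match["summary"].endswith("."):
        problems.append("summary should end with a period")
    return problems
```

Running `check_entry` on the sample entry shown above yields no problems.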

Fast path for contributing a loop pattern: start from the loop pattern template or loop contract schema, and cover trigger, discover/intake, delegation, workspace, context, verification, durable state, budget, escalation, and exit. If you want feedback before writing the full pattern, open a pattern suggestion issue first.

Good submissions should answer three questions:

Is this about the new AI/coding-agent meaning of Loop Engineering or a direct foundation for it?
Does it help someone design, run, verify, evaluate, or critique recurring agent systems that coordinate prompting, context, harnesses, verification, and state?
Is the source stable, public, and specific enough to be useful?

Citation

If this repository is useful in your work, please cite it with:

@misc{chaoyue2026awesome_loop_engineering,
  author       = {He, Chaoyue},
  title        = {Awesome Loop Engineering},
  year         = {2026},
  howpublished = {\url{https://github.com/ChaoYue0307/awesome-loop-engineering}},
  note         = {Curated resources for Loop Engineering}
}

Reusable blurb (for blog posts, talks, internal docs, or community posts):

Loop Engineering is the practice of designing recurring AI-agent and coding-agent systems that discover work, delegate to agents, verify results, persist state, and retry or escalate on a cadence or until a goal is reached. Awesome Loop Engineering is a curated, implementation-focused resource collection for this practice: github.com/ChaoYue0307/awesome-loop-engineering
