<!DOCTYPE html>
<html lang="en">

<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>DeepLTL: Learning to Efficiently Satisfy Complex LTL Specifications for Multi-Task RL</title>
    <link rel="stylesheet" href="style.css">
    <link rel="preconnect" href="https://fonts.googleapis.com">
    <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
    <link
        href="https://fonts.googleapis.com/css2?family=IBM+Plex+Sans:ital,wght@0,100..700;1,100..700&family=Inter:ital,opsz,wght@0,14..32,100..900;1,14..32,100..900&display=swap"
        rel="stylesheet">
    <script id="MathJax-script" async src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script>
</head>
<body>

    <header>
        <div class="content">
            <h1>DeepLTL</h1>
            <h2>Learning to Efficiently Satisfy Complex LTL Specifications for Multi-Task RL</h2>
            <p class="authors">Mathias Jackermeier,<br class="wide-hidden" /> Alessandro Abate<br />
                University of Oxford</p>
            <p>ICLR 2025 (Oral)</p>
            <div class="links">
                <a href="https://openreview.net/pdf?id=9pW2J49flQ" target="_blank">Paper</a> <span class="separator">/</span>
                <a href="https://deep-ltl.github.io/slides.pdf" target="_blank">Slides</a> <span class="separator">/</span>
                <a href="https://github.com/mathiasj33/deep-ltl" target="_blank">GitHub</a>
            </div>
        </div>
        <div class="tldr">
            <p>💡<i>TL;DR: We develop a novel approach to train RL agents to zero-shot follow arbitrary instructions specified in a formal language.</i></p>
        </div>
    </header>
    <main>
        <section id="abstract">
            <div class="content">
                <h2>Abstract</h2>
                <p>
                    Linear temporal logic (LTL) has recently been adopted as a powerful formalism for specifying
                    complex, temporally extended tasks in multi-task reinforcement learning (RL). However, learning
                    policies that efficiently satisfy arbitrary specifications not observed during training remains a
                    challenging problem. Existing approaches suffer from several shortcomings: they are often only
                    applicable to finite-horizon fragments of LTL, are restricted to suboptimal solutions, and do not
                    adequately handle safety constraints. In this work, we propose a novel learning approach to
                    address these concerns. Our method leverages the structure of Büchi automata, which explicitly
                    represent the semantics of LTL specifications, to learn policies conditioned on sequences of
                    truth assignments that lead to satisfying the desired formulae. Experiments in a variety of
                    discrete and continuous domains demonstrate that our approach is able to zero-shot satisfy a wide
                    range of finite- and infinite-horizon specifications, and outperforms existing methods in terms
                    of both satisfaction probability and efficiency.
                </p>
            </div>
        </section>
        <section>
            <div class="content">
                <h2>DeepLTL</h2>
                <div class="image overview">
                    <img src="images/overview.png" alt="Schematic overview of DeepLTL">
                </div>
                <p>
                    We develop a novel approach, called <i>DeepLTL</i>, to learn policies that efficiently zero-shot
                    satisfy complex LTL specifications. We first employ goal-conditioned RL to learn a general policy
                    that can execute reach-avoid sequences of high-level propositions, such as "reach goal A while
                    avoiding region B, and subsequently stay in region C". At test time, we translate a given LTL
                    specification into a limit-deterministic Büchi automaton (LDBA), from which we extract possible
                    reach-avoid sequences that satisfy the formula. We then select the optimal sequence according to
                    the learned value function and execute it with the trained policy.
                </p>
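                <p>
                    As a rough illustration, the following Python sketch walks through this test-time procedure. The
                    names used here (<code>ltl_to_ldba</code>, <code>reach_avoid_sequences</code>,
                    <code>value_fn</code>, <code>policy</code>, <code>env</code>) are hypothetical placeholders, not
                    the actual interface of our released implementation.
                </p>
                <pre>
# Illustrative sketch of DeepLTL's test-time procedure; all interfaces below are
# assumed placeholders rather than the API of the released code.
def run_episode(formula, env, policy, value_fn, ltl_to_ldba, max_steps=1000):
    ldba = ltl_to_ldba(formula)              # translate the LTL formula into an LDBA
    obs = env.reset()
    for _ in range(max_steps):
        # Candidate reach-avoid sequences are accepting paths through the LDBA from
        # its current state; the learned value function ranks them. For simplicity,
        # this sketch re-selects the sequence at every environment step.
        sequences = ldba.reach_avoid_sequences()
        best = max(sequences, key=lambda seq: value_fn(obs, seq))
        action = policy(obs, best)           # policy conditioned on the chosen sequence
        obs, reward, done, info = env.step(action)
        ldba.step(info["propositions"])      # advance the automaton on observed propositions
        if done:
            break</pre>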
            </div>
        </section>
        <section>
            <div class="content">
                <h2>Quantitative results</h2>
                <div class="image">
                    <img src="images/curves.png" alt="Quantitative results of DeepLTL and baselines">
                </div>
                <p>
                    We find that DeepLTL is able to efficiently zero-shot satisfy a wide range of finite- and
                    infinite-horizon LTL specifications in a variety of discrete and continuous domains. Our approach
                    outperforms existing methods in terms of both satisfaction probability and efficiency.
                </p>
            </div>
        </section>
        <section>
            <div class="content">
                <h2>Qualitative results</h2>
                <div class="videos">
                    <div class="video-with-caption">
                        <div class="video-container">
                            <video muted loop disablePictureInPicture preload="metadata">
                                <source src="images/output.mp4#t=0.1" type="video/mp4">
                            </video>
                            <button class="play-button">
                                <svg class="play-icon" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24">
                                    <path d="M8 5v14l11-7z" />
                                </svg>
                                <svg class="pause-icon" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24">
                                    <path d="M6 19h4V5H6v14zm8-14v14h4V5h-4z" />
                                </svg>
                            </button>
                        </div>
                        <p>
                            \((\neg(\mathsf{yellow}\lor \mathsf{green})\;\mathsf{U}\; \mathsf{magenta}) \land \mathsf{F}\,\mathsf{blue}\)
                        </p>
                    </div>
                    <div class="video-with-caption">
                        <div class="video-container">
                            <video muted loop disablePictureInPicture preload="metadata">
                                <source src="images/infinite.mp4#t=0.1" type="video/mp4">
                            </video>
                            <button class="play-button">
                                <svg class="play-icon" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24">
                                    <path d="M8 5v14l11-7z" />
                                </svg>
                                <svg class="pause-icon" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24">
                                    <path d="M6 19h4V5H6v14zm8-14v14h4V5h-4z" />
                                </svg>
                            </button>
                        </div>
                        <p>
                            \(\mathsf{G}\mathsf{F}\,\mathsf{green} \land \mathsf{G}\mathsf{F}\,\mathsf{yellow} \land \mathsf{G}\,\neg\mathsf{blue}\)
                        </p>
                    </div>
                </div>
                <p>
                    We demonstrate the behaviour of the task-conditioned policy learned by DeepLTL in the zone
                    environment. DeepLTL consistently achieves the desired behaviour, even with complex,
                    infinite-horizon specifications.
                </p>
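                <p>
                    For instance, the first specification requires the agent to reach the magenta region without
                    entering the yellow or green regions beforehand, and to additionally visit blue at some point; the
                    second requires visiting the green and yellow regions infinitely often while never entering blue.
                </p>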
            </div>
        </section>
        <section>
            <div class="content">
                <h2>Citation</h2>
                <div class="citation">
                    <pre>
@inproceedings{deepltl,
    title = {Deep{LTL}: Learning to Efficiently Satisfy Complex {LTL} Specifications for Multi-Task {RL}},
    author = {Mathias Jackermeier and Alessandro Abate},
    booktitle = {International Conference on Learning Representations ({ICLR})},
    year = {2025}
}</pre>
                    <div class="copy">
                        <span class="copy-text">Copied!</span>
                        <div class="copy-icon" onclick="copyCitation()">
                            <svg xmlns="http://www.w3.org/2000/svg" class="copy-click" viewBox="0 0 448 512">
                                <path fill="currentColor"
                                    d="M384 336l-192 0c-8.8 0-16-7.2-16-16l0-256c0-8.8 7.2-16 16-16l140.1 0L400 115.9 400 320c0 8.8-7.2 16-16 16zM192 384l192 0c35.3 0 64-28.7 64-64l0-204.1c0-12.7-5.1-24.9-14.1-33.9L366.1 14.1c-9-9-21.2-14.1-33.9-14.1L192 0c-35.3 0-64 28.7-64 64l0 256c0 35.3 28.7 64 64 64zM64 128c-35.3 0-64 28.7-64 64L0 448c0 35.3 28.7 64 64 64l192 0c35.3 0 64-28.7 64-64l0-32-48 0 0 32c0 8.8-7.2 16-16 16L64 464c-8.8 0-16-7.2-16-16l0-256c0-8.8 7.2-16 16-16l32 0 0-48-32 0z" />
                            </svg>
                            <svg xmlns="http://www.w3.org/2000/svg" class="copy-confirm" viewBox="0 0 448 512">
                                <path fill="currentColor"
                                    d="M438.6 105.4c12.5 12.5 12.5 32.8 0 45.3l-256 256c-12.5 12.5-32.8 12.5-45.3 0l-128-128c-12.5-12.5-12.5-32.8 0-45.3s32.8-12.5 45.3 0L160 338.7 393.4 105.4c12.5-12.5 32.8-12.5 45.3 0z" />
                            </svg>
                        </div>
                    </div>
                </div>
            </div>
        </section>
    </main>
    <footer>
        <div class="content">
            Header background by <a href="http://www.freepik.com">Harryarts / Freepik</a>
        </div>
    </footer>
    <script>
        document.addEventListener('DOMContentLoaded', () => {
            // Toggle play/pause for each result video when its container is clicked.
            document.querySelectorAll('.video-container').forEach(container => {
                const video = container.querySelector('video');
                const button = container.querySelector('.play-button');
                const playIcon = button.querySelector('.play-icon');
                const pauseIcon = button.querySelector('.pause-icon');

                container.addEventListener('click', () => {
                    if (video.paused) {
                        video.play();
                        playIcon.style.opacity = 0;
                        pauseIcon.style.opacity = 1;

                        // Hide the overlay button shortly after playback starts.
                        setTimeout(() => {
                            button.style.opacity = 0;
                            button.style.cursor = 'auto';
                        }, 50);
                    } else {
                        video.pause();
                        button.style.opacity = 1;
                        playIcon.style.opacity = 1;
                        pauseIcon.style.opacity = 0;
                        button.style.cursor = 'pointer';
                    }
                });
            });
        });

        // Copy the BibTeX entry to the clipboard and briefly show a confirmation icon.
        const copyCitation = () => {
            const citationText = document.querySelector('.citation pre').textContent;
            navigator.clipboard.writeText(citationText).then(showCopyConfirm, (err) => console.error(err));
        }

        let copyEnabled = true;

        const showCopyConfirm = () => {
            if (!copyEnabled) return;
            copyEnabled = false;
            const copyIcon = document.querySelector('.copy-click');
            const copyConfirm = document.querySelector('.copy-confirm');
            const copyText = document.querySelector('.copy-text');
            copyIcon.style.opacity = 0;
            copyConfirm.style.opacity = 1;
            copyText.classList.add('fade-in');
            setTimeout(() => {
                copyIcon.style.opacity = 1;
                copyConfirm.style.opacity = 0;
                copyText.classList.remove('fade-in');
                copyText.classList.add('fade-out');
                setTimeout(() => {
                    copyText.classList.remove('fade-out');
                    copyEnabled = true;
                }, 300);
            }, 2000);
        }
    </script>
</body>

</html>