diff --git a/CHANGELOG.md b/CHANGELOG.md new file mode 100644 index 0000000000000000000000000000000000000000..5837b2b57b8d319f7a12c1b0ff413044b7792f33 --- /dev/null +++ b/CHANGELOG.md @@ -0,0 +1,118 @@ +# Changelog + +All notable changes to the Research Article Template will be documented in this file. + +The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), +and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). + +## [Unreleased] + +### Added +- Initial open source release +- Comprehensive documentation +- Contributing guidelines +- License file + +## [1.0.0] - 2024-12-19 + +### Added +- **Core Features**: + - Markdown/MDX-based writing system + - KaTeX mathematical notation support + - Syntax highlighting for code blocks + - Academic citations with BibTeX integration + - Footnotes and sidenotes system + - Auto-generated table of contents + - Interactive Mermaid diagrams + - Plotly.js and D3.js integration + - HTML embed support + - Gradio app embedding + - Dataviz color palettes + - Image optimization + - SEO-friendly structure + - Automatic PDF export + - Dark/light theme toggle + - Mobile-responsive design + - LaTeX import functionality + - Template synchronization system + +- **Components**: + - Figure component with captions + - MultiFigure for image galleries + - Note component with variants + - Quote component + - Accordion for collapsible content + - Sidenote component + - Table of Contents + - Theme Toggle + - HTML Embed + - Raw HTML support + - SEO component + - Hero section + - Footer + - Full-width and wide layouts + +- **Build System**: + - Astro 4.10.0 integration + - PostCSS with custom media queries + - Automatic compression + - Docker support + - Nginx configuration + - Git LFS support + +- **Scripts**: + - PDF export functionality + - LaTeX to MDX conversion + - Template synchronization + - Font SVG generation + - TrackIO data generation + +- **Documentation**: + - Getting started guide + - Writing best practices + - Component reference + - LaTeX conversion guide + - Interactive examples + +### Technical Details +- **Framework**: Astro 4.10.0 +- **Styling**: PostCSS with custom properties +- **Math**: KaTeX 0.16.22 +- **Charts**: Plotly.js 3.1.0, D3.js 7.9.0 +- **Diagrams**: Mermaid 11.10.1 +- **Node.js**: >=20.0.0 +- **License**: CC-BY-4.0 + +### Browser Support +- Chrome (latest) +- Firefox (latest) +- Safari (latest) +- Edge (latest) + +--- + +## Version History + +- **1.0.0**: Initial stable release with full feature set +- **0.0.1**: Development version (pre-release) + +## Migration Guide + +### From 0.0.1 to 1.0.0 + +This is the first stable release. No breaking changes from the development version. 
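+
+If you want to check which template release your project started from before syncing, one quick option is to read the `version` field of `app/package.json` (a minimal sketch; it assumes that field tracks the template release rather than your own article's versioning):
+
+```bash
+cd app
+npm pkg get version   # assumption: this mirrors the template release, e.g. "1.0.0"
+```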
+
+### Updating Your Project
+
+Use the template synchronization system to update:
+
+```bash
+npm run sync:template -- --dry-run  # Preview changes
+npm run sync:template               # Apply updates
+```
+
+## Support
+
+- **Documentation**: [Hugging Face Space](https://huggingface.co/spaces/tfrere/research-article-template)
+- **Issues**: [Community Discussions](https://huggingface.co/spaces/tfrere/research-article-template/discussions)
+- **Contact**: [@tfrere](https://huggingface.co/tfrere)
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
new file mode 100644
index 0000000000000000000000000000000000000000..a4573b5d9abcd9e9ba35095677d0443b157298ec
--- /dev/null
+++ b/CONTRIBUTING.md
@@ -0,0 +1,196 @@
+# Contributing to Research Article Template
+
+Thank you for your interest in contributing to the Research Article Template! This document provides guidelines and information for contributors.
+
+## 🤝 How to Contribute
+
+### Reporting Issues
+
+Before creating an issue, please:
+1. **Search existing issues** to avoid duplicates
+2. **Use the issue template** when available
+3. **Provide detailed information**:
+   - Clear description of the problem
+   - Steps to reproduce
+   - Expected vs actual behavior
+   - Environment details (OS, Node.js version, browser)
+   - Screenshots if applicable
+
+### Suggesting Features
+
+We welcome feature suggestions! Please:
+1. **Check existing discussions** first
+2. **Describe the use case** clearly
+3. **Explain the benefits** for the community
+4. **Consider implementation complexity**
+
+### Code Contributions
+
+#### Getting Started
+
+1. **Fork the repository** on Hugging Face
+2. **Clone your fork**:
+   ```bash
+   git clone git@hf.co:spaces/<your-username>/research-article-template
+   cd research-article-template
+   ```
+3. **Install dependencies**:
+   ```bash
+   cd app
+   npm install
+   ```
+4. **Create a feature branch**:
+   ```bash
+   git checkout -b feature/your-feature-name
+   ```
+
+#### Development Workflow
+
+1. **Make your changes** following our coding standards
+2. **Test thoroughly**:
+   ```bash
+   npm run dev    # Test locally
+   npm run build  # Ensure build works
+   ```
+3. **Update documentation** if needed
+4. **Commit with clear messages**:
+   ```bash
+   git commit -m "feat: add new component for interactive charts"
+   ```
+
+#### Pull Request Process
+
+1. **Push your branch**:
+   ```bash
+   git push origin feature/your-feature-name
+   ```
+2. **Create a Pull Request** with:
+   - Clear title and description
+   - Reference related issues
+   - Screenshots for UI changes
+   - Testing instructions
+
+## 📋 Coding Standards
+
+### Code Style
+
+- **Use Prettier** for consistent formatting
+- **Follow existing patterns** in the codebase
+- **Write clear, self-documenting code**
+- **Add comments** for complex logic
+- **Use meaningful variable names**
+
+### File Organization
+
+- **Components**: Place in `src/components/`
+- **Styles**: Use CSS modules or component-scoped styles
+- **Assets**: Organize in `src/content/assets/`
+- **Documentation**: Update relevant `.mdx` files
+
+### Commit Message Format
+
+We follow [Conventional Commits](https://www.conventionalcommits.org/):
+
+```
+type(scope): description
+
+feat: add new interactive chart component
+fix: resolve mobile layout issues
+docs: update installation instructions
+style: improve button hover states
+refactor: simplify component structure
+test: add unit tests for utility functions
+```
+
+**Types**: `feat`, `fix`, `docs`, `style`, `refactor`, `test`, `chore`
+
+## 🧪 Testing
+
+### Manual Testing
+
+Before submitting:
+- [ ] Test on different screen sizes
+- [ ] Verify dark/light theme compatibility
+- [ ] Check browser compatibility (Chrome, Firefox, Safari)
+- [ ] Test with different content types
+- [ ] Ensure accessibility standards are met
+
+### Automated Testing
+
+```bash
+# Run build to catch errors
+npm run build
+
+# Test PDF export
+npm run export:pdf
+
+# Test LaTeX conversion
+npm run latex:convert
+```
+
+## 📚 Documentation
+
+### Writing Guidelines
+
+- **Use clear, concise language**
+- **Provide examples** for complex features
+- **Include screenshots** for UI changes
+- **Update both English content and code comments**
+
+### Documentation Structure
+
+- **README.md**: Project overview and quick start
+- **CONTRIBUTING.md**: This file
+- **Content files**: In `src/content/chapters/demo/`
+- **Component docs**: Inline comments and examples
+
+## 🎯 Areas for Contribution
+
+### High Priority
+
+- **Bug fixes** and stability improvements
+- **Accessibility enhancements**
+- **Mobile responsiveness**
+- **Performance optimizations**
+- **Documentation improvements**
+
+### Feature Ideas
+
+- **New interactive components**
+- **Additional export formats**
+- **Enhanced LaTeX import**
+- **Theme customization**
+- **Plugin system**
+
+### Community
+
+- **Answer questions** in discussions
+- **Share examples** of your work
+- **Write tutorials** and guides
+- **Help with translations**
+
+## 🚫 What Not to Contribute
+
+- **Breaking changes** without discussion
+- **Major architectural changes** without approval
+- **Dependencies** that significantly increase bundle size
+- **Features** that don't align with the project's goals
+
+## 📞 Getting Help
+
+- **Discussions**: [Community tab](https://huggingface.co/spaces/tfrere/research-article-template/discussions)
+- **Issues**: [Report bugs](https://huggingface.co/spaces/tfrere/research-article-template/discussions?status=open&type=issue)
+- **Contact**: [@tfrere](https://huggingface.co/tfrere) on Hugging Face
+
+## 📄 License
+
+By contributing, you agree that your contributions will be licensed under the same [CC-BY-4.0 license](LICENSE) that covers the project.
+
+## 🙏 Recognition
+
+Contributors will be:
+- **Listed in acknowledgments** (if desired)
+- **Mentioned in release notes** for significant contributions
+- **Credited** in relevant documentation
+
+Thank you for helping make scientific writing more accessible and interactive! 🎉
diff --git a/LICENSE b/LICENSE
new file mode 100644
index 0000000000000000000000000000000000000000..b267a53137822114e4c0bcef2e6383aaf52a70f1
--- /dev/null
+++ b/LICENSE
@@ -0,0 +1,33 @@
+Creative Commons Attribution 4.0 International License
+
+Copyright (c) 2024 Thibaud Frere
+
+This work is licensed under the Creative Commons Attribution 4.0 International License.
+To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/
+or send a letter to Creative Commons, PO Box 1866, Mountain View, CA 94042, USA.
+
+You are free to:
+
+  Share — copy and redistribute the material in any medium or format
+  Adapt — remix, transform, and build upon the material for any purpose, even commercially.
+
+The licensor cannot revoke these freedoms as long as you follow the license terms.
+
+Under the following terms:
+
+  Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
+
+  No additional restrictions — You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.
+
+Notices:
+
+  You do not have to comply with the license for elements of the material in the public domain or where your use is permitted by an applicable exception or limitation.
+
+  No warranties are given. The license may not give you all of the permissions necessary for your intended use. For example, other rights such as publicity, privacy, or moral rights may limit how you use the material.
+
+---
+
+For the source code and technical implementation:
+- The source code is available at: https://huggingface.co/spaces/tfrere/research-article-template
+- Third-party figures and assets are excluded from this license and marked in their captions
+- Dependencies and third-party libraries maintain their respective licenses
diff --git a/README.md b/README.md
index 3301c23cf8488bf55409e058c7d7e9de797cedab..114b903c9da3bc87749b1260eaa2eb272914fe92 100644
--- a/README.md
+++ b/README.md
@@ -8,4 +8,132 @@ pinned: false
 header: mini
 app_port: 8080
 thumbnail: https://huggingface.co/spaces/tfrere/research-paper-template/thumb.jpg
----
\ No newline at end of file
+---
+
+# 📝 Research Article Template
+
+[![License: CC BY 4.0](https://img.shields.io/badge/License-CC%20BY%204.0-lightgrey.svg)](https://creativecommons.org/licenses/by/4.0/)
+[![Node.js Version](https://img.shields.io/badge/node-%3E%3D20.0.0-brightgreen.svg)](https://nodejs.org/)
+[![Astro](https://img.shields.io/badge/Astro-4.10.0-orange.svg)](https://astro.build/)
+[![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/tfrere/research-article-template)
+
+> **A modern, interactive template for scientific writing** that brings papers to life with web-native features, minimal setup, and maximum impact.
+
+## ✨ Features
+
+- 🎯 **Markdown-based** - Write in familiar Markdown/MDX
+- 🧮 **KaTeX math** - Beautiful mathematical notation
+- 🎨 **Syntax highlighting** - Code blocks with proper highlighting
+- 📚 **Academic citations** - BibTeX integration
+- 📝 **Footnotes & sidenotes** - Rich annotation system
+- 📋 **Table of contents** - Auto-generated navigation
+- 📊 **Interactive diagrams** - Mermaid, Plotly, D3.js ready
+- 🎭 **HTML embeds** - Include any web content
+- 🤖 **Gradio app embeds** - Interactive ML demos
+- 🎨 **Dataviz color palettes** - Consistent visual design
+- 🖼️ **Optimized images** - Automatic optimization
+- ⚡ **Lightweight bundle** - Fast loading
+- 🔍 **SEO friendly** - Search engine optimized
+- 🏗️ **Automatic build** - CI/CD ready
+- 📄 **PDF export** - Generate publication-ready PDFs
+- 🌙 **Dark theme** - Modern UI with theme toggle
+- 📱 **Mobile friendly** - Responsive design
+- 📥 **LaTeX import** - Convert existing papers
+- 🔄 **Template updates** - Stay current with improvements
+
+## 🚀 Quick Start
+
+### Option 1: Duplicate on Hugging Face (Recommended)
+
+1. Visit **[🤗 Research Article Template](https://huggingface.co/spaces/tfrere/research-article-template)**
+2. Click **"Duplicate this Space"**
+3. Clone your new repository:
+   ```bash
+   git clone git@hf.co:spaces/<your-username>/<your-space-name>
+   cd <your-space-name>
+   ```
+
+### Option 2: Clone Directly
+
+```bash
+git clone https://github.com/tfrere/research-article-template.git
+cd research-article-template
+```
+
+### Installation
+
+```bash
+# Install Node.js 20+ (use nvm for version management)
+nvm install 20
+nvm use 20
+
+# Install Git LFS and pull assets
+git lfs install
+git lfs pull
+
+# Install dependencies
+cd app
+npm install
+
+# Start development server
+npm run dev
+```
+
+Visit `http://localhost:4321` to see your site!
+
+## 📖 Documentation
+
+- **[Getting Started Guide](https://huggingface.co/spaces/tfrere/research-article-template)** - Complete setup instructions
+- **[Writing Best Practices](https://huggingface.co/spaces/tfrere/research-article-template)** - Tips for effective scientific writing
+- **[Component Reference](https://huggingface.co/spaces/tfrere/research-article-template)** - Available blocks and features
+- **[LaTeX Conversion](https://huggingface.co/spaces/tfrere/research-article-template)** - Import existing papers
+
+## 🎯 Who This Is For
+
+- **Scientists** writing modern, web-native research papers
+- **Educators** creating interactive, explorable lessons
+- **Researchers** who want to focus on ideas, not infrastructure
+- **Anyone** who values clear, engaging technical communication
+
+## 🌟 Inspired by Distill
+
+This template carries forward the spirit of [Distill](https://distill.pub/) (2016–2021), pushing interactive scientific writing even further with:
+- Accessible, high-quality explanations
+- Reproducible, production-ready demos
+- Modern web technologies and best practices
+
+## 🤝 Contributing
+
+We welcome contributions! Please see our [Contributing Guidelines](CONTRIBUTING.md) for details.
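+
+For quick reference, the workflow described in [CONTRIBUTING.md](CONTRIBUTING.md) condenses to the sketch below (replace `<your-username>` and the branch name with your own; the commands simply mirror the contributing guide):
+
+```bash
+# Fork the Space on Hugging Face first, then:
+git clone git@hf.co:spaces/<your-username>/research-article-template
+cd research-article-template/app && npm install
+git checkout -b feature/your-feature-name
+# ...make your changes, then check them with `npm run dev` and `npm run build`...
+git add -A && git commit -m "feat: describe your change"
+git push origin feature/your-feature-name
+```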
+
+### Ways to Contribute
+
+- 🐛 **Report bugs** - Open an issue with detailed information
+- 💡 **Suggest features** - Share ideas for improvements
+- 📝 **Improve documentation** - Help others get started
+- 🔧 **Submit code** - Fix bugs or add features
+- 💬 **Join discussions** - Share feedback and ideas
+
+## 📄 License
+
+This project is licensed under the [Creative Commons Attribution 4.0 International License](https://creativecommons.org/licenses/by/4.0/).
+
+- **Diagrams and text**: CC-BY 4.0
+- **Source code**: Available on [Hugging Face](https://huggingface.co/spaces/tfrere/research-article-template)
+- **Third-party figures**: Excluded and marked in captions
+
+## 🙏 Acknowledgments
+
+- Inspired by [Distill](https://distill.pub/) and the interactive scientific writing movement
+- Built with [Astro](https://astro.build/), [MDX](https://mdxjs.com/), and modern web technologies
+- Community feedback and contributions from researchers worldwide
+
+## 📞 Support
+
+- 💬 **[Community Discussions](https://huggingface.co/spaces/tfrere/research-article-template/discussions)** - Ask questions and share ideas
+- 🐛 **[Report Issues](https://huggingface.co/spaces/tfrere/research-article-template/discussions?status=open&type=issue)** - Bug reports and feature requests
+- 📧 **Contact**: [@tfrere](https://huggingface.co/tfrere) on Hugging Face
+
+---
+
+**Made with ❤️ for the scientific community**
\ No newline at end of file
diff --git a/app/.astro/astro/content.d.ts b/app/.astro/astro/content.d.ts
index eb236b062e47ff762326764dbd53546131697d54..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 100644
--- a/app/.astro/astro/content.d.ts
+++ b/app/.astro/astro/content.d.ts
@@ -1,284 +0,0 @@
-declare module 'astro:content' {
-  interface Render {
-    '.mdx': Promise<{
-      Content: import('astro').MarkdownInstance<{}>['Content'];
-      headings: import('astro').MarkdownHeading[];
-      remarkPluginFrontmatter: Record;
-      components: import('astro').MDXInstance<{}>['components'];
-    }>;
-  }
-}
-
-declare module 'astro:content' {
-  interface RenderResult {
-    Content: import('astro/runtime/server/index.js').AstroComponentFactory;
-    headings: import('astro').MarkdownHeading[];
-    remarkPluginFrontmatter: Record;
-  }
-  interface Render {
-    '.md': Promise;
-  }
-
-  export interface RenderedContent {
-    html: string;
-    metadata?: {
-      imagePaths: Array;
-      [key: string]: unknown;
-    };
-  }
-}
-
-declare module 'astro:content' {
-  type Flatten = T extends { [K: string]: infer U } ? U : never;
-
-  export type CollectionKey = keyof AnyEntryMap;
-  export type CollectionEntry = Flatten;
-
-  export type ContentCollectionKey = keyof ContentEntryMap;
-  export type DataCollectionKey = keyof DataEntryMap;
-
-  type AllValuesOf = T extends any ? T[keyof T] : never;
-  type ValidContentEntrySlug = AllValuesOf<
-    ContentEntryMap[C]
-  >['slug'];
-
-  /** @deprecated Use `getEntry` instead. */
-  export function getEntryBySlug<
-    C extends keyof ContentEntryMap,
-    E extends ValidContentEntrySlug | (string & {}),
-  >(
-    collection: C,
-    // Note that this has to accept a regular string too, for SSR
-    entrySlug: E,
-  ): E extends ValidContentEntrySlug
-    ? Promise>
-    : Promise | undefined>;
-
-  /** @deprecated Use `getEntry` instead.
*/ - export function getDataEntryById( - collection: C, - entryId: E, - ): Promise>; - - export function getCollection>( - collection: C, - filter?: (entry: CollectionEntry) => entry is E, - ): Promise; - export function getCollection( - collection: C, - filter?: (entry: CollectionEntry) => unknown, - ): Promise[]>; - - export function getEntry< - C extends keyof ContentEntryMap, - E extends ValidContentEntrySlug | (string & {}), - >(entry: { - collection: C; - slug: E; - }): E extends ValidContentEntrySlug - ? Promise> - : Promise | undefined>; - export function getEntry< - C extends keyof DataEntryMap, - E extends keyof DataEntryMap[C] | (string & {}), - >(entry: { - collection: C; - id: E; - }): E extends keyof DataEntryMap[C] - ? Promise - : Promise | undefined>; - export function getEntry< - C extends keyof ContentEntryMap, - E extends ValidContentEntrySlug | (string & {}), - >( - collection: C, - slug: E, - ): E extends ValidContentEntrySlug - ? Promise> - : Promise | undefined>; - export function getEntry< - C extends keyof DataEntryMap, - E extends keyof DataEntryMap[C] | (string & {}), - >( - collection: C, - id: E, - ): E extends keyof DataEntryMap[C] - ? Promise - : Promise | undefined>; - - /** Resolve an array of entry references from the same collection */ - export function getEntries( - entries: { - collection: C; - slug: ValidContentEntrySlug; - }[], - ): Promise[]>; - export function getEntries( - entries: { - collection: C; - id: keyof DataEntryMap[C]; - }[], - ): Promise[]>; - - export function render( - entry: AnyEntryMap[C][string], - ): Promise; - - export function reference( - collection: C, - ): import('astro/zod').ZodEffects< - import('astro/zod').ZodString, - C extends keyof ContentEntryMap - ? { - collection: C; - slug: ValidContentEntrySlug; - } - : { - collection: C; - id: keyof DataEntryMap[C]; - } - >; - // Allow generic `string` to avoid excessive type errors in the config - // if `dev` is not running to update as you edit. - // Invalid collection names will be caught at build time. - export function reference( - collection: C, - ): import('astro/zod').ZodEffects; - - type ReturnTypeOrOriginal = T extends (...args: any[]) => infer R ? 
R : T; - type InferEntrySchema = import('astro/zod').infer< - ReturnTypeOrOriginal['schema']> - >; - - type ContentEntryMap = { - "chapters": { -"demo/best-pratices.mdx": { - id: "demo/best-pratices.mdx"; - slug: "demo/best-pratices"; - body: string; - collection: "chapters"; - data: any -} & { render(): Render[".mdx"] }; -"demo/components.mdx": { - id: "demo/components.mdx"; - slug: "demo/components"; - body: string; - collection: "chapters"; - data: any -} & { render(): Render[".mdx"] }; -"demo/debug-components.mdx": { - id: "demo/debug-components.mdx"; - slug: "demo/debug-components"; - body: string; - collection: "chapters"; - data: any -} & { render(): Render[".mdx"] }; -"demo/getting-started.mdx": { - id: "demo/getting-started.mdx"; - slug: "demo/getting-started"; - body: string; - collection: "chapters"; - data: any -} & { render(): Render[".mdx"] }; -"demo/greetings.mdx": { - id: "demo/greetings.mdx"; - slug: "demo/greetings"; - body: string; - collection: "chapters"; - data: any -} & { render(): Render[".mdx"] }; -"demo/introduction.mdx": { - id: "demo/introduction.mdx"; - slug: "demo/introduction"; - body: string; - collection: "chapters"; - data: any -} & { render(): Render[".mdx"] }; -"demo/latex-convertion.mdx": { - id: "demo/latex-convertion.mdx"; - slug: "demo/latex-convertion"; - body: string; - collection: "chapters"; - data: any -} & { render(): Render[".mdx"] }; -"demo/markdown.mdx": { - id: "demo/markdown.mdx"; - slug: "demo/markdown"; - body: string; - collection: "chapters"; - data: any -} & { render(): Render[".mdx"] }; -"demo/vibe-coding-charts.mdx": { - id: "demo/vibe-coding-charts.mdx"; - slug: "demo/vibe-coding-charts"; - body: string; - collection: "chapters"; - data: any -} & { render(): Render[".mdx"] }; -"demo/writing-your-content.mdx": { - id: "demo/writing-your-content.mdx"; - slug: "demo/writing-your-content"; - body: string; - collection: "chapters"; - data: any -} & { render(): Render[".mdx"] }; -"your-first-chapter.mdx": { - id: "your-first-chapter.mdx"; - slug: "your-first-chapter"; - body: string; - collection: "chapters"; - data: any -} & { render(): Render[".mdx"] }; -}; -"embeds": { -"vibe-code-d3-embeds-directives.md": { - id: "vibe-code-d3-embeds-directives.md"; - slug: "vibe-code-d3-embeds-directives"; - body: string; - collection: "embeds"; - data: any -} & { render(): Render[".md"] }; -}; - - }; - - type DataEntryMap = { - "assets": { -"data/data": { - id: "data/data"; - collection: "assets"; - data: any -}; -"data/font-sprite-mapping": { - id: "data/font-sprite-mapping"; - collection: "assets"; - data: any -}; -"data/font_manifest": { - id: "data/font_manifest"; - collection: "assets"; - data: any -}; -"data/llm_benchmarks": { - id: "data/llm_benchmarks"; - collection: "assets"; - data: any -}; -"data/mnist-variant-model": { - id: "data/mnist-variant-model"; - collection: "assets"; - data: any -}; -"data/typography_data": { - id: "data/typography_data"; - collection: "assets"; - data: any -}; -}; - - }; - - type AnyEntryMap = ContentEntryMap & DataEntryMap; - - export type ContentConfig = never; -} diff --git a/app/package.json b/app/package.json index 660e1a654be5ca1a45138240dd8f8851f726986b..df93f49e7bdf92671c00235239e31e14c6b9fb70 100644 Binary files a/app/package.json and b/app/package.json differ diff --git a/app/scripts/latex-to-mdx/input/.gitignore b/app/scripts/latex-to-mdx/input/.gitignore deleted file mode 100644 index 3985e18491a1c8bd8442b52c71648788e52af71e..0000000000000000000000000000000000000000 --- 
a/app/scripts/latex-to-mdx/input/.gitignore +++ /dev/null @@ -1,13 +0,0 @@ -.DS_store - -*.aux -*.nav -*.log -*.snm -*.toc -*.out -*.vrb -*.blg -*latexmk* -*fls -*synctex* \ No newline at end of file diff --git a/app/scripts/latex-to-mdx/input/README.md b/app/scripts/latex-to-mdx/input/README.md deleted file mode 100644 index 060c311b294f3eeeed47d7b564e02cb38ad781d5..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/README.md +++ /dev/null @@ -1,64 +0,0 @@ -# Robot Learning: A Tutorial - -Google "robot learning tutorial", and you will spend just as much time skimming through sources as actually learning about robot learning. -This tutorial solves this: a unified entry point to the field of robot learning, presenting the conceptual underpinnings of popular approaches in the field, as well as presenting practical examples of how to use SOTA algorithms in `lerobot`, an open-source library for full-stack robotics. - -# TODO - -```markdown -## 1. Introduction -- [x] 1.1 Motivation -- [x] 1.2 Structure of the Report - -## 2. Classical Robotics -- [x] 2.1 Different kinds of motion -- [x] 2.2 Example: (Planar) Manipulation - - [x] 2.3.1 Adding Feedback Loops -- [x] 2.4 Limitations of Dynamics-based Robotics - -## 3. Robot Learning -- [ ] 3.1 Reinforcement Learning (RL) for Robotics - - [ ] 3.1.1 A (Concise) Introduction to RL -- [ ] 3.2 Model-Free RL for Real-world Robotics - - [ ] 3.2.1 RL in lerobot: sample efficient, data-driven, and real-world - - [ ] 3.2.2 Code Example: HIL-SERL in lerobot -- [ ] 3.3 Limitations of RL in Real-World Robotics: Simulators and Reward Design -- [ ] 3.4 Behavioral Cloning (BC) for Robotics - - [ ] 4.1.1 Leveraging Real-World Demonstrations - - [ ] 4.1.2 Reward-Free Training and Betting on Data - -## 4. Single-Task Policy Architectures -- [ ] 4.2 Action Chunking with Transformers (ACT) - - [ ] 4.2.1 Model Architecture and Training Objectives - - [ ] 4.2.2 Code Example: Use ACT in lerobot -- [ ] 4.3 Diffusion-Based Policy Models - - [ ] 4.3.1 Generative Modeling for Action Sequences - - [ ] 4.3.2 Code Example: Use Diffusion Policy in lerobot - -## 5. Multi-task Policies: Vision-Language-Action (VLA) Models in Robotics -- [ ] 5.1 Multi-task Policies: Vision-Language-Action (VLA) Models in Robotics - - [ ] 5.1.1 Overview of Major Architectures: Pi0, SmolVLA - - [ ] 5.1.2 Practical Implementation: Using VLA in lerobot - -## 6. Some Emerging Directions in Robot Learning -- [ ] 6.1 VLAs Post-Training - - [ ] 6.1.1 From Imitation to Refinement - - [ ] 6.1.2 EXPO - -## 7. 
Conclusions -``` - -If time permits (vs current TOC): - -- [ ] 3.3 Model-based RL for Robotics - - [ ] 3.3.1 TD-MPC - - [ ] 3.3.2 Code Example: Use TD-MPC in lerobot -- [ ] 3.5 Popular benchmarks in Robot Learning - -- 4.3 Vector-Quantized Behavior Transformer (VQ-BeT) - - [ ] 4.3.1 Model Architecture and Training Objectives - - [ ] 4.3.2 Code Example: Use VQ-BeT in lerobot - -- [ ] 6.1 Using World Models for Robotics - - [ ] 6.1.1 In the architecture: V-JEPA and V-JEPA2 - - [ ] 6.1.2 In the simulation: GENIE diff --git a/app/scripts/latex-to-mdx/input/_minted/62B8750C0ACEBDA39A95140434E540A8.highlight.minted b/app/scripts/latex-to-mdx/input/_minted/62B8750C0ACEBDA39A95140434E540A8.highlight.minted deleted file mode 100644 index 3a28be3ec2ed0ab0e1783d7462c479ab9c7f9950..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/_minted/62B8750C0ACEBDA39A95140434E540A8.highlight.minted +++ /dev/null @@ -1,52 +0,0 @@ -\begin{MintedVerbatim}[commandchars=\\\{\}] -\PYG{k+kn}{import}\PYG{+w}{ }\PYG{n+nn}{torch} -\PYG{k+kn}{from}\PYG{+w}{ }\PYG{n+nn}{lerobot}\PYG{n+nn}{.}\PYG{n+nn}{datasets}\PYG{n+nn}{.}\PYG{n+nn}{lerobot\PYGZus{}dataset}\PYG{+w}{ }\PYG{k+kn}{import} \PYG{n}{LeRobotDataset} -\PYG{k+kn}{from}\PYG{+w}{ }\PYG{n+nn}{lerobot}\PYG{n+nn}{.}\PYG{n+nn}{datasets}\PYG{n+nn}{.}\PYG{n+nn}{streaming\PYGZus{}dataset}\PYG{+w}{ }\PYG{k+kn}{import} \PYG{n}{StreamingLeRobotDataset} - -\PYG{n}{delta\PYGZus{}timestamps} \PYG{o}{=} \PYG{p}{\PYGZob{}} - \PYG{l+s+s2}{\PYGZdq{}}\PYG{l+s+s2}{observation.images.wrist\PYGZus{}camera}\PYG{l+s+s2}{\PYGZdq{}}\PYG{p}{:} \PYG{p}{[}\PYG{o}{\PYGZhy{}}\PYG{l+m+mf}{0.2}\PYG{p}{,} \PYG{o}{\PYGZhy{}}\PYG{l+m+mf}{0.1}\PYG{p}{,} \PYG{l+m+mf}{0.0}\PYG{p}{]} \PYG{c+c1}{\PYGZsh{} 0.2, and 0.1 seconds *before* each frame} -\PYG{p}{\PYGZcb{}} - -\PYG{c+c1}{\PYGZsh{} Optionally, use StreamingLeRobotDataset to avoid downloading the dataset} -\PYG{n}{dataset} \PYG{o}{=} \PYG{n}{LeRobotDataset}\PYG{p}{(} - \PYG{l+s+s2}{\PYGZdq{}}\PYG{l+s+s2}{lerobot/svla\PYGZus{}so101\PYGZus{}pickplace}\PYG{l+s+s2}{\PYGZdq{}}\PYG{p}{,} - \PYG{n}{delta\PYGZus{}timestamps}\PYG{o}{=}\PYG{n}{delta\PYGZus{}timestamps} -\PYG{p}{)} - -\PYG{c+c1}{\PYGZsh{} Streams frames from the Hugging Face Hub without loading into memory} -\PYG{n}{streaming\PYGZus{}dataset} \PYG{o}{=} \PYG{n}{StreamingLeRobotDataset}\PYG{p}{(} - \PYG{l+s+s2}{\PYGZdq{}}\PYG{l+s+s2}{lerobot/svla\PYGZus{}so101\PYGZus{}pickplace}\PYG{l+s+s2}{\PYGZdq{}}\PYG{p}{,} - \PYG{n}{delta\PYGZus{}timestamps}\PYG{o}{=}\PYG{n}{delta\PYGZus{}timestamps} -\PYG{p}{)} - -\PYG{c+c1}{\PYGZsh{} Get the 100th frame in the dataset by } -\PYG{n}{sample} \PYG{o}{=} \PYG{n}{dataset}\PYG{p}{[}\PYG{l+m+mi}{100}\PYG{p}{]} -\PYG{n+nb}{print}\PYG{p}{(}\PYG{n}{sample}\PYG{p}{)} -\PYG{c+c1}{\PYGZsh{} \PYGZob{}} -\PYG{c+c1}{\PYGZsh{} \PYGZsq{}observation.state\PYGZsq{}: tensor([...]), } -\PYG{c+c1}{\PYGZsh{} \PYGZsq{}action\PYGZsq{}: tensor([...]), } -\PYG{c+c1}{\PYGZsh{} \PYGZsq{}observation.images.wrist\PYGZus{}camera\PYGZsq{}: tensor([3, C, H, W]), for delta timesteps} -\PYG{c+c1}{\PYGZsh{} ...} -\PYG{c+c1}{\PYGZsh{} \PYGZcb{}} - -\PYG{n}{batch\PYGZus{}size}\PYG{o}{=}\PYG{l+m+mi}{16} -\PYG{c+c1}{\PYGZsh{} wrap the dataset in a DataLoader to use process it batches for training purposes} -\PYG{n}{data\PYGZus{}loader} \PYG{o}{=} \PYG{n}{torch}\PYG{o}{.}\PYG{n}{utils}\PYG{o}{.}\PYG{n}{data}\PYG{o}{.}\PYG{n}{DataLoader}\PYG{p}{(} - \PYG{n}{dataset}\PYG{p}{,} - \PYG{n}{batch\PYGZus{}size}\PYG{o}{=}\PYG{n}{batch\PYGZus{}size} -\PYG{p}{)} - 
-\PYG{c+c1}{\PYGZsh{} Iterate over the DataLoader in a training loop} -\PYG{n}{num\PYGZus{}epochs} \PYG{o}{=} \PYG{l+m+mi}{1} -\PYG{n}{device} \PYG{o}{=} \PYG{l+s+s2}{\PYGZdq{}}\PYG{l+s+s2}{cuda}\PYG{l+s+s2}{\PYGZdq{}} \PYG{k}{if} \PYG{n}{torch}\PYG{o}{.}\PYG{n}{cuda}\PYG{o}{.}\PYG{n}{is\PYGZus{}available}\PYG{p}{(}\PYG{p}{)} \PYG{k}{else} \PYG{l+s+s2}{\PYGZdq{}}\PYG{l+s+s2}{cpu}\PYG{l+s+s2}{\PYGZdq{}} - -\PYG{k}{for} \PYG{n}{epoch} \PYG{o+ow}{in} \PYG{n+nb}{range}\PYG{p}{(}\PYG{n}{num\PYGZus{}epochs}\PYG{p}{)}\PYG{p}{:} - \PYG{k}{for} \PYG{n}{batch} \PYG{o+ow}{in} \PYG{n}{data\PYGZus{}loader}\PYG{p}{:} - \PYG{c+c1}{\PYGZsh{} Move data to the appropriate device (e.g., GPU)} - \PYG{n}{observations} \PYG{o}{=} \PYG{n}{batch}\PYG{p}{[}\PYG{l+s+s2}{\PYGZdq{}}\PYG{l+s+s2}{observation.state}\PYG{l+s+s2}{\PYGZdq{}}\PYG{p}{]}\PYG{o}{.}\PYG{n}{to}\PYG{p}{(}\PYG{n}{device}\PYG{p}{)} - \PYG{n}{actions} \PYG{o}{=} \PYG{n}{batch}\PYG{p}{[}\PYG{l+s+s2}{\PYGZdq{}}\PYG{l+s+s2}{action}\PYG{l+s+s2}{\PYGZdq{}}\PYG{p}{]}\PYG{o}{.}\PYG{n}{to}\PYG{p}{(}\PYG{n}{device}\PYG{p}{)} - \PYG{n}{images} \PYG{o}{=} \PYG{n}{batch}\PYG{p}{[}\PYG{l+s+s2}{\PYGZdq{}}\PYG{l+s+s2}{observation.images.wrist\PYGZus{}camera}\PYG{l+s+s2}{\PYGZdq{}}\PYG{p}{]}\PYG{o}{.}\PYG{n}{to}\PYG{p}{(}\PYG{n}{device}\PYG{p}{)} - - \PYG{c+c1}{\PYGZsh{} Next, you can do amazing\PYGZus{}model.forward(batch)} - \PYG{o}{.}\PYG{o}{.}\PYG{o}{.} -\end{MintedVerbatim} diff --git a/app/scripts/latex-to-mdx/input/_minted/_FAD58DE7366495DB4650CFEFAC2FCD61.index.minted b/app/scripts/latex-to-mdx/input/_minted/_FAD58DE7366495DB4650CFEFAC2FCD61.index.minted deleted file mode 100644 index e253d0e92db1eaec96e192d396d3140316074ce2..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/_minted/_FAD58DE7366495DB4650CFEFAC2FCD61.index.minted +++ /dev/null @@ -1,10 +0,0 @@ -{ - "jobname": "main", - "md5": "FAD58DE7366495DB4650CFEFAC2FCD61", - "timestamp": "20250911180655", - "cachefiles": [ - "62B8750C0ACEBDA39A95140434E540A8.highlight.minted", - "_FAD58DE7366495DB4650CFEFAC2FCD61.index.minted", - "colorful.style.minted" - ] -} \ No newline at end of file diff --git a/app/scripts/latex-to-mdx/input/_minted/colorful.style.minted b/app/scripts/latex-to-mdx/input/_minted/colorful.style.minted deleted file mode 100644 index 4afa6efb439608d812561686f7ec40f8010c0a39..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/_minted/colorful.style.minted +++ /dev/null @@ -1,100 +0,0 @@ -\makeatletter -\def\PYG@reset{\let\PYG@it=\relax \let\PYG@bf=\relax% - \let\PYG@ul=\relax \let\PYG@tc=\relax% - \let\PYG@bc=\relax \let\PYG@ff=\relax} -\def\PYG@tok#1{\csname PYG@tok@#1\endcsname} -\def\PYG@toks#1+{\ifx\relax#1\empty\else% - \PYG@tok{#1}\expandafter\PYG@toks\fi} -\def\PYG@do#1{\PYG@bc{\PYG@tc{\PYG@ul{% - \PYG@it{\PYG@bf{\PYG@ff{#1}}}}}}} -\def\PYG#1#2{\PYG@reset\PYG@toks#1+\relax+\PYG@do{#2}} - -\@namedef{PYG@tok@w}{\def\PYG@tc##1{\textcolor[rgb]{0.73,0.73,0.73}{##1}}} -\@namedef{PYG@tok@c}{\def\PYG@tc##1{\textcolor[rgb]{0.53,0.53,0.53}{##1}}} -\@namedef{PYG@tok@cp}{\def\PYG@tc##1{\textcolor[rgb]{0.33,0.47,0.60}{##1}}} -\@namedef{PYG@tok@cs}{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.80,0.00,0.00}{##1}}} -\@namedef{PYG@tok@k}{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.00,0.53,0.00}{##1}}} -\@namedef{PYG@tok@kp}{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.00,0.20,0.53}{##1}}} -\@namedef{PYG@tok@kt}{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.20,0.20,0.60}{##1}}} 
-\@namedef{PYG@tok@o}{\def\PYG@tc##1{\textcolor[rgb]{0.20,0.20,0.20}{##1}}} -\@namedef{PYG@tok@ow}{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.00,0.00,0.00}{##1}}} -\@namedef{PYG@tok@nb}{\def\PYG@tc##1{\textcolor[rgb]{0.00,0.44,0.13}{##1}}} -\@namedef{PYG@tok@nf}{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.00,0.40,0.73}{##1}}} -\@namedef{PYG@tok@nc}{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.73,0.00,0.40}{##1}}} -\@namedef{PYG@tok@nn}{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.05,0.52,0.71}{##1}}} -\@namedef{PYG@tok@ne}{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{1.00,0.00,0.00}{##1}}} -\@namedef{PYG@tok@nv}{\def\PYG@tc##1{\textcolor[rgb]{0.60,0.40,0.20}{##1}}} -\@namedef{PYG@tok@vi}{\def\PYG@tc##1{\textcolor[rgb]{0.20,0.20,0.73}{##1}}} -\@namedef{PYG@tok@vc}{\def\PYG@tc##1{\textcolor[rgb]{0.20,0.40,0.60}{##1}}} -\@namedef{PYG@tok@vg}{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.87,0.47,0.00}{##1}}} -\@namedef{PYG@tok@no}{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.00,0.20,0.40}{##1}}} -\@namedef{PYG@tok@nl}{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.60,0.47,0.00}{##1}}} -\@namedef{PYG@tok@ni}{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.53,0.00,0.00}{##1}}} -\@namedef{PYG@tok@na}{\def\PYG@tc##1{\textcolor[rgb]{0.00,0.00,0.80}{##1}}} -\@namedef{PYG@tok@nt}{\def\PYG@tc##1{\textcolor[rgb]{0.00,0.47,0.00}{##1}}} -\@namedef{PYG@tok@nd}{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.33,0.33,0.33}{##1}}} -\@namedef{PYG@tok@s}{\def\PYG@bc##1{{\setlength{\fboxsep}{0pt}\colorbox[rgb]{1.00,0.94,0.94}{\strut ##1}}}} -\@namedef{PYG@tok@sc}{\def\PYG@tc##1{\textcolor[rgb]{0.00,0.27,0.87}{##1}}} -\@namedef{PYG@tok@sd}{\def\PYG@tc##1{\textcolor[rgb]{0.87,0.27,0.13}{##1}}} -\@namedef{PYG@tok@si}{\def\PYG@bc##1{{\setlength{\fboxsep}{0pt}\colorbox[rgb]{0.93,0.93,0.93}{\strut ##1}}}} -\@namedef{PYG@tok@se}{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.40,0.40,0.40}{##1}}\def\PYG@bc##1{{\setlength{\fboxsep}{0pt}\colorbox[rgb]{1.00,0.94,0.94}{\strut ##1}}}} -\@namedef{PYG@tok@sr}{\def\PYG@tc##1{\textcolor[rgb]{0.00,0.00,0.00}{##1}}\def\PYG@bc##1{{\setlength{\fboxsep}{0pt}\colorbox[rgb]{1.00,0.94,1.00}{\strut ##1}}}} -\@namedef{PYG@tok@ss}{\def\PYG@tc##1{\textcolor[rgb]{0.67,0.40,0.00}{##1}}} -\@namedef{PYG@tok@sx}{\def\PYG@tc##1{\textcolor[rgb]{0.87,0.13,0.00}{##1}}\def\PYG@bc##1{{\setlength{\fboxsep}{0pt}\colorbox[rgb]{1.00,0.94,0.94}{\strut ##1}}}} -\@namedef{PYG@tok@m}{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.40,0.00,0.93}{##1}}} -\@namedef{PYG@tok@mi}{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.00,0.00,0.87}{##1}}} -\@namedef{PYG@tok@mf}{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.40,0.00,0.93}{##1}}} -\@namedef{PYG@tok@mh}{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.00,0.33,0.53}{##1}}} -\@namedef{PYG@tok@mo}{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.27,0.00,0.93}{##1}}} -\@namedef{PYG@tok@gh}{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.00,0.00,0.50}{##1}}} -\@namedef{PYG@tok@gu}{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.50,0.00,0.50}{##1}}} -\@namedef{PYG@tok@gd}{\def\PYG@tc##1{\textcolor[rgb]{0.63,0.00,0.00}{##1}}} -\@namedef{PYG@tok@gi}{\def\PYG@tc##1{\textcolor[rgb]{0.00,0.63,0.00}{##1}}} -\@namedef{PYG@tok@gr}{\def\PYG@tc##1{\textcolor[rgb]{1.00,0.00,0.00}{##1}}} -\@namedef{PYG@tok@ge}{\let\PYG@it=\textit} -\@namedef{PYG@tok@gs}{\let\PYG@bf=\textbf} -\@namedef{PYG@tok@ges}{\let\PYG@bf=\textbf\let\PYG@it=\textit} 
-\@namedef{PYG@tok@gp}{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.78,0.36,0.04}{##1}}} -\@namedef{PYG@tok@go}{\def\PYG@tc##1{\textcolor[rgb]{0.53,0.53,0.53}{##1}}} -\@namedef{PYG@tok@gt}{\def\PYG@tc##1{\textcolor[rgb]{0.00,0.27,0.87}{##1}}} -\@namedef{PYG@tok@err}{\def\PYG@tc##1{\textcolor[rgb]{1.00,0.00,0.00}{##1}}\def\PYG@bc##1{{\setlength{\fboxsep}{0pt}\colorbox[rgb]{1.00,0.67,0.67}{\strut ##1}}}} -\@namedef{PYG@tok@kc}{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.00,0.53,0.00}{##1}}} -\@namedef{PYG@tok@kd}{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.00,0.53,0.00}{##1}}} -\@namedef{PYG@tok@kn}{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.00,0.53,0.00}{##1}}} -\@namedef{PYG@tok@kr}{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.00,0.53,0.00}{##1}}} -\@namedef{PYG@tok@bp}{\def\PYG@tc##1{\textcolor[rgb]{0.00,0.44,0.13}{##1}}} -\@namedef{PYG@tok@fm}{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.00,0.40,0.73}{##1}}} -\@namedef{PYG@tok@vm}{\def\PYG@tc##1{\textcolor[rgb]{0.60,0.40,0.20}{##1}}} -\@namedef{PYG@tok@sa}{\def\PYG@bc##1{{\setlength{\fboxsep}{0pt}\colorbox[rgb]{1.00,0.94,0.94}{\strut ##1}}}} -\@namedef{PYG@tok@sb}{\def\PYG@bc##1{{\setlength{\fboxsep}{0pt}\colorbox[rgb]{1.00,0.94,0.94}{\strut ##1}}}} -\@namedef{PYG@tok@dl}{\def\PYG@bc##1{{\setlength{\fboxsep}{0pt}\colorbox[rgb]{1.00,0.94,0.94}{\strut ##1}}}} -\@namedef{PYG@tok@s2}{\def\PYG@bc##1{{\setlength{\fboxsep}{0pt}\colorbox[rgb]{1.00,0.94,0.94}{\strut ##1}}}} -\@namedef{PYG@tok@sh}{\def\PYG@bc##1{{\setlength{\fboxsep}{0pt}\colorbox[rgb]{1.00,0.94,0.94}{\strut ##1}}}} -\@namedef{PYG@tok@s1}{\def\PYG@bc##1{{\setlength{\fboxsep}{0pt}\colorbox[rgb]{1.00,0.94,0.94}{\strut ##1}}}} -\@namedef{PYG@tok@mb}{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.40,0.00,0.93}{##1}}} -\@namedef{PYG@tok@il}{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.00,0.00,0.87}{##1}}} -\@namedef{PYG@tok@ch}{\def\PYG@tc##1{\textcolor[rgb]{0.53,0.53,0.53}{##1}}} -\@namedef{PYG@tok@cm}{\def\PYG@tc##1{\textcolor[rgb]{0.53,0.53,0.53}{##1}}} -\@namedef{PYG@tok@cpf}{\def\PYG@tc##1{\textcolor[rgb]{0.53,0.53,0.53}{##1}}} -\@namedef{PYG@tok@c1}{\def\PYG@tc##1{\textcolor[rgb]{0.53,0.53,0.53}{##1}}} - -\def\PYGZbs{\char`\\} -\def\PYGZus{\char`\_} -\def\PYGZob{\char`\{} -\def\PYGZcb{\char`\}} -\def\PYGZca{\char`\^} -\def\PYGZam{\char`\&} -\def\PYGZlt{\char`\<} -\def\PYGZgt{\char`\>} -\def\PYGZsh{\char`\#} -\def\PYGZpc{\char`\%} -\def\PYGZdl{\char`\$} -\def\PYGZhy{\char`\-} -\def\PYGZsq{\char`\'} -\def\PYGZdq{\char`\"} -\def\PYGZti{\char`\~} -% for compatibility with earlier versions -\def\PYGZat{@} -\def\PYGZlb{[} -\def\PYGZrb{]} -\makeatother diff --git a/app/scripts/latex-to-mdx/input/fancyhdr.sty b/app/scripts/latex-to-mdx/input/fancyhdr.sty deleted file mode 100644 index 77ed4e3012d822c7cca5c17efcae308b32b8cc2b..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/fancyhdr.sty +++ /dev/null @@ -1,485 +0,0 @@ -% fancyhdr.sty version 3.2 -% Fancy headers and footers for LaTeX. -% Piet van Oostrum, -% Dept of Computer and Information Sciences, University of Utrecht, -% Padualaan 14, P.O. Box 80.089, 3508 TB Utrecht, The Netherlands -% Telephone: +31 30 2532180. Email: piet@cs.uu.nl -% ======================================================================== -% LICENCE: -% This file may be distributed under the terms of the LaTeX Project Public -% License, as described in lppl.txt in the base LaTeX distribution. -% Either version 1 or, at your option, any later version. 
-% ======================================================================== -% MODIFICATION HISTORY: -% Sep 16, 1994 -% version 1.4: Correction for use with \reversemargin -% Sep 29, 1994: -% version 1.5: Added the \iftopfloat, \ifbotfloat and \iffloatpage commands -% Oct 4, 1994: -% version 1.6: Reset single spacing in headers/footers for use with -% setspace.sty or doublespace.sty -% Oct 4, 1994: -% version 1.7: changed \let\@mkboth\markboth to -% \def\@mkboth{\protect\markboth} to make it more robust -% Dec 5, 1994: -% version 1.8: corrections for amsbook/amsart: define \@chapapp and (more -% importantly) use the \chapter/sectionmark definitions from ps@headings if -% they exist (which should be true for all standard classes). -% May 31, 1995: -% version 1.9: The proposed \renewcommand{\headrulewidth}{\iffloatpage... -% construction in the doc did not work properly with the fancyplain style. -% June 1, 1995: -% version 1.91: The definition of \@mkboth wasn't restored on subsequent -% \pagestyle{fancy}'s. -% June 1, 1995: -% version 1.92: The sequence \pagestyle{fancyplain} \pagestyle{plain} -% \pagestyle{fancy} would erroneously select the plain version. -% June 1, 1995: -% version 1.93: \fancypagestyle command added. -% Dec 11, 1995: -% version 1.94: suggested by Conrad Hughes -% CJCH, Dec 11, 1995: added \footruleskip to allow control over footrule -% position (old hardcoded value of .3\normalbaselineskip is far too high -% when used with very small footer fonts). -% Jan 31, 1996: -% version 1.95: call \@normalsize in the reset code if that is defined, -% otherwise \normalsize. -% this is to solve a problem with ucthesis.cls, as this doesn't -% define \@currsize. Unfortunately for latex209 calling \normalsize doesn't -% work as this is optimized to do very little, so there \@normalsize should -% be called. Hopefully this code works for all versions of LaTeX known to -% mankind. -% April 25, 1996: -% version 1.96: initialize \headwidth to a magic (negative) value to catch -% most common cases that people change it before calling \pagestyle{fancy}. -% Note it can't be initialized when reading in this file, because -% \textwidth could be changed afterwards. This is quite probable. -% We also switch to \MakeUppercase rather than \uppercase and introduce a -% \nouppercase command for use in headers. and footers. -% May 3, 1996: -% version 1.97: Two changes: -% 1. Undo the change in version 1.8 (using the pagestyle{headings} defaults -% for the chapter and section marks. The current version of amsbook and -% amsart classes don't seem to need them anymore. Moreover the standard -% latex classes don't use \markboth if twoside isn't selected, and this is -% confusing as \leftmark doesn't work as expected. -% 2. include a call to \ps@empty in ps@@fancy. This is to solve a problem -% in the amsbook and amsart classes, that make global changes to \topskip, -% which are reset in \ps@empty. Hopefully this doesn't break other things. -% May 7, 1996: -% version 1.98: -% Added % after the line \def\nouppercase -% May 7, 1996: -% version 1.99: This is the alpha version of fancyhdr 2.0 -% Introduced the new commands \fancyhead, \fancyfoot, and \fancyhf. -% Changed \headrulewidth, \footrulewidth, \footruleskip to -% macros rather than length parameters, In this way they can be -% conditionalized and they don't consume length registers. There is no need -% to have them as length registers unless you want to do calculations with -% them, which is unlikely. 
Note that this may make some uses of them -% incompatible (i.e. if you have a file that uses \setlength or \xxxx=) -% May 10, 1996: -% version 1.99a: -% Added a few more % signs -% May 10, 1996: -% version 1.99b: -% Changed the syntax of \f@nfor to be resistent to catcode changes of := -% Removed the [1] from the defs of \lhead etc. because the parameter is -% consumed by the \@[xy]lhead etc. macros. -% June 24, 1997: -% version 1.99c: -% corrected \nouppercase to also include the protected form of \MakeUppercase -% \global added to manipulation of \headwidth. -% \iffootnote command added. -% Some comments added about \@fancyhead and \@fancyfoot. -% Aug 24, 1998 -% version 1.99d -% Changed the default \ps@empty to \ps@@empty in order to allow -% \fancypagestyle{empty} redefinition. -% Oct 11, 2000 -% version 2.0 -% Added LPPL license clause. -% -% A check for \headheight is added. An errormessage is given (once) if the -% header is too large. Empty headers don't generate the error even if -% \headheight is very small or even 0pt. -% Warning added for the use of 'E' option when twoside option is not used. -% In this case the 'E' fields will never be used. -% -% Mar 10, 2002 -% version 2.1beta -% New command: \fancyhfoffset[place]{length} -% defines offsets to be applied to the header/footer to let it stick into -% the margins (if length > 0). -% place is like in fancyhead, except that only E,O,L,R can be used. -% This replaces the old calculation based on \headwidth and the marginpar -% area. -% \headwidth will be dynamically calculated in the headers/footers when -% this is used. -% -% Mar 26, 2002 -% version 2.1beta2 -% \fancyhfoffset now also takes h,f as possible letters in the argument to -% allow the header and footer widths to be different. -% New commands \fancyheadoffset and \fancyfootoffset added comparable to -% \fancyhead and \fancyfoot. -% Errormessages and warnings have been made more informative. -% -% Dec 9, 2002 -% version 2.1 -% The defaults for \footrulewidth, \plainheadrulewidth and -% \plainfootrulewidth are changed from \z@skip to 0pt. In this way when -% someone inadvertantly uses \setlength to change any of these, the value -% of \z@skip will not be changed, rather an errormessage will be given. - -% March 3, 2004 -% Release of version 3.0 - -% Oct 7, 2004 -% version 3.1 -% Added '\endlinechar=13' to \fancy@reset to prevent problems with -% includegraphics in header when verbatiminput is active. - -% March 22, 2005 -% version 3.2 -% reset \everypar (the real one) in \fancy@reset because spanish.ldf does -% strange things with \everypar between << and >>. - -\def\ifancy@mpty#1{\def\temp@a{#1}\ifx\temp@a\@empty} - -\def\fancy@def#1#2{\ifancy@mpty{#2}\fancy@gbl\def#1{\leavevmode}\else - \fancy@gbl\def#1{#2\strut}\fi} - -\let\fancy@gbl\global - -\def\@fancyerrmsg#1{% - \ifx\PackageError\undefined - \errmessage{#1}\else - \PackageError{Fancyhdr}{#1}{}\fi} -\def\@fancywarning#1{% - \ifx\PackageWarning\undefined - \errmessage{#1}\else - \PackageWarning{Fancyhdr}{#1}{}\fi} - -% Usage: \@forc \var{charstring}{command to be executed for each char} -% This is similar to LaTeX's \@tfor, but expands the charstring. 
- -\def\@forc#1#2#3{\expandafter\f@rc\expandafter#1\expandafter{#2}{#3}} -\def\f@rc#1#2#3{\def\temp@ty{#2}\ifx\@empty\temp@ty\else - \f@@rc#1#2\f@@rc{#3}\fi} -\def\f@@rc#1#2#3\f@@rc#4{\def#1{#2}#4\f@rc#1{#3}{#4}} - -% Usage: \f@nfor\name:=list\do{body} -% Like LaTeX's \@for but an empty list is treated as a list with an empty -% element - -\newcommand{\f@nfor}[3]{\edef\@fortmp{#2}% - \expandafter\@forloop#2,\@nil,\@nil\@@#1{#3}} - -% Usage: \def@ult \cs{defaults}{argument} -% sets \cs to the characters from defaults appearing in argument -% or defaults if it would be empty. All characters are lowercased. - -\newcommand\def@ult[3]{% - \edef\temp@a{\lowercase{\edef\noexpand\temp@a{#3}}}\temp@a - \def#1{}% - \@forc\tmpf@ra{#2}% - {\expandafter\if@in\tmpf@ra\temp@a{\edef#1{#1\tmpf@ra}}{}}% - \ifx\@empty#1\def#1{#2}\fi} -% -% \if@in -% -\newcommand{\if@in}[4]{% - \edef\temp@a{#2}\def\temp@b##1#1##2\temp@b{\def\temp@b{##1}}% - \expandafter\temp@b#2#1\temp@b\ifx\temp@a\temp@b #4\else #3\fi} - -\newcommand{\fancyhead}{\@ifnextchar[{\f@ncyhf\fancyhead h}% - {\f@ncyhf\fancyhead h[]}} -\newcommand{\fancyfoot}{\@ifnextchar[{\f@ncyhf\fancyfoot f}% - {\f@ncyhf\fancyfoot f[]}} -\newcommand{\fancyhf}{\@ifnextchar[{\f@ncyhf\fancyhf{}}% - {\f@ncyhf\fancyhf{}[]}} - -% New commands for offsets added - -\newcommand{\fancyheadoffset}{\@ifnextchar[{\f@ncyhfoffs\fancyheadoffset h}% - {\f@ncyhfoffs\fancyheadoffset h[]}} -\newcommand{\fancyfootoffset}{\@ifnextchar[{\f@ncyhfoffs\fancyfootoffset f}% - {\f@ncyhfoffs\fancyfootoffset f[]}} -\newcommand{\fancyhfoffset}{\@ifnextchar[{\f@ncyhfoffs\fancyhfoffset{}}% - {\f@ncyhfoffs\fancyhfoffset{}[]}} - -% The header and footer fields are stored in command sequences with -% names of the form: \f@ncy with for [eo], from [lcr] -% and from [hf]. - -\def\f@ncyhf#1#2[#3]#4{% - \def\temp@c{}% - \@forc\tmpf@ra{#3}% - {\expandafter\if@in\tmpf@ra{eolcrhf,EOLCRHF}% - {}{\edef\temp@c{\temp@c\tmpf@ra}}}% - \ifx\@empty\temp@c\else - \@fancyerrmsg{Illegal char `\temp@c' in \string#1 argument: - [#3]}% - \fi - \f@nfor\temp@c{#3}% - {\def@ult\f@@@eo{eo}\temp@c - \if@twoside\else - \if\f@@@eo e\@fancywarning - {\string#1's `E' option without twoside option is useless}\fi\fi - \def@ult\f@@@lcr{lcr}\temp@c - \def@ult\f@@@hf{hf}{#2\temp@c}% - \@forc\f@@eo\f@@@eo - {\@forc\f@@lcr\f@@@lcr - {\@forc\f@@hf\f@@@hf - {\expandafter\fancy@def\csname - f@ncy\f@@eo\f@@lcr\f@@hf\endcsname - {#4}}}}}} - -\def\f@ncyhfoffs#1#2[#3]#4{% - \def\temp@c{}% - \@forc\tmpf@ra{#3}% - {\expandafter\if@in\tmpf@ra{eolrhf,EOLRHF}% - {}{\edef\temp@c{\temp@c\tmpf@ra}}}% - \ifx\@empty\temp@c\else - \@fancyerrmsg{Illegal char `\temp@c' in \string#1 argument: - [#3]}% - \fi - \f@nfor\temp@c{#3}% - {\def@ult\f@@@eo{eo}\temp@c - \if@twoside\else - \if\f@@@eo e\@fancywarning - {\string#1's `E' option without twoside option is useless}\fi\fi - \def@ult\f@@@lcr{lr}\temp@c - \def@ult\f@@@hf{hf}{#2\temp@c}% - \@forc\f@@eo\f@@@eo - {\@forc\f@@lcr\f@@@lcr - {\@forc\f@@hf\f@@@hf - {\expandafter\setlength\csname - f@ncyO@\f@@eo\f@@lcr\f@@hf\endcsname - {#4}}}}}% - \fancy@setoffs} - -% Fancyheadings version 1 commands. These are more or less deprecated, -% but they continue to work. 
- -\newcommand{\lhead}{\@ifnextchar[{\@xlhead}{\@ylhead}} -\def\@xlhead[#1]#2{\fancy@def\f@ncyelh{#1}\fancy@def\f@ncyolh{#2}} -\def\@ylhead#1{\fancy@def\f@ncyelh{#1}\fancy@def\f@ncyolh{#1}} - -\newcommand{\chead}{\@ifnextchar[{\@xchead}{\@ychead}} -\def\@xchead[#1]#2{\fancy@def\f@ncyech{#1}\fancy@def\f@ncyoch{#2}} -\def\@ychead#1{\fancy@def\f@ncyech{#1}\fancy@def\f@ncyoch{#1}} - -\newcommand{\rhead}{\@ifnextchar[{\@xrhead}{\@yrhead}} -\def\@xrhead[#1]#2{\fancy@def\f@ncyerh{#1}\fancy@def\f@ncyorh{#2}} -\def\@yrhead#1{\fancy@def\f@ncyerh{#1}\fancy@def\f@ncyorh{#1}} - -\newcommand{\lfoot}{\@ifnextchar[{\@xlfoot}{\@ylfoot}} -\def\@xlfoot[#1]#2{\fancy@def\f@ncyelf{#1}\fancy@def\f@ncyolf{#2}} -\def\@ylfoot#1{\fancy@def\f@ncyelf{#1}\fancy@def\f@ncyolf{#1}} - -\newcommand{\cfoot}{\@ifnextchar[{\@xcfoot}{\@ycfoot}} -\def\@xcfoot[#1]#2{\fancy@def\f@ncyecf{#1}\fancy@def\f@ncyocf{#2}} -\def\@ycfoot#1{\fancy@def\f@ncyecf{#1}\fancy@def\f@ncyocf{#1}} - -\newcommand{\rfoot}{\@ifnextchar[{\@xrfoot}{\@yrfoot}} -\def\@xrfoot[#1]#2{\fancy@def\f@ncyerf{#1}\fancy@def\f@ncyorf{#2}} -\def\@yrfoot#1{\fancy@def\f@ncyerf{#1}\fancy@def\f@ncyorf{#1}} - -\newlength{\fancy@headwidth} -\let\headwidth\fancy@headwidth -\newlength{\f@ncyO@elh} -\newlength{\f@ncyO@erh} -\newlength{\f@ncyO@olh} -\newlength{\f@ncyO@orh} -\newlength{\f@ncyO@elf} -\newlength{\f@ncyO@erf} -\newlength{\f@ncyO@olf} -\newlength{\f@ncyO@orf} -\newcommand{\headrulewidth}{0.4pt} -\newcommand{\footrulewidth}{0pt} -\newcommand{\footruleskip}{.3\normalbaselineskip} - -% Fancyplain stuff shouldn't be used anymore (rather -% \fancypagestyle{plain} should be used), but it must be present for -% compatibility reasons. - -\newcommand{\plainheadrulewidth}{0pt} -\newcommand{\plainfootrulewidth}{0pt} -\newif\if@fancyplain \@fancyplainfalse -\def\fancyplain#1#2{\if@fancyplain#1\else#2\fi} - -\headwidth=-123456789sp %magic constant - -% Command to reset various things in the headers: -% a.o. single spacing (taken from setspace.sty) -% and the catcode of ^^M (so that epsf files in the header work if a -% verbatim crosses a page boundary) -% It also defines a \nouppercase command that disables \uppercase and -% \Makeuppercase. It can only be used in the headers and footers. -\let\fnch@everypar\everypar% save real \everypar because of spanish.ldf -\def\fancy@reset{\fnch@everypar{}\restorecr\endlinechar=13 - \def\baselinestretch{1}% - \def\nouppercase##1{{\let\uppercase\relax\let\MakeUppercase\relax - \expandafter\let\csname MakeUppercase \endcsname\relax##1}}% - \ifx\undefined\@newbaseline% NFSS not present; 2.09 or 2e - \ifx\@normalsize\undefined \normalsize % for ucthesis.cls - \else \@normalsize \fi - \else% NFSS (2.09) present - \@newbaseline% - \fi} - -% Initialization of the head and foot text. - -% The default values still contain \fancyplain for compatibility. -\fancyhf{} % clear all -% lefthead empty on ``plain'' pages, \rightmark on even, \leftmark on odd pages -% evenhead empty on ``plain'' pages, \leftmark on even, \rightmark on odd pages -\if@twoside - \fancyhead[el,or]{\fancyplain{}{\sl\rightmark}} - \fancyhead[er,ol]{\fancyplain{}{\sl\leftmark}} -\else - \fancyhead[l]{\fancyplain{}{\sl\rightmark}} - \fancyhead[r]{\fancyplain{}{\sl\leftmark}} -\fi -\fancyfoot[c]{\rm\thepage} % page number - -% Use box 0 as a temp box and dimen 0 as temp dimen. -% This can be done, because this code will always -% be used inside another box, and therefore the changes are local. 
- -\def\@fancyvbox#1#2{\setbox0\vbox{#2}\ifdim\ht0>#1\@fancywarning - {\string#1 is too small (\the#1): ^^J Make it at least \the\ht0.^^J - We now make it that large for the rest of the document.^^J - This may cause the page layout to be inconsistent, however\@gobble}% - \dimen0=#1\global\setlength{#1}{\ht0}\ht0=\dimen0\fi - \box0} - -% Put together a header or footer given the left, center and -% right text, fillers at left and right and a rule. -% The \lap commands put the text into an hbox of zero size, -% so overlapping text does not generate an errormessage. -% These macros have 5 parameters: -% 1. LEFTSIDE BEARING % This determines at which side the header will stick -% out. When \fancyhfoffset is used this calculates \headwidth, otherwise -% it is \hss or \relax (after expansion). -% 2. \f@ncyolh, \f@ncyelh, \f@ncyolf or \f@ncyelf. This is the left component. -% 3. \f@ncyoch, \f@ncyech, \f@ncyocf or \f@ncyecf. This is the middle comp. -% 4. \f@ncyorh, \f@ncyerh, \f@ncyorf or \f@ncyerf. This is the right component. -% 5. RIGHTSIDE BEARING. This is always \relax or \hss (after expansion). - -\def\@fancyhead#1#2#3#4#5{#1\hbox to\headwidth{\fancy@reset - \@fancyvbox\headheight{\hbox - {\rlap{\parbox[b]{\headwidth}{\raggedright#2}}\hfill - \parbox[b]{\headwidth}{\centering#3}\hfill - \llap{\parbox[b]{\headwidth}{\raggedleft#4}}}\headrule}}#5} - -\def\@fancyfoot#1#2#3#4#5{#1\hbox to\headwidth{\fancy@reset - \@fancyvbox\footskip{\footrule - \hbox{\rlap{\parbox[t]{\headwidth}{\raggedright#2}}\hfill - \parbox[t]{\headwidth}{\centering#3}\hfill - \llap{\parbox[t]{\headwidth}{\raggedleft#4}}}}}#5} - -\def\headrule{{\if@fancyplain\let\headrulewidth\plainheadrulewidth\fi - \hrule\@height\headrulewidth\@width\headwidth \vskip-\headrulewidth}} - -\def\footrule{{\if@fancyplain\let\footrulewidth\plainfootrulewidth\fi - \vskip-\footruleskip\vskip-\footrulewidth - \hrule\@width\headwidth\@height\footrulewidth\vskip\footruleskip}} - -\def\ps@fancy{% -\@ifundefined{@chapapp}{\let\@chapapp\chaptername}{}%for amsbook -% -% Define \MakeUppercase for old LaTeXen. -% Note: we used \def rather than \let, so that \let\uppercase\relax (from -% the version 1 documentation) will still work. -% -\@ifundefined{MakeUppercase}{\def\MakeUppercase{\uppercase}}{}% -\@ifundefined{chapter}{\def\sectionmark##1{\markboth -{\MakeUppercase{\ifnum \c@secnumdepth>\z@ - \thesection\hskip 1em\relax \fi ##1}}{}}% -\def\subsectionmark##1{\markright {\ifnum \c@secnumdepth >\@ne - \thesubsection\hskip 1em\relax \fi ##1}}}% -{\def\chaptermark##1{\markboth {\MakeUppercase{\ifnum \c@secnumdepth>\m@ne - \@chapapp\ \thechapter. \ \fi ##1}}{}}% -\def\sectionmark##1{\markright{\MakeUppercase{\ifnum \c@secnumdepth >\z@ - \thesection. \ \fi ##1}}}}% -%\csname ps@headings\endcsname % use \ps@headings defaults if they exist -\ps@@fancy -\gdef\ps@fancy{\@fancyplainfalse\ps@@fancy}% -% Initialize \headwidth if the user didn't -% -\ifdim\headwidth<0sp -% -% This catches the case that \headwidth hasn't been initialized and the -% case that the user added something to \headwidth in the expectation that -% it was initialized to \textwidth. We compensate this now. This loses if -% the user intended to multiply it by a factor. But that case is more -% likely done by saying something like \headwidth=1.2\textwidth. -% The doc says you have to change \headwidth after the first call to -% \pagestyle{fancy}. This code is just to catch the most common cases were -% that requirement is violated. 
-% - \global\advance\headwidth123456789sp\global\advance\headwidth\textwidth -\fi} -\def\ps@fancyplain{\ps@fancy \let\ps@plain\ps@plain@fancy} -\def\ps@plain@fancy{\@fancyplaintrue\ps@@fancy} -\let\ps@@empty\ps@empty -\def\ps@@fancy{% -\ps@@empty % This is for amsbook/amsart, which do strange things with \topskip -\def\@mkboth{\protect\markboth}% -\def\@oddhead{\@fancyhead\fancy@Oolh\f@ncyolh\f@ncyoch\f@ncyorh\fancy@Oorh}% -\def\@oddfoot{\@fancyfoot\fancy@Oolf\f@ncyolf\f@ncyocf\f@ncyorf\fancy@Oorf}% -\def\@evenhead{\@fancyhead\fancy@Oelh\f@ncyelh\f@ncyech\f@ncyerh\fancy@Oerh}% -\def\@evenfoot{\@fancyfoot\fancy@Oelf\f@ncyelf\f@ncyecf\f@ncyerf\fancy@Oerf}% -} -% Default definitions for compatibility mode: -% These cause the header/footer to take the defined \headwidth as width -% And to shift in the direction of the marginpar area - -\def\fancy@Oolh{\if@reversemargin\hss\else\relax\fi} -\def\fancy@Oorh{\if@reversemargin\relax\else\hss\fi} -\let\fancy@Oelh\fancy@Oorh -\let\fancy@Oerh\fancy@Oolh - -\let\fancy@Oolf\fancy@Oolh -\let\fancy@Oorf\fancy@Oorh -\let\fancy@Oelf\fancy@Oelh -\let\fancy@Oerf\fancy@Oerh - -% New definitions for the use of \fancyhfoffset -% These calculate the \headwidth from \textwidth and the specified offsets. - -\def\fancy@offsolh{\headwidth=\textwidth\advance\headwidth\f@ncyO@olh - \advance\headwidth\f@ncyO@orh\hskip-\f@ncyO@olh} -\def\fancy@offselh{\headwidth=\textwidth\advance\headwidth\f@ncyO@elh - \advance\headwidth\f@ncyO@erh\hskip-\f@ncyO@elh} - -\def\fancy@offsolf{\headwidth=\textwidth\advance\headwidth\f@ncyO@olf - \advance\headwidth\f@ncyO@orf\hskip-\f@ncyO@olf} -\def\fancy@offself{\headwidth=\textwidth\advance\headwidth\f@ncyO@elf - \advance\headwidth\f@ncyO@erf\hskip-\f@ncyO@elf} - -\def\fancy@setoffs{% -% Just in case \let\headwidth\textwidth was used - \fancy@gbl\let\headwidth\fancy@headwidth - \fancy@gbl\let\fancy@Oolh\fancy@offsolh - \fancy@gbl\let\fancy@Oelh\fancy@offselh - \fancy@gbl\let\fancy@Oorh\hss - \fancy@gbl\let\fancy@Oerh\hss - \fancy@gbl\let\fancy@Oolf\fancy@offsolf - \fancy@gbl\let\fancy@Oelf\fancy@offself - \fancy@gbl\let\fancy@Oorf\hss - \fancy@gbl\let\fancy@Oerf\hss} - -\newif\iffootnote -\let\latex@makecol\@makecol -\def\@makecol{\ifvoid\footins\footnotetrue\else\footnotefalse\fi -\let\topfloat\@toplist\let\botfloat\@botlist\latex@makecol} -\def\iftopfloat#1#2{\ifx\topfloat\empty #2\else #1\fi} -\def\ifbotfloat#1#2{\ifx\botfloat\empty #2\else #1\fi} -\def\iffloatpage#1#2{\if@fcolmade #1\else #2\fi} - -\newcommand{\fancypagestyle}[2]{% - \@namedef{ps@#1}{\let\fancy@gbl\relax#2\relax\ps@fancy}} diff --git a/app/scripts/latex-to-mdx/input/figures/ch1/ch1-lerobot-figure1.png b/app/scripts/latex-to-mdx/input/figures/ch1/ch1-lerobot-figure1.png deleted file mode 100644 index 9a43981b7d60df842224ee6bff9be820809b36b6..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/figures/ch1/ch1-lerobot-figure1.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:a850d2b9170736a42366d65dd858408dcffafa3420a0c6cfd678bbdd29a196fa -size 2861318 diff --git a/app/scripts/latex-to-mdx/input/figures/ch2/ch2-approaches.png b/app/scripts/latex-to-mdx/input/figures/ch2/ch2-approaches.png deleted file mode 100644 index 161aac09e5cae1c51d7a24deb2038ad80358e8cb..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/figures/ch2/ch2-approaches.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid 
sha256:d07f3166fd9efe5b0823ecca63166c019b6fb9dcc912f7b1ae0fd209a25ba274 -size 93262 diff --git a/app/scripts/latex-to-mdx/input/figures/ch2/ch2-classical-limitations.png b/app/scripts/latex-to-mdx/input/figures/ch2/ch2-classical-limitations.png deleted file mode 100644 index 969684eb34a3f473e0a0df8ec491c27144d69613..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/figures/ch2/ch2-classical-limitations.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:85742a774d8d1ad3e36fc50d89c5a69409bce98ebe6bdba734896156ba668aa8 -size 4739243 diff --git a/app/scripts/latex-to-mdx/input/figures/ch2/ch2-cost-accessibility.png b/app/scripts/latex-to-mdx/input/figures/ch2/ch2-cost-accessibility.png deleted file mode 100644 index 17aa82045475dc0e0537649285e4abd0a9aefd2b..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/figures/ch2/ch2-cost-accessibility.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:606cbb89fda90a2ddb22dc721ea978ffa9fe34a7f9f0bf1614b6ae53b4117411 -size 1962263 diff --git a/app/scripts/latex-to-mdx/input/figures/ch2/ch2-planar-manipulator-floor-box.png b/app/scripts/latex-to-mdx/input/figures/ch2/ch2-planar-manipulator-floor-box.png deleted file mode 100644 index 608b518385558b273d591d7f76d1d2804ece01b8..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/figures/ch2/ch2-planar-manipulator-floor-box.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:3c856918ffb061c235d05e74df6310412f5b41ea907f0f12f55fed5c8b45590b -size 93114 diff --git a/app/scripts/latex-to-mdx/input/figures/ch2/ch2-planar-manipulator-floor-shelf.png b/app/scripts/latex-to-mdx/input/figures/ch2/ch2-planar-manipulator-floor-shelf.png deleted file mode 100644 index 47c539881d7b58df4b4493093ab6b780c349a476..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/figures/ch2/ch2-planar-manipulator-floor-shelf.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:e4abb239c45a576a02fc2cbd0d87f877b2c5f61dcac74e1b8c79a70ebacaca3e -size 83589 diff --git a/app/scripts/latex-to-mdx/input/figures/ch2/ch2-planar-manipulator-floor.png b/app/scripts/latex-to-mdx/input/figures/ch2/ch2-planar-manipulator-floor.png deleted file mode 100644 index 1f19ca65db5de85acc43ca8240987b99fd298231..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/figures/ch2/ch2-planar-manipulator-floor.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:4a2c70f2d7c903d9f16433a9ca44c10892fd0e10ca90e2d9b8438c3d25fa623a -size 58946 diff --git a/app/scripts/latex-to-mdx/input/figures/ch2/ch2-planar-manipulator-free.png b/app/scripts/latex-to-mdx/input/figures/ch2/ch2-planar-manipulator-free.png deleted file mode 100644 index 42d6dc9662903b2563663a9b409a8dc83f69906f..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/figures/ch2/ch2-planar-manipulator-free.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:9d860153a76720749a50a6d06c7bcb9886f5605a867f130f66810597ca3f5299 -size 44656 diff --git a/app/scripts/latex-to-mdx/input/figures/ch2/ch2-platforms.png b/app/scripts/latex-to-mdx/input/figures/ch2/ch2-platforms.png deleted file mode 100644 index 4ccc153ed092d5493052d1ddede64094ae6b4068..0000000000000000000000000000000000000000 --- 
a/app/scripts/latex-to-mdx/input/figures/ch2/ch2-platforms.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:baf76deb1a68b859d1e702bc7d0b4173a6b34b56d4bdf75c4748e80eb1934aad -size 3616534 diff --git a/app/scripts/latex-to-mdx/input/figures/ch2/ch2-so100-to-planar-manipulator.png b/app/scripts/latex-to-mdx/input/figures/ch2/ch2-so100-to-planar-manipulator.png deleted file mode 100644 index d4bc70f800df876a10b6fdb4ac51c2544b2977fb..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/figures/ch2/ch2-so100-to-planar-manipulator.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:731806e912421ee3f3fcd10c24b5f5e9f4dd448f859e8213f8f11c0821fcbf59 -size 1555756 diff --git a/app/scripts/latex-to-mdx/input/figures/ch3/ch3-agent-env.png b/app/scripts/latex-to-mdx/input/figures/ch3/ch3-agent-env.png deleted file mode 100644 index 9d3ac5a9b05c8c48faf8660a5cac80737392110f..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/figures/ch3/ch3-agent-env.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:43c8641128f72b994a7269561fd6beaf2fbe0d73bb19f58ade559e271de1de31 -size 42614 diff --git a/app/scripts/latex-to-mdx/input/figures/ch3/ch3-duck-sim-vs-real.png b/app/scripts/latex-to-mdx/input/figures/ch3/ch3-duck-sim-vs-real.png deleted file mode 100644 index 142a5ea15f01aee271c1775e26a6a2c7bc4aedcc..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/figures/ch3/ch3-duck-sim-vs-real.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:c682cfebec3bf21f579a687d4f6a34d6f7cff225397e081188c39ca3b3def1e7 -size 1762155 diff --git a/app/scripts/latex-to-mdx/input/figures/ch3/ch3-hil-serl-examples.png b/app/scripts/latex-to-mdx/input/figures/ch3/ch3-hil-serl-examples.png deleted file mode 100644 index d665f43d5ed8972fc76399ed8caedd9fee4b373e..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/figures/ch3/ch3-hil-serl-examples.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:ae41b09a8a8412b28994425565438a897f827b3a2048d6832c2be7884b40a2af -size 7216604 diff --git a/app/scripts/latex-to-mdx/input/figures/ch3/ch3-learning-atlas.png b/app/scripts/latex-to-mdx/input/figures/ch3/ch3-learning-atlas.png deleted file mode 100644 index 6aceb0b7ccaefebf0bb854ab012eca0cc3ac5da2..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/figures/ch3/ch3-learning-atlas.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:124d586210aa9b3a110c712c4eff3629d0064a507c9c77bf937dd00cc959428c -size 178001 diff --git a/app/scripts/latex-to-mdx/input/figures/ch3/ch3-learning-benefits.png b/app/scripts/latex-to-mdx/input/figures/ch3/ch3-learning-benefits.png deleted file mode 100644 index 89684d039e24b897517612c222ef6e979f42a7c2..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/figures/ch3/ch3-learning-benefits.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:c23f98c050afb75098f34a2bca49fa30ebb4a2b373447c36ba62612854253ff3 -size 6936585 diff --git a/app/scripts/latex-to-mdx/input/figures/ch3/ch3-many-ducks.png b/app/scripts/latex-to-mdx/input/figures/ch3/ch3-many-ducks.png deleted file mode 100644 index 7605bcb2ba0f2abcd7213a4ca092e792db08c504..0000000000000000000000000000000000000000 --- 
a/app/scripts/latex-to-mdx/input/figures/ch3/ch3-many-ducks.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:418bdeff168978207fcc623db74d25b86d11f27d1100a28238bc1591901b93de -size 4872198 diff --git a/app/scripts/latex-to-mdx/input/figures/ch3/ch3-rl-algorithms-atlas.png b/app/scripts/latex-to-mdx/input/figures/ch3/ch3-rl-algorithms-atlas.png deleted file mode 100644 index 95e818db1704eb52f601c8d5a32f215b7cf7620c..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/figures/ch3/ch3-rl-algorithms-atlas.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:2aa853e6067e7bd06cfa0d12250d4277fbe2020b8a2b817c005b084c49c905d5 -size 194522 diff --git a/app/scripts/latex-to-mdx/input/figures/ch3/ch3-rl-examples.png b/app/scripts/latex-to-mdx/input/figures/ch3/ch3-rl-examples.png deleted file mode 100644 index 06de5007b9f0c10c23f79a2af13865a701916662..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/figures/ch3/ch3-rl-examples.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:edb1fa24ee3d279302980016809eab038fc43037156b8d7cadae7fa5b9dddbba -size 9051359 diff --git a/app/scripts/latex-to-mdx/input/figures/ch4/ch4-act-decoder.png b/app/scripts/latex-to-mdx/input/figures/ch4/ch4-act-decoder.png deleted file mode 100644 index 9a09fcb99bb717287ca74d165a3ca5d6983febba..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/figures/ch4/ch4-act-decoder.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:578074c47e65992422e9cb991949b1d63598aded2098dfde3925a33dfd55e481 -size 3180391 diff --git a/app/scripts/latex-to-mdx/input/figures/ch4/ch4-act-encoder.png b/app/scripts/latex-to-mdx/input/figures/ch4/ch4-act-encoder.png deleted file mode 100644 index f587680a13512bae2fe83b3b472ea54a273293e5..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/figures/ch4/ch4-act-encoder.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:7ceeeccb9dd7e791f215f71ee422d9adfb8c2ff1d2417a851e31ba6a6715aaf7 -size 874336 diff --git a/app/scripts/latex-to-mdx/input/figures/ch4/ch4-act.png b/app/scripts/latex-to-mdx/input/figures/ch4/ch4-act.png deleted file mode 100644 index 1f884e4a57994ca4a50e979ce8a7595bd02afc6f..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/figures/ch4/ch4-act.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:318b6f77277c5e8fcf51e2aba63154ee99052e2bcff2af0387fb3cfd1d07cff7 -size 1517348 diff --git a/app/scripts/latex-to-mdx/input/figures/ch4/ch4-action-vs-observation-distribution.png b/app/scripts/latex-to-mdx/input/figures/ch4/ch4-action-vs-observation-distribution.png deleted file mode 100644 index fc82dc6c86ce40126b00697f13a43cc563fe4b4d..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/figures/ch4/ch4-action-vs-observation-distribution.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:7db4ecc0d54d9cab6b8a16017c81bfd9b7fd5d7997bcdd645ccf57167f7efcf2 -size 274240 diff --git a/app/scripts/latex-to-mdx/input/figures/ch4/ch4-async-inference.png b/app/scripts/latex-to-mdx/input/figures/ch4/ch4-async-inference.png deleted file mode 100644 index 73aae17126c70f3fca8651ef62b7d519c81e6f58..0000000000000000000000000000000000000000 --- 
a/app/scripts/latex-to-mdx/input/figures/ch4/ch4-async-inference.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:850ebb6e6ad809edc48597a89cf8e25b2664b9137ca4602ae14f164524f8d232 -size 282300 diff --git a/app/scripts/latex-to-mdx/input/figures/ch4/ch4-bc-trajectories.png b/app/scripts/latex-to-mdx/input/figures/ch4/ch4-bc-trajectories.png deleted file mode 100644 index d577a6966244c54eb3738bd61af13232a603145a..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/figures/ch4/ch4-bc-trajectories.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:0ede85dbb8f12b3cced4dc0e12f97e3713d8432953183840f99e8534998d7f3b -size 2253030 diff --git a/app/scripts/latex-to-mdx/input/figures/ch4/ch4-diffusion-policy.png b/app/scripts/latex-to-mdx/input/figures/ch4/ch4-diffusion-policy.png deleted file mode 100644 index 56da7917d95a1592faafde62702170fac438f903..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/figures/ch4/ch4-diffusion-policy.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:c3cb644c79fd016e77c78bd7fcf185908b18fb127f656003eb577349cfb6da40 -size 2805702 diff --git a/app/scripts/latex-to-mdx/input/figures/ch4/ch4-diffusion-robot-actions.png b/app/scripts/latex-to-mdx/input/figures/ch4/ch4-diffusion-robot-actions.png deleted file mode 100644 index 43d8ce2193bdaeecb172de160290392aaf4000c0..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/figures/ch4/ch4-diffusion-robot-actions.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:a59b816b60a53784127e3dcf0aad612ba14474bde57e1c2b73b670665d1b70ec -size 8927638 diff --git a/app/scripts/latex-to-mdx/input/figures/ch4/ch4-diffusion-vs-flowmatching.png b/app/scripts/latex-to-mdx/input/figures/ch4/ch4-diffusion-vs-flowmatching.png deleted file mode 100644 index 2f4898e0c4db3a001354cc9a78d40e7537b34359..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/figures/ch4/ch4-diffusion-vs-flowmatching.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:aef138f5120025b0bad73788bc8b3af91f27331af3b49bafb09b15037944fa12 -size 189022 diff --git a/app/scripts/latex-to-mdx/input/figures/ch4/ch4-issues-with-bc.png b/app/scripts/latex-to-mdx/input/figures/ch4/ch4-issues-with-bc.png deleted file mode 100644 index 789283d5085bae36ebaf062bd157007988e2dd23..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/figures/ch4/ch4-issues-with-bc.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:7b726d8aa64534e8cbec4a0084fd86e4dfcc0b17685559970006a573dd326459 -size 1560808 diff --git a/app/scripts/latex-to-mdx/input/figures/ch4/ch4-latent-variable-model.png b/app/scripts/latex-to-mdx/input/figures/ch4/ch4-latent-variable-model.png deleted file mode 100644 index 62a7ade0557696ee25c61d10ef323ca1ec9bb077..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/figures/ch4/ch4-latent-variable-model.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:e5b1f48d4dc011d5a20b1d5bccc5cde750f4ffab4b8c48bb5b04529a18aa0390 -size 983775 diff --git a/app/scripts/latex-to-mdx/input/figures/ch4/ch4-many-latents.png b/app/scripts/latex-to-mdx/input/figures/ch4/ch4-many-latents.png deleted file mode 100644 index 
d972eb9694fe47d81d7a5bff66f78edd80c83e57..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/figures/ch4/ch4-many-latents.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:1f5421aae5c9e9735de598fca1a5c68ef7fd28c8b31112c4675356f6deda9b29 -size 222323 diff --git a/app/scripts/latex-to-mdx/input/figures/ch4/ch4-normalizing-flows.png b/app/scripts/latex-to-mdx/input/figures/ch4/ch4-normalizing-flows.png deleted file mode 100644 index cf51b8de51af38c0ea807889d8056d41c524c2d5..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/figures/ch4/ch4-normalizing-flows.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:51f73d09b35b8ccd5685c6b26f7615f8d6ab3df7d045b2502e9232bfe33beace -size 278482 diff --git a/app/scripts/latex-to-mdx/input/figures/ch4/ch4-observation-action-mapping.png b/app/scripts/latex-to-mdx/input/figures/ch4/ch4-observation-action-mapping.png deleted file mode 100644 index 6206870edf17a28bafe36ca0c5631a62b14f5a6a..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/figures/ch4/ch4-observation-action-mapping.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:f1a4a70971ea4c7cf73c089a70e4bc9dd1b5aba43021016fea8b323ad2642c53 -size 2081981 diff --git a/app/scripts/latex-to-mdx/input/figures/ch4/ch4-queues.png b/app/scripts/latex-to-mdx/input/figures/ch4/ch4-queues.png deleted file mode 100644 index c1e912ba8a2d5b254ea9d990ba8dbab491cb22ed..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/figures/ch4/ch4-queues.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:8d3072c26d0419ee4b19f4ebd10c66e117e113514326eb3e7864057644c305d7 -size 1971787 diff --git a/app/scripts/latex-to-mdx/input/figures/ch4/ch4-task-effect-on-pairs.png b/app/scripts/latex-to-mdx/input/figures/ch4/ch4-task-effect-on-pairs.png deleted file mode 100644 index 6fa47c83e5ba456655b025bd651aea0fc6feeeaa..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/figures/ch4/ch4-task-effect-on-pairs.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:0423b4760f661afa6b81a896a473a4bfc50737b0ecef76fa75051eb6ccf69896 -size 1186204 diff --git a/app/scripts/latex-to-mdx/input/figures/ch5/ch5-generalist-policies-timeline.png b/app/scripts/latex-to-mdx/input/figures/ch5/ch5-generalist-policies-timeline.png deleted file mode 100644 index d85a308d7665bd9c6fab4b0f59f622b0e1599745..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/figures/ch5/ch5-generalist-policies-timeline.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:98f0efdb30302f2fd582bbec379007ef3d2188171f0d700014539560b5d29a9f -size 121521 diff --git a/app/scripts/latex-to-mdx/input/figures/ch5/ch5-ml-vs-robotics-foundation.png b/app/scripts/latex-to-mdx/input/figures/ch5/ch5-ml-vs-robotics-foundation.png deleted file mode 100644 index 0327c71faf9a48c757b6a6f3027f7e54cac6f0e7..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/figures/ch5/ch5-ml-vs-robotics-foundation.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:e858e0c5c2d7246e097c8e048d7c378c0ce20c922e66ceac8db8dbb2c5598e79 -size 3389240 diff --git a/app/scripts/latex-to-mdx/input/figures/ch5/ch5-pi0-sampling-timesteps.png 
b/app/scripts/latex-to-mdx/input/figures/ch5/ch5-pi0-sampling-timesteps.png deleted file mode 100644 index 84401c9e5468cef66fcd2cdf2014f0c103003c93..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/figures/ch5/ch5-pi0-sampling-timesteps.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:2c27d0d34e08154b42692d1a3ea142ef7742ab50547211e9b22f16d79d14fbb3 -size 186917 diff --git a/app/scripts/latex-to-mdx/input/figures/ch5/ch5-pi0.png b/app/scripts/latex-to-mdx/input/figures/ch5/ch5-pi0.png deleted file mode 100644 index 4ea364ceb9691e4ea9928caac2ee6a32860a52d3..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/figures/ch5/ch5-pi0.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:689a7d0a94d116edce122d8c9010aa456ae7d1d816f5684513711d36c94ebb89 -size 1242717 diff --git a/app/scripts/latex-to-mdx/input/figures/ch5/ch5-smolvla.png b/app/scripts/latex-to-mdx/input/figures/ch5/ch5-smolvla.png deleted file mode 100644 index 488341b99047ecfad012127baa3a759354577853..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/figures/ch5/ch5-smolvla.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:49575d51c64eb320c588673fb9b33d1d0a3de7f6af7165a18c35ffb40af93e7a -size 1333430 diff --git a/app/scripts/latex-to-mdx/input/figures/ch5/ch5-trends.png b/app/scripts/latex-to-mdx/input/figures/ch5/ch5-trends.png deleted file mode 100644 index b399968a1d56a98ce0f4af3d1458cf903a1e1471..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/figures/ch5/ch5-trends.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:357708ec69852658d69c5f3ec3d9c5805939fdaa0d13150f6777731579db09fe -size 636731 diff --git a/app/scripts/latex-to-mdx/input/figures/misc/lerobot-team.jpeg b/app/scripts/latex-to-mdx/input/figures/misc/lerobot-team.jpeg deleted file mode 100644 index 330c9a79b9751bf86ffe5ce84a9aaac88ac5d7e6..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/figures/misc/lerobot-team.jpeg +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:7b79149533fb8602ee423c91c068100657745045bfd1507a6a61e30d58c65877 -size 170202 diff --git a/app/scripts/latex-to-mdx/input/handles.tex b/app/scripts/latex-to-mdx/input/handles.tex deleted file mode 100644 index 47b267c18598edaa9c272e08fa5dba7b3df72138..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/handles.tex +++ /dev/null @@ -1,52 +0,0 @@ -\definecolor{hf1}{HTML}{FFD220} -\definecolor{hf2}{HTML}{FF8360} -\definecolor{hf3}{HTML}{2D728F} -\definecolor{hf4}{HTML}{B5DFCA} -\definecolor{hf5}{HTML}{BABFD1} - -\newcommand{\highlight}[1]{\textcolor{hf2}{#1}} - -\newcommand{\lerobot}{\texttt{lerobot}} -\newcommand{\FK}{\text{FK}} -\newcommand{\targetvel}{\dot {p}^*} -\newcommand{\targetpos}{p^*} - -\newcommand{\statespace}{\mathcal S} -\newcommand{\actionspace}{\mathcal A} -\newcommand{\obsspace}{\mathcal O} -\newcommand{\dynamics}{\mathcal D} -\newcommand{\stateplusone}{s_{t+1}} -\newcommand{\state}{s_t} -\newcommand{\action}{a_t} -\newcommand{\transition}{(\state, \action, \stateplusone)} -\newcommand{\sars}{(\state, \action, r_t, \stateplusone)} -\newcommand{\transitiongiven}{(\stateplusone \vert \state, \action)} -\newcommand{\transitionprob}{\mathbb P \transitiongiven} -\newcommand{\trajectory}{(s_0, a_0, r_0, s_1, a_1, r_1, \dots, s_{T-1}, a_{T-1}, 
r_{T-1}, s_T)} -\newcommand{\Jpi}{J (\pi_\theta) } -\newcommand{\qfunction}{\(Q\)-function} -\newcommand{\qopt}{\( Q^* \)} - -\newcommand{\supp}[1]{\text{supp}({#1})} -\newcommand{\DKL}{\text{D}_{\text{KL}}} - -\newcommand{\actionchunk}{\mathbf{A}} -\newcommand{\actionexpert}{\mathbf{v}_\theta} -\newcommand{\pizero}{\( \pi_0 \)} - -% TL;DR boxes at the beginning of each chapter -\newtcolorbox{callout}[2][]{ - enhanced, breakable, - colback=hfbackground, opacityback=0.85, - colframe=ai2accent, boxrule=0.6pt, arc=2mm, - left=8pt, right=8pt, top=8pt, bottom=8pt, - before skip=10pt, after skip=10pt, - fonttitle=\sffamily\bfseries, - title={\sffamily\bfseries #2}, - #1 -} - -% Convenience environment for TL;DR -\newenvironment{tldr}[1][TL;DR]{\begin{callout}{#1}}{\end{callout}} - -\newcommand{\lerobotdataset}{\texttt{LeRobotDataset}} \ No newline at end of file diff --git a/app/scripts/latex-to-mdx/input/hfstyle/defns.tex b/app/scripts/latex-to-mdx/input/hfstyle/defns.tex deleted file mode 100644 index 747abce77e91abdb7683af5b9e9974aef3a1462a..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/hfstyle/defns.tex +++ /dev/null @@ -1,502 +0,0 @@ -% -% A useful set of commands -\usepackage{mathtools} -\usepackage{dsfont} -\usepackage[dvipsnames]{xcolor} -\usepackage[colorinlistoftodos]{todonotes} -\usepackage{booktabs} -\usepackage{xfrac} -\usepackage{bbm} - -\usepackage{algpseudocode} -\usepackage{algorithm} -\usepackage{algorithmicx} - -\usepackage[most]{tcolorbox} -\usepackage{xparse} -\usepackage{lipsum} -\usepackage{changepage} -\usepackage{enumitem} - -\newcommand{\qedwhite}{\hfill \ensuremath{\Box}} -%http://www-db.stanford.edu/~manku/latex.html -%The itemize environment can be replaced by: -\newcommand{\squishlist}{ - \begin{list}{$\bullet$} - { \setlength{\itemsep}{0pt} \setlength{\parsep}{3pt} - \setlength{\topsep}{3pt} \setlength{\partopsep}{0pt} - \setlength{\leftmargin}{1.5em} \setlength{\labelwidth}{1em} - \setlength{\labelsep}{0.5em} } } - -\newcommand{\squishlisttwo}{ - \begin{list}{$\bullet$} - { \setlength{\itemsep}{0pt} \setlength{\parsep}{0pt} - \setlength{\topsep}{0pt} \setlength{\partopsep}{0pt} - \setlength{\leftmargin}{2em} \setlength{\labelwidth}{1.5em} - \setlength{\labelsep}{0.5em} } } - -\newcommand{\squishend}{ - \end{list} } - -%Example usage: \squishlist %% \begin{itemize} -%\item First item -%\item Second item -%\squishend %% \end{itemize} - -\newcommand{\denselist}{\itemsep 0pt\topsep-6pt\partopsep-6pt} - - -\newtheorem{thm}{Theorem}[section] -\newtheorem{cor}{Corollary}[section] -\newtheorem{defn}{Definition}[section] -\newenvironment{mythm}{{\bf Theorem}}{} - - - -\newcommand{\tm}{\tilde{m}} -\newcommand{\tv}{\tilde{v}} - -% For bold vector symbols -\newcommand{\myvec}[1]{\boldsymbol{#1}} -\newcommand{\myvecsym}[1]{\boldsymbol{#1}} -\newcommand{\ind}[1]{\mathbb{I}(#1)} - -\newcommand{\vzero}{\myvecsym{0}} -\newcommand{\vone}{\myvecsym{1}} -\newcommand{\valpha}{\myvecsym{\alpha}} -\newcommand{\vbeta}{\myvecsym{\beta}} -\newcommand{\vchi}{\myvecsym{\chi}} -\newcommand{\vdelta}{\myvecsym{\delta}} -\newcommand{\vDelta}{\myvecsym{\Delta}} -\newcommand{\vepsilon}{\myvecsym{\epsilon}} -\newcommand{\vell}{\myvecsym{\ell}} -\newcommand{\veta}{\myvecsym{\eta}} -\newcommand{\vgamma}{\myvecsym{\gamma}} -\newcommand{\vGamma}{\myvecsym{\Gamma}} -\newcommand{\vmu}{\myvecsym{\mu}} -\newcommand{\vnu}{\myvecsym{\nu}} -\newcommand{\vkappa}{\myvecsym{\kappa}} -\newcommand{\vlambda}{\myvecsym{\lambda}} -\newcommand{\vLambda}{\myvecsym{\Lambda}} 
-\newcommand{\vLambdaBar}{\overline{\vLambda}} -\newcommand{\vomega}{\myvecsym{\omega}} -\newcommand{\vOmega}{\myvecsym{\Omega}} -\newcommand{\vphi}{\myvecsym{\phi}} -\newcommand{\vPhi}{\myvecsym{\Phi}} -\newcommand{\vpi}{\myvecsym{\pi}} -\newcommand{\vpsi}{\myvecsym{\psi}} -\newcommand{\vPsi}{\myvecsym{\Psi}} -\newcommand{\vtheta}{\myvecsym{\theta}} -\newcommand{\vTheta}{\myvecsym{\Theta}} -\newcommand{\vsigma}{\myvecsym{\sigma}} -\newcommand{\vSigma}{\myvecsym{\Sigma}} -\newcommand{\vtau}{\myvecsym{\tau}} -\newcommand{\vupsilon}{\myvecsym{\upsilon}} -\newcommand{\vxi}{\myvecsym{\xi}} - -\newcommand{\vxn}{\vx^{(n)}} - -\newcommand{\vmuY}{\vb} -\newcommand{\vmuMu}{\vmu_{x}} -\newcommand{\vmuMuGivenY}{\vmu_{x|y}} -\newcommand{\vSigmaMu}{\vSigma_{x}} -\newcommand{\vSigmaMuInv}{\vSigma_{x}^{-1}} -\newcommand{\vSigmaMuGivenY}{\vSigma_{x|y}} -\newcommand{\vSigmaMuGivenYinv}{\vSigma_{x|y}^{-1}} -\newcommand{\vSigmaY}{\vSigma_{y}} -\newcommand{\vSigmaYinv}{\vSigma_{y}^{-1}} - -%\newcommand{\vmuY}{\vmu_{y}} -%\newcommand{\vmuMu}{\vmu_{\mu}} -%\newcommand{\vmuMuGivenY}{\vmu_{\mu|y}} -%\newcommand{\vSigmaMu}{\vSigma_{\mu}} -%\newcommand{\vSigmaMuInv}{\vSigma_{\mu}^{-1}} -%\newcommand{\vSigmaMuGivenY}{\vSigma_{\mu|y}} -%\newcommand{\vSigmaMuGivenYinv}{\vSigma_{\mu|y}^{-1}} -%\newcommand{\vSigmaY}{\vSigma_{y}} -%\newcommand{\vSigmaYinv}{\vSigma_{y}^{-1}} - -\newcommand{\muY}{\mu_{y}} -\newcommand{\muMu}{\mu_{\mu}} -\newcommand{\muMuGivenY}{\mu_{\mu|y}} -\newcommand{\SigmaMu}{\Sigma_{\mu}} -\newcommand{\SigmaMuInv}{\Sigma_{\mu}^{-1}} -\newcommand{\SigmaMuGivenY}{\Sigma_{\mu|y}} -\newcommand{\SigmaMuGivenYinv}{\Sigma_{\mu|y}^{-1}} -\newcommand{\SigmaY}{\Sigma_{y}} -\newcommand{\SigmaYinv}{\Sigma_{y}^{-1}} - -\newcommand{\hatf}{\hat{f}} -\newcommand{\haty}{\hat{y}} -\newcommand{\const}{\mbox{const}} -\newcommand{\sigmoid}{\mbox{sigm}} - -\newcommand{\one}{(1)} -\newcommand{\two}{(2)} - -\newcommand{\va}{\myvec{a}} -\newcommand{\vb}{\myvec{b}} -\newcommand{\vc}{\myvec{c}} -\newcommand{\vd}{\myvec{d}} -\newcommand{\ve}{\myvec{e}} -\newcommand{\vf}{\myvec{f}} -\newcommand{\vg}{\myvec{g}} -\newcommand{\vh}{\myvec{h}} -\newcommand{\vj}{\myvec{j}} -\newcommand{\vk}{\myvec{k}} -\newcommand{\vl}{\myvec{l}} -\newcommand{\vm}{\myvec{m}} -\newcommand{\vn}{\myvec{n}} -\newcommand{\vo}{\myvec{o}} -\newcommand{\vp}{\myvec{p}} -\newcommand{\vq}{\myvec{q}} -\newcommand{\vr}{\myvec{r}} -\newcommand{\vs}{\myvec{s}} -\newcommand{\vt}{\myvec{t}} -\newcommand{\vu}{\myvec{u}} -\newcommand{\vv}{\myvec{v}} -\newcommand{\vw}{\myvec{w}} -\newcommand{\vws}{\vw_s} -\newcommand{\vwh}{\hat{\vw}} -\newcommand{\vx}{\myvec{x}} -%\newcommand{\vx}{\myvec{x}} -\newcommand{\vxt}{\myvec{\tilde{x}}} -\newcommand{\vy}{\myvec{y}} -\newcommand{\vyt}{\myvec{\tilde{y}}} -\newcommand{\vz}{\myvec{z}} - -\newcommand{\vA}{\myvec{A}} -\newcommand{\vB}{\myvec{B}} -\newcommand{\vC}{\myvec{C}} -\newcommand{\vD}{\myvec{D}} -\newcommand{\vE}{\myvec{E}} -\newcommand{\vF}{\myvec{F}} -\newcommand{\vG}{\myvec{G}} -\newcommand{\vH}{\myvec{H}} -\newcommand{\vI}{\myvec{I}} -\newcommand{\vJ}{\myvec{J}} -\newcommand{\vK}{\myvec{K}} -\newcommand{\vL}{\myvec{L}} -\newcommand{\vM}{\myvec{M}} -\newcommand{\vN}{\myvec{N}} -\newcommand{\vO}{\myvec{O}} -\newcommand{\vP}{\myvec{P}} -\newcommand{\vQ}{\myvec{Q}} -\newcommand{\vR}{\myvec{R}} -\newcommand{\vS}{\myvec{S}} -\newcommand{\vT}{\myvec{T}} -\newcommand{\vU}{\myvec{U}} -\newcommand{\vV}{\myvec{V}} -\newcommand{\vW}{\myvec{W}} -\newcommand{\vX}{\myvec{X}} -%\newcommand{\vXs}{\vX_{\vs}} -\newcommand{\vXs}{\vX_{s}} 
-\newcommand{\vXt}{\myvec{\tilde{X}}} -\newcommand{\vY}{\myvec{Y}} -\newcommand{\vZ}{\myvec{Z}} - - -\newcommand{\vxtest}{\myvec{x}_*} -\newcommand{\vytest}{\myvec{y}_*} - - -\newcommand{\ftrue}{f_{true}} - -\newcommand{\myprec}{\mbox{prec}} -\newcommand{\precw}{\lambda_{w}} % precision of weights (alpha) -\newcommand{\precy}{\lambda_{y}} % precision of y (beta) -\newcommand{\fbar}{\overline{f}} -\newcommand{\xmybar}{\overline{x}} -\newcommand{\ybar}{\overline{y}} -\newcommand{\zbar}{\overline{z}} -\newcommand{\vxbar}{\overline{\vx}} -\newcommand{\vXbar}{\overline{\vX}} -\newcommand{\vybar}{\overline{\vy}} -\newcommand{\vYbar}{\overline{\vY}} -\newcommand{\vzbar}{\overline{\vz}} -\newcommand{\vZbar}{\overline{\vZ}} -\newcommand{\xbar}{\overline{x}} -\newcommand{\Xbar}{\overline{X}} -\newcommand{\Ybar}{\overline{Y}} -\newcommand{\Gbar}{\overline{G}} -\newcommand{\Jbar}{\overline{J}} -\newcommand{\Lbar}{\overline{L}} -\newcommand{\Nbar}{\overline{N}} -%\newcommand{\Qbar}{\overline{Q}} -\newcommand{\Qbar}{\overline{Q}} -\newcommand{\Tbar}{\overline{T}} -\newcommand{\Sbar}{\overline{S}} -\newcommand{\vSbar}{\overline{\vS}} -\newcommand{\Rbar}{\overline{R}} - -\newcommand{\vtaubar}{\overline{\vtau}} -\newcommand{\vtbar}{\overline{\vt}} -\newcommand{\vsbar}{\overline{\vs}} - -\newcommand{\htilde}{\tilde{h}} -\newcommand{\vhtilde}{\tilde{\vh}} -\newcommand{\Dtilde}{\tilde{D}} -\newcommand{\Ftilde}{\tilde{F}} -\newcommand{\wtilde}{\tilde{w}} -\newcommand{\ptilde}{\tilde{p}} -\newcommand{\pemp}{p_{emp}} -\newcommand{\pstar}{p^*} -\newcommand{\xtilde}{\tilde{x}} -\newcommand{\Xtilde}{\tilde{X}} -\newcommand{\ytilde}{\tilde{y}} -\newcommand{\Ytilde}{\tilde{Y}} -\newcommand{\vxtilde}{\tilde{\vx}} -\newcommand{\vytilde}{\tilde{\vy}} -\newcommand{\ztilde}{\tilde{\z}} -\newcommand{\vztilde}{\tilde{\vz}} -\newcommand{\vthetaMAP}{\hat{\vtheta}_{MAP}} -\newcommand{\vthetaS}{\vtheta^{(s)}} -\newcommand{\vthetahat}{\hat{\vtheta}} -\newcommand{\thetahat}{\hat{\theta}} -\newcommand{\thetabar}{\overline{\theta}} -\newcommand{\vthetabar}{\overline{\vtheta}} -\newcommand{\pibar}{\overline{\pi}} -\newcommand{\vpibar}{\overline{\vpi}} - - -%\newcommand{\subsubsubsection}[1]{\paragraph{#1}} -\newcommand{\choice}[2]{\left(\!\!\! \begin{array}{c} #1 \\ #2\end{array} \!\!\!\right)} -\newcommand{\half}{\frac{1}{2}} -\newcommand{\defeq}{\stackrel{\rm def}{=}} -\newcommand{\real}{\mathbb{R}} - -\newcommand{\given}{\|} -\newcommand{\indep}[2]{{#1} \perp {#2}} -\newcommand{\condindep}[3]{{#1} \perp {#2} | {#3}} -\newcommand{\condindepG}[3]{{#1} \perp_G {#2} | {#3}} -\newcommand{\condindepP}[3]{{#1} \perp_p {#2} | {#3}} -\newcommand{\depend}[2]{{#1} \not \perp {#2}} -\newcommand{\conddepend}[3]{{#1} \not \perp {#2} | {#3}} - -\newcommand{\trans}[1]{{#1}^{\mathtt{T}}} -\newcommand{\inv}[1]{{#1}^{-1}} - -\newcommand{\ra}{\rightarrow} -\newcommand{\lra}{\leftrightarrow} -\newcommand{\Ra}{\Rightarrow} -%\newcommand{\rv}{r.v.} -\newcommand{\la}{\leftarrow} -\newcommand{\tr}{\mbox{tr}} -\newcommand{\st}{\mbox{ s.t. 
}} - -\newcommand{\dom}{\mbox{dom}} -\newcommand{\bel}{\mbox{bel}} -\newcommand{\dsep}{\mbox{dsep}} -\newcommand{\sep}{\mbox{sep}} -\newcommand{\entails}{\models} -\newcommand{\range}{\mbox{range}} -\newcommand{\myspan}{\mbox{span}} -\newcommand{\nullspace}{\mbox{nullspace}} -\newcommand{\adj}{\mbox{adj}} - -\newcommand{\nbd}{\mbox{nbd}} -\newcommand{\nbr}{\mbox{nbr}} -\newcommand{\anc}{\mbox{anc}} -\newcommand{\desc}{\mbox{desc}} -\newcommand{\pred}{\mbox{pred}} -\newcommand{\nondesc}{\mbox{nondesc}} -\newcommand{\pa}{\pi} -\newcommand{\ch}{\mbox{ch}} -\newcommand{\mb}{\mbox{mb}} -\newcommand{\connects}{\sim} - - -\newcommand{\betadist}{\mbox{Beta}} -\newcommand{\Betadist}{\mbox{Beta}} -\newcommand{\bernoulli}{\mbox{Ber}} -\newcommand{\Ber}{\mbox{Ber}} -\newcommand{\Binom}{\mbox{Bin}} -\newcommand{\NegBinom}{\mbox{NegBinom}} -\newcommand{\binomdist}{\mbox{Bin}} -\newcommand{\cauchy}{\mbox{Cauchy}} -\newcommand{\DE}{\mbox{DE}} -\newcommand{\Dir}{\mbox{Dir}} -\newcommand{\discrete}{\calM} -\newcommand{\Discrete}{\calM} -\newcommand{\expdist}{\mbox{Exp}} -\newcommand{\expon}{\mbox{Expon}} -\newcommand{\gammadist}{\mbox{Ga}} -\newcommand{\Ga}{\mbox{Ga}} -\newcommand{\gauss}{{\cal N}} -\newcommand{\IG}{\mbox{IG}} -\newcommand{\IGauss}{\mbox{InvGauss}} -\newcommand{\IW}{\mbox{IW}} -\newcommand{\Laplace}{\mbox{Lap}} -\newcommand{\Mu}{\mbox{Mu}} -\newcommand{\Multi}{\mbox{Mu}} -\newcommand{\NIX}{NI\chi^2} -\newcommand{\GIX}{NI\chi^2} -\newcommand{\NIG}{\mbox{NIG}} -\newcommand{\GIG}{\mbox{NIG}} -\newcommand{\NIW}{\mbox{NIW}} -\newcommand{\GIW}{\mbox{NIW}} -\newcommand{\MVNIW}{\mbox{NIW}} -\newcommand{\NW}{\mbox{NWi}} -\newcommand{\MVNIG}{\mbox{NIG}} -\newcommand{\NGdist}{\mbox{NG}} -\newcommand{\prob}{p} -\newcommand{\Poi}{\mbox{Poi}} -\newcommand{\Student}{{\cal T}} -\newcommand{\student}{{\cal T}} -\newcommand{\Wishart}{\mbox{Wi}} -\newcommand{\Wi}{\mbox{Wi}} -\newcommand{\unif}{\mbox{U}} -\newcommand{\etr}{\mbox{etr}} - -\newcommand{\softmax}{\calS} -\newcommand{\soft}{\mbox{soft}} -\newcommand{\cond}{\mbox{cond}} -\newcommand{\sign}{\mbox{sign}} -\newcommand{\sgn}{\mbox{sgn}} -\newcommand{\iid}{\mbox{iid}} -\newcommand{\mle}{\mbox{mle}} -\newcommand{\myiff}{\mbox{iff}} -\newcommand{\pd}{\mbox{pd}} -\newcommand{\pdf}{\mbox{pdf }} -\newcommand{\cdf}{\mbox{cdf}} -\newcommand{\pmf}{\mbox{pmf}} -\newcommand{\wrt}{\mbox{wrt}} -\newcommand{\matlab}{{\sc MATLAB}} -\newcommand{\NETLAB}{{\sc NETLAB}} -\newcommand{\MLABA}{\mbox{PMTK}} -\newcommand{\BLT}{\mbox{PMTK}} -\newcommand{\PMTK}{\mbox{PMTK}} -\newcommand{\mywp}{\mbox{wp}} - -\newcommand{\KLpq}[2]{\mathrm{KL}\left[{#1}\|{#2}\right]} -\newcommand{\KL}{\mbox{KL}} -\newcommand{\MI}{\mathbb{I}} -\newcommand{\MIxy}[2]{\mathbb{I}\left({#1};{#2}\right)} -\newcommand{\MIxyz}[3]{\mathbb{I}\left({#1};{#2}|{#3}\right)} -\newcommand{\entropy}[1]{\mathbb{H}\left({#1}\right)} -\newcommand{\entropypq}[2]{\mathbb{H}\left({#1}, {#2}\right)} - - -\newcommand{\vvec}{\mbox{vec}} -\newcommand{\kron}{\otimes} -\newcommand{\dof}{\mbox{dof}} -%\newcommand{\E}{E} -\newcommand{\E}{\mathbb{E}} -\newcommand{\energy}{E} -\newcommand{\expectAngle}[1]{\langle #1 \rangle} -%\newcommand{\expect}[1]{\mathbb{E}\left[ {#1} \right]} -\newcommand{\expect}[2]{\mathds{E}_{{#1}} \left[ {#2} \right]} -\newcommand{\expectGiven}[3]{\mathds{E}_{{#1}} \left[ {#2} \mid {#3} \right]} -\newcommand{\Var}{\mbox{Var}} -\newcommand{\VarGiven}[3]{\mbox{Var}_{{#1}}\left[ {#2} \mid {#3}\right]} -%\newcommand{\Var}{\mathbb{V}} -\newcommand{\var}[1]{\mbox{var}\left[{#1}\right]} 
-\newcommand{\std}[1]{\mbox{std}\left[{#1}\right]} -\newcommand{\varQ}[2]{\mbox{var}_{{#2}}\left[{#1}\right]} -\newcommand{\cov}[1]{\mbox{cov}\left[{#1}\right]} -%\newcommand{\mode}[1]{\mbox{mode}\left[{#1}\right]} -\newcommand{\median}[1]{\mbox{median}\left[{#1}\right]} - - - -\newcommand{\diag}{\mbox{diag}} -\newcommand{\blkdiag}{\mbox{blkdiag}} -\newcommand{\bias}{\mbox{bias}} -\newcommand{\union}{\cup} -\newcommand{\intersect}{\cap} - -\newcommand{\size}{\mbox{size}} -\newcommand{\trace}{\mbox{trace}} - - -\newcommand{\myc}{c} -\newcommand{\myi}{i} -\newcommand{\myj}{j} -\newcommand{\myk}{k} -\newcommand{\myn}{n} -\newcommand{\myq}{q} -\newcommand{\mys}{s} -\newcommand{\myt}{t} - - -\newcommand{\supp}{\mbox{supp}} - - -\newcommand{\calA}{{\cal A}} -\newcommand{\calB}{{\cal B}} -\newcommand{\calC}{{\cal C}} -\newcommand{\calD}{{\cal D}} -\newcommand{\calDx}{{\cal D}_x} -\newcommand{\calE}{{\cal E}} -\newcommand{\cale}{{\cal e}} -\newcommand{\calF}{{\cal F}} -\newcommand{\calG}{{\cal G}} -\newcommand{\calH}{{\cal H}} -\newcommand{\calHX}{{\cal H}_X} -\newcommand{\calHy}{{\cal H}_y} -\newcommand{\calI}{{\cal I}} -\newcommand{\calK}{{\cal K}} -\newcommand{\calM}{{\cal M}} -\newcommand{\calN}{{\cal N}} -\newcommand{\caln}{{\cal n}} -\newcommand{\calNP}{{\cal NP}} -\newcommand{\calMp}{\calM^+} -\newcommand{\calMm}{\calM^-} -\newcommand{\calMo}{\calM^o} -\newcommand{\Ctest}{C_*} -\newcommand{\calL}{{\cal L}} -\newcommand{\calP}{{\cal P}} -\newcommand{\calq}{{\cal q}} -\newcommand{\calQ}{{\cal Q}} -\newcommand{\calR}{{\cal R}} -\newcommand{\calS}{{\cal S}} -\newcommand{\calSstar}{\calS_*} -\newcommand{\calT}{{\cal T}} -\newcommand{\calV}{{\cal V}} -\newcommand{\calv}{{\cal v}} -\newcommand{\calX}{{\cal X}} -\newcommand{\calY}{{\cal Y}} - -\newcommand{\Lone}{$\ell_1$} -\newcommand{\Ltwo}{$\ell_2$} - -\newcommand{\mya}{\mbox{a}} -\newcommand{\myat}{\alpha_{t|t-1}} -\newcommand{\score}{\mbox{score}} -\newcommand{\AIC}{\mbox{AIC}} -\newcommand{\BIC}{\mbox{BIC}} -\newcommand{\BICcost}{\mbox{BIC-cost}} -\newcommand{\scoreBIC}{\mbox{score-BIC}} -\newcommand{\scoreBICL}{\mbox{score-BIC-L1}} -\newcommand{\scoreL}{\mbox{score-L1}} - -\newcommand{\ecoli}{\mbox{{\it E. 
coli}}} -\newcommand{\doPearl}{\mbox{do}} -\newcommand{\data}{\calD} -\newcommand{\model}{\calM} -\newcommand{\dataTrain}{\calD_{\mbox{train}}} -\newcommand{\dataTest}{\calD_{\mbox{test}}} -\newcommand{\dataValid}{\calD_{\mbox{valid}}} -\newcommand{\futuredata}{\tilde{\calD}} -\newcommand{\algo}{\calA} -\newcommand{\fitAlgo}{\calF} -\newcommand{\predictAlgo}{\calP} -\newcommand{\err}{\mbox{err}} -\newcommand{\logit}{\mbox{logit}} -\newcommand{\parent}{\mbox{pa}} - - -%%%%%%%%%%% Hoyt - -\newcommand{\conv}[1]{\,\,\,\displaystyle{\operatorname*{\longrightarrow}^{\,_{#1}\,}}\,\,\,} -\newcommand{\dconv}{\conv{D}} -\newcommand{\pconv}{\conv{P}} -\newcommand{\asconv}{\conv{AS}} -\newcommand{\lpconv}[1]{\conv{L^{#1}}} - -\DeclareMathAlphabet{\mathpzc}{OT1}{pzc}{m}{n} - - -\newcommand{\condSet}{\mathcal{S}} -\newcommand{\condSetC}{\mathcal{\lnot S}} - diff --git a/app/scripts/latex-to-mdx/input/hfstyle/hf.cls b/app/scripts/latex-to-mdx/input/hfstyle/hf.cls deleted file mode 100644 index 4c654160d8711db05b1010300542e28a6396008e..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/hfstyle/hf.cls +++ /dev/null @@ -1,360 +0,0 @@ -% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -% A style for AI2 pre-prints -% Author: jacobm@allenai.org -% Version: 1.1 -% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% - -% Class declarations %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% - -\NeedsTeXFormat{LaTeX2e} -\ProvidesClass{hfstyle/hf} -\LoadClassWithOptions{article} - -% Layout %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% - -\RequirePackage[top=2.25cm, bottom=2.5cm, left=2.5cm, right=2.5cm, columnsep=0.65cm, margin=1.9cm]{geometry} -\RequirePackage{microtype} -\RequirePackage{placeins} -\RequirePackage{hyphenat} -\RequirePackage{setspace} -\RequirePackage{parskip} -\RequirePackage[latin, english]{babel} -\RequirePackage{lipsum} -\RequirePackage{etoolbox} -\RequirePackage{fancyhdr} % custom headers/footers - -% \DisableLigatures[f]{family=sf*} - -% Graphics %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% - -\RequirePackage{graphicx} -\RequirePackage{subcaption} - -% Tables %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% - -\RequirePackage{booktabs} -\RequirePackage{nicematrix} -\RequirePackage{multirow} -\RequirePackage{bm} -\newcommand{\nm}[1]{#1} - -% Colorful stuff %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% - -\RequirePackageWithOptions{xcolor} -\RequirePackage[most]{tcolorbox} -\definecolor{ai2accent}{HTML}{407579} -% \definecolor{ai2accent}{HTML}{ff0000} -\definecolor{hfforeground}{HTML}{1C2B33} -\definecolor{hfbackground}{HTML}{ffffb7} -\definecolor{hfforegroundDark}{HTML}{0A2B35} -\definecolor{ai2pink}{HTML}{F0529C} -\definecolor{hfyellow}{HTML}{000000} - - -% References %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% - -\RequirePackage{hyperref} -\hypersetup{ - colorlinks=true, - linkcolor=ai2accent, - citecolor=ai2accent, - urlcolor=ai2accent, - anchorcolor=ai2accent, - menucolor=ai2accent, - filecolor=ai2accent, - % linktocpage=true, - allcolors=ai2accent -} - - -\RequirePackage[noabbrev,nameinlink]{cleveref} -% Reapply hyperref settings after cleveref to ensure they stick -\AtBeginDocument{ - \hypersetup{ - allcolors=ai2accent, - linkcolor=ai2accent, - citecolor=ai2accent, - urlcolor=ai2accent - } -} - - -% change base color of text -\AtBeginDocument{ - \color{hfforegroundDark} - 
\pagecolor{white} -} - - -\RequirePackage[round,authoryear]{natbib} -\def\bibfont{\small} - - -% Create a custom size that's exactly 1pt larger than normalsize -\newcommand{\slightlylarger}{% - \fontsize{\dimexpr\f@size pt+1}{\dimexpr\f@size pt-0.2\baselineskip}\selectfont% -} - -% Create a custom size that's exactly 1pt smaller than normalsize -\newcommand{\slightlysmaller}{% - \fontsize{\dimexpr\f@size pt-1}{\dimexpr\f@size pt+0.2\baselineskip}\selectfont% -} - -% Section and caption format %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% - -\RequirePackage{titlesec} -% \titleformat*{\paragraph}{\bfseries} -\titleformat*{\section}{\Large\sffamily\bfseries} -\titleformat*{\subsection}{\large\sffamily\bfseries} -\titleformat*{\subsubsection}{\slightlylarger\sffamily\bfseries} -\titleformat*{\paragraph}{\slightlysmaller\sffamily\bfseries} - -% make bolded text smaller to match with serif. -% \DeclareTextFontCommand{\textbf}{\bfseries\sffamily} -\DeclareTextFontCommand{\textbf}{\fontsize{9}{11}\selectfont\bfseries\sffamily} - - -\RequirePackage{caption} -\DeclareCaptionLabelSeparator{custom}{} -\DeclareCaptionFormat{custom}{{\sffamily\textbf{#1 #2}} #3} -\DeclareCaptionLabelSeparator{pipe}{ $\vert$ }% or $\vert$ -\captionsetup{singlelinecheck=false,format=custom,labelsep=pipe,font=small} -\captionsetup[sub]{singlelinecheck=true,format=custom,labelsep=pipe,font=small} - -% %%======== Header and Footer Content ======== - - - -% % HF custom fonts %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% - -% % http://c.caignaert.free.fr/Install-ttf-Font.pdf - -% Set the main font to Times New Roman -% \setmainfont{Manrope} -% Set sans-serif font to Manrope -\RequirePackage{ifxetex} -\usepackage{ifxetex} -\ifxetex - \usepackage{fontspec} - \setsansfont{Manrope} -\else - \RequirePackage[T1]{fontenc} - \usepackage[T1]{fontenc} - \usepackage{hfstyle/manrope} - \renewcommand{\sfdefault}{manrope} -\fi - -% % \pdfmapline{+optimistic < assets/Optimistic.ttf s * [0.88] ai2style/fonts/Manrope}{} -% \pdfmapline{+manrope < ai2style/fonts/Manrope.ttf manroperegular } {} -\DeclareFontShape{T1}{manrope}{b} {n} {<-> manropebold } {} - -\DeclareFontShape{T1}{manrope}{m} {it}{<-> ssub * manrope/m/n} {} -\DeclareFontShape{T1}{manrope}{b} {it}{<-> ssub * manrope/b/n} {} - -\DeclareFontShape{T1}{manrope}{m} {sc}{<-> ssub * manrope/m/n} {} -\DeclareFontShape{T1}{manrope}{b} {sc}{<-> ssub * manrope/b/n} {} - -\DeclareFontShape{T1}{manrope}{m} {sl}{<-> ssub * manrope/m/n} {} -\DeclareFontShape{T1}{manrope}{b} {sl}{<-> ssub * manrope/b/n} {} -\end{filecontents} - -% Write the map lines -\pdfmapline{+manroperegular < hfstyle/manrope/Manrope-Regular.ttf } - { s nameptr "{ff~}{vv~}{ll}{, jj}" format.name$ 't := - nameptr #1 > - { namesleft #1 > - { ", " * t * } - { numnames #2 > - { "," * } - 'skip$ - if$ - t "others" = - { " et~al." 
* } - { " and " * t * } - if$ - } - if$ - } - 't - if$ - nameptr #1 + 'nameptr := - namesleft #1 - 'namesleft := - } - while$ -} - -FUNCTION {format.key} -{ empty$ - { key field.or.null } - { "" } - if$ -} - -FUNCTION {format.authors} -{ author empty$ - { "" } - { author format.names } - if$ -} - -FUNCTION {format.editors} -{ editor empty$ - { "" } - { editor format.names - editor num.names$ #1 > - { ", editors" * } - { ", editor" * } - if$ - } - if$ -} - -FUNCTION {format.isbn} -{ isbn empty$ - { "" } - { new.block "ISBN " isbn * } - if$ -} - -FUNCTION {format.issn} -{ issn empty$ - { "" } - { new.block "ISSN " issn * } - if$ -} - -FUNCTION {format.url} -{ url empty$ - { "" } - { new.block "\url{" url * "}" * } - if$ -} - -FUNCTION {format.doi} -{ doi empty$ - { "" } - { new.block "\doi{" doi * "}" * } - if$ -} - -FUNCTION {format.title} -{ title empty$ - { "" } - { title "t" change.case$ } - if$ -} - -FUNCTION {format.full.names} -{'s := - #1 'nameptr := - s num.names$ 'numnames := - numnames 'namesleft := - { namesleft #0 > } - { s nameptr - "{vv~}{ll}" format.name$ 't := - nameptr #1 > - { - namesleft #1 > - { ", " * t * } - { - numnames #2 > - { "," * } - 'skip$ - if$ - t "others" = - { " et~al." * } - { " and " * t * } - if$ - } - if$ - } - 't - if$ - nameptr #1 + 'nameptr := - namesleft #1 - 'namesleft := - } - while$ -} - -FUNCTION {author.editor.full} -{ author empty$ - { editor empty$ - { "" } - { editor format.full.names } - if$ - } - { author format.full.names } - if$ -} - -FUNCTION {author.full} -{ author empty$ - { "" } - { author format.full.names } - if$ -} - -FUNCTION {editor.full} -{ editor empty$ - { "" } - { editor format.full.names } - if$ -} - -FUNCTION {make.full.names} -{ type$ "book" = - type$ "inbook" = - or - 'author.editor.full - { type$ "proceedings" = - 'editor.full - 'author.full - if$ - } - if$ -} - -FUNCTION {output.bibitem} -{ newline$ - "\bibitem[" write$ - label write$ - ")" make.full.names duplicate$ short.list = - { pop$ } - { * } - if$ - "]{" * write$ - cite$ write$ - "}" write$ - newline$ - "" - before.all 'output.state := -} - -FUNCTION {n.dashify} -{ 't := - "" - { t empty$ not } - { t #1 #1 substring$ "-" = - { t #1 #2 substring$ "--" = not - { "--" * - t #2 global.max$ substring$ 't := - } - { { t #1 #1 substring$ "-" = } - { "-" * - t #2 global.max$ substring$ 't := - } - while$ - } - if$ - } - { t #1 #1 substring$ * - t #2 global.max$ substring$ 't := - } - if$ - } - while$ -} - -FUNCTION {format.date} -{ year duplicate$ empty$ - { "empty year in " cite$ * warning$ - pop$ "" } - 'skip$ - if$ - month empty$ - 'skip$ - { month - " " * swap$ * - } - if$ - extra.label * -} - -FUNCTION {format.btitle} -{ title emphasize -} - -FUNCTION {tie.or.space.connect} -{ duplicate$ text.length$ #3 < - { "~" } - { " " } - if$ - swap$ * * -} - -FUNCTION {either.or.check} -{ empty$ - 'pop$ - { "can't use both " swap$ * " fields in " * cite$ * warning$ } - if$ -} - -FUNCTION {format.bvolume} -{ volume empty$ - { "" } - { "volume" volume tie.or.space.connect - series empty$ - 'skip$ - { " of " * series emphasize * } - if$ - "volume and number" number either.or.check - } - if$ -} - -FUNCTION {format.number.series} -{ volume empty$ - { number empty$ - { series field.or.null } - { output.state mid.sentence = - { "number" } - { "Number" } - if$ - number tie.or.space.connect - series empty$ - { "there's a number but no series in " cite$ * warning$ } - { " in " * series * } - if$ - } - if$ - } - { "" } - if$ -} - -FUNCTION {format.edition} -{ edition empty$ - { "" } - { 
output.state mid.sentence = - { edition "l" change.case$ " edition" * } - { edition "t" change.case$ " edition" * } - if$ - } - if$ -} - -INTEGERS { multiresult } - -FUNCTION {multi.page.check} -{ 't := - #0 'multiresult := - { multiresult not - t empty$ not - and - } - { t #1 #1 substring$ - duplicate$ "-" = - swap$ duplicate$ "," = - swap$ "+" = - or or - { #1 'multiresult := } - { t #2 global.max$ substring$ 't := } - if$ - } - while$ - multiresult -} - -FUNCTION {format.pages} -{ pages empty$ - { "" } - { pages multi.page.check - { "pages" pages n.dashify tie.or.space.connect } - { "page" pages tie.or.space.connect } - if$ - } - if$ -} - -FUNCTION {format.eid} -{ eid empty$ - { "" } - { "art." eid tie.or.space.connect } - if$ -} - -FUNCTION {format.vol.num.pages} -{ volume field.or.null - number empty$ - 'skip$ - { "\penalty0 (" number * ")" * * - volume empty$ - { "there's a number but no volume in " cite$ * warning$ } - 'skip$ - if$ - } - if$ - pages empty$ - 'skip$ - { duplicate$ empty$ - { pop$ format.pages } - { ":\penalty0 " * pages n.dashify * } - if$ - } - if$ -} - -FUNCTION {format.vol.num.eid} -{ volume field.or.null - number empty$ - 'skip$ - { "\penalty0 (" number * ")" * * - volume empty$ - { "there's a number but no volume in " cite$ * warning$ } - 'skip$ - if$ - } - if$ - eid empty$ - 'skip$ - { duplicate$ empty$ - { pop$ format.eid } - { ":\penalty0 " * eid * } - if$ - } - if$ -} - -FUNCTION {format.chapter.pages} -{ chapter empty$ - 'format.pages - { type empty$ - { "chapter" } - { type "l" change.case$ } - if$ - chapter tie.or.space.connect - pages empty$ - 'skip$ - { ", " * format.pages * } - if$ - } - if$ -} - -FUNCTION {format.in.ed.booktitle} -{ booktitle empty$ - { "" } - { editor empty$ - { "In " booktitle emphasize * } - { "In " format.editors * ", " * booktitle emphasize * } - if$ - } - if$ -} - -FUNCTION {empty.misc.check} -{ author empty$ title empty$ howpublished empty$ - month empty$ year empty$ note empty$ - and and and and and - key empty$ not and - { "all relevant fields are empty in " cite$ * warning$ } - 'skip$ - if$ -} - -FUNCTION {format.thesis.type} -{ type empty$ - 'skip$ - { pop$ - type "t" change.case$ - } - if$ -} - -FUNCTION {format.tr.number} -{ type empty$ - { "Technical Report" } - 'type - if$ - number empty$ - { "t" change.case$ } - { number tie.or.space.connect } - if$ -} - -FUNCTION {format.article.crossref} -{ key empty$ - { journal empty$ - { "need key or journal for " cite$ * " to crossref " * crossref * - warning$ - "" - } - { "In \emph{" journal * "}" * } - if$ - } - { "In " } - if$ - " \citet{" * crossref * "}" * -} - -FUNCTION {format.book.crossref} -{ volume empty$ - { "empty volume in " cite$ * "'s crossref of " * crossref * warning$ - "In " - } - { "Volume" volume tie.or.space.connect - " of " * - } - if$ - editor empty$ - editor field.or.null author field.or.null = - or - { key empty$ - { series empty$ - { "need editor, key, or series for " cite$ * " to crossref " * - crossref * warning$ - "" * - } - { "\emph{" * series * "}" * } - if$ - } - 'skip$ - if$ - } - 'skip$ - if$ - " \citet{" * crossref * "}" * -} - -FUNCTION {format.incoll.inproc.crossref} -{ editor empty$ - editor field.or.null author field.or.null = - or - { key empty$ - { booktitle empty$ - { "need editor, key, or booktitle for " cite$ * " to crossref " * - crossref * warning$ - "" - } - { "In \emph{" booktitle * "}" * } - if$ - } - { "In " } - if$ - } - { "In " } - if$ - " \citet{" * crossref * "}" * -} - -FUNCTION {article} -{ output.bibitem - format.authors 
"author" output.check - author format.key output - new.block - format.title "title" output.check - new.block - crossref missing$ - { journal emphasize "journal" output.check - eid empty$ - { format.vol.num.pages output } - { format.vol.num.eid output } - if$ - format.date "year" output.check - } - { format.article.crossref output.nonnull - eid empty$ - { format.pages output } - { format.eid output } - if$ - } - if$ - format.issn output - format.doi output - format.url output - new.block - note output - fin.entry -} - -FUNCTION {book} -{ output.bibitem - author empty$ - { format.editors "author and editor" output.check - editor format.key output - } - { format.authors output.nonnull - crossref missing$ - { "author and editor" editor either.or.check } - 'skip$ - if$ - } - if$ - new.block - format.btitle "title" output.check - crossref missing$ - { format.bvolume output - new.block - format.number.series output - new.sentence - publisher "publisher" output.check - address output - } - { new.block - format.book.crossref output.nonnull - } - if$ - format.edition output - format.date "year" output.check - format.isbn output - format.doi output - format.url output - new.block - note output - fin.entry -} - -FUNCTION {booklet} -{ output.bibitem - format.authors output - author format.key output - new.block - format.title "title" output.check - howpublished address new.block.checkb - howpublished output - address output - format.date output - format.isbn output - format.doi output - format.url output - new.block - note output - fin.entry -} - -FUNCTION {inbook} -{ output.bibitem - author empty$ - { format.editors "author and editor" output.check - editor format.key output - } - { format.authors output.nonnull - crossref missing$ - { "author and editor" editor either.or.check } - 'skip$ - if$ - } - if$ - new.block - format.btitle "title" output.check - crossref missing$ - { format.bvolume output - format.chapter.pages "chapter and pages" output.check - new.block - format.number.series output - new.sentence - publisher "publisher" output.check - address output - } - { format.chapter.pages "chapter and pages" output.check - new.block - format.book.crossref output.nonnull - } - if$ - format.edition output - format.date "year" output.check - format.isbn output - format.doi output - format.url output - new.block - note output - fin.entry -} - -FUNCTION {incollection} -{ output.bibitem - format.authors "author" output.check - author format.key output - new.block - format.title "title" output.check - new.block - crossref missing$ - { format.in.ed.booktitle "booktitle" output.check - format.bvolume output - format.number.series output - format.chapter.pages output - new.sentence - publisher "publisher" output.check - address output - format.edition output - format.date "year" output.check - } - { format.incoll.inproc.crossref output.nonnull - format.chapter.pages output - } - if$ - format.isbn output - format.doi output - format.url output - new.block - note output - fin.entry -} - -FUNCTION {inproceedings} -{ output.bibitem - format.authors "author" output.check - author format.key output - new.block - format.title "title" output.check - new.block - crossref missing$ - { format.in.ed.booktitle "booktitle" output.check - format.bvolume output - format.number.series output - format.pages output - address empty$ - { organization publisher new.sentence.checkb - organization output - publisher output - format.date "year" output.check - } - { address output.nonnull - format.date "year" output.check - 
new.sentence - organization output - publisher output - } - if$ - } - { format.incoll.inproc.crossref output.nonnull - format.pages output - } - if$ - format.isbn output - format.doi output - format.url output - new.block - note output - fin.entry -} - -FUNCTION {conference} { inproceedings } - -FUNCTION {manual} -{ output.bibitem - format.authors output - author format.key output - new.block - format.btitle "title" output.check - organization address new.block.checkb - organization output - address output - format.edition output - format.date output - format.url output - new.block - note output - fin.entry -} - -FUNCTION {mastersthesis} -{ output.bibitem - format.authors "author" output.check - author format.key output - new.block - format.title "title" output.check - new.block - "Master's thesis" format.thesis.type output.nonnull - school "school" output.check - address output - format.date "year" output.check - format.url output - new.block - note output - fin.entry -} - -FUNCTION {misc} -{ output.bibitem - format.authors output - author format.key output - title howpublished new.block.checkb - format.title output - howpublished new.block.checka - howpublished output - format.date output - format.issn output - format.url output - new.block - note output - fin.entry - empty.misc.check -} - -FUNCTION {phdthesis} -{ output.bibitem - format.authors "author" output.check - author format.key output - new.block - format.btitle "title" output.check - new.block - "PhD thesis" format.thesis.type output.nonnull - school "school" output.check - address output - format.date "year" output.check - format.url output - new.block - note output - fin.entry -} - -FUNCTION {proceedings} -{ output.bibitem - format.editors output - editor format.key output - new.block - format.btitle "title" output.check - format.bvolume output - format.number.series output - address output - format.date "year" output.check - new.sentence - organization output - publisher output - format.isbn output - format.doi output - format.url output - new.block - note output - fin.entry -} - -FUNCTION {techreport} -{ output.bibitem - format.authors "author" output.check - author format.key output - new.block - format.title "title" output.check - new.block - format.tr.number output.nonnull - institution "institution" output.check - address output - format.date "year" output.check - format.url output - new.block - note output - fin.entry -} - -FUNCTION {unpublished} -{ output.bibitem - format.authors "author" output.check - author format.key output - new.block - format.title "title" output.check - new.block - note "note" output.check - format.date output - format.url output - fin.entry -} - -FUNCTION {default.type} { misc } - - -MACRO {jan} {"January"} - -MACRO {feb} {"February"} - -MACRO {mar} {"March"} - -MACRO {apr} {"April"} - -MACRO {may} {"May"} - -MACRO {jun} {"June"} - -MACRO {jul} {"July"} - -MACRO {aug} {"August"} - -MACRO {sep} {"September"} - -MACRO {oct} {"October"} - -MACRO {nov} {"November"} - -MACRO {dec} {"December"} - - - -MACRO {acmcs} {"ACM Computing Surveys"} - -MACRO {acta} {"Acta Informatica"} - -MACRO {cacm} {"Communications of the ACM"} - -MACRO {ibmjrd} {"IBM Journal of Research and Development"} - -MACRO {ibmsj} {"IBM Systems Journal"} - -MACRO {ieeese} {"IEEE Transactions on Software Engineering"} - -MACRO {ieeetc} {"IEEE Transactions on Computers"} - -MACRO {ieeetcad} - {"IEEE Transactions on Computer-Aided Design of Integrated Circuits"} - -MACRO {ipl} {"Information Processing Letters"} - -MACRO {jacm} 
{"Journal of the ACM"} - -MACRO {jcss} {"Journal of Computer and System Sciences"} - -MACRO {scp} {"Science of Computer Programming"} - -MACRO {sicomp} {"SIAM Journal on Computing"} - -MACRO {tocs} {"ACM Transactions on Computer Systems"} - -MACRO {tods} {"ACM Transactions on Database Systems"} - -MACRO {tog} {"ACM Transactions on Graphics"} - -MACRO {toms} {"ACM Transactions on Mathematical Software"} - -MACRO {toois} {"ACM Transactions on Office Information Systems"} - -MACRO {toplas} {"ACM Transactions on Programming Languages and Systems"} - -MACRO {tcs} {"Theoretical Computer Science"} - - -READ - -FUNCTION {sortify} -{ purify$ - "l" change.case$ -} - -INTEGERS { len } - -FUNCTION {chop.word} -{ 's := - 'len := - s #1 len substring$ = - { s len #1 + global.max$ substring$ } - 's - if$ -} - -FUNCTION {format.lab.names} -{ 's := - s #1 "{vv~}{ll}" format.name$ - s num.names$ duplicate$ - #2 > - { pop$ " et~al." * } - { #2 < - 'skip$ - { s #2 "{ff }{vv }{ll}{ jj}" format.name$ "others" = - { " et~al." * } - { " and " * s #2 "{vv~}{ll}" format.name$ * } - if$ - } - if$ - } - if$ -} - -FUNCTION {author.key.label} -{ author empty$ - { key empty$ - { cite$ #1 #3 substring$ } - 'key - if$ - } - { author format.lab.names } - if$ -} - -FUNCTION {author.editor.key.label} -{ author empty$ - { editor empty$ - { key empty$ - { cite$ #1 #3 substring$ } - 'key - if$ - } - { editor format.lab.names } - if$ - } - { author format.lab.names } - if$ -} - -FUNCTION {author.key.organization.label} -{ author empty$ - { key empty$ - { organization empty$ - { cite$ #1 #3 substring$ } - { "The " #4 organization chop.word #3 text.prefix$ } - if$ - } - 'key - if$ - } - { author format.lab.names } - if$ -} - -FUNCTION {editor.key.organization.label} -{ editor empty$ - { key empty$ - { organization empty$ - { cite$ #1 #3 substring$ } - { "The " #4 organization chop.word #3 text.prefix$ } - if$ - } - 'key - if$ - } - { editor format.lab.names } - if$ -} - -FUNCTION {calc.short.authors} -{ type$ "book" = - type$ "inbook" = - or - 'author.editor.key.label - { type$ "proceedings" = - 'editor.key.organization.label - { type$ "manual" = - 'author.key.organization.label - 'author.key.label - if$ - } - if$ - } - if$ - 'short.list := -} - -FUNCTION {calc.label} -{ calc.short.authors - short.list - "(" - * - year duplicate$ empty$ - short.list key field.or.null = or - { pop$ "" } - 'skip$ - if$ - * - 'label := -} - -FUNCTION {sort.format.names} -{ 's := - #1 'nameptr := - "" - s num.names$ 'numnames := - numnames 'namesleft := - { namesleft #0 > } - { - s nameptr "{vv{ } }{ll{ }}{ ff{ }}{ jj{ }}" format.name$ 't := - nameptr #1 > - { - " " * - namesleft #1 = t "others" = and - { "zzzzz" * } - { numnames #2 > nameptr #2 = and - { "zz" * year field.or.null * " " * } - 'skip$ - if$ - t sortify * - } - if$ - } - { t sortify * } - if$ - nameptr #1 + 'nameptr := - namesleft #1 - 'namesleft := - } - while$ -} - -FUNCTION {sort.format.title} -{ 't := - "A " #2 - "An " #3 - "The " #4 t chop.word - chop.word - chop.word - sortify - #1 global.max$ substring$ -} - -FUNCTION {author.sort} -{ author empty$ - { key empty$ - { "to sort, need author or key in " cite$ * warning$ - "" - } - { key sortify } - if$ - } - { author sort.format.names } - if$ -} - -FUNCTION {author.editor.sort} -{ author empty$ - { editor empty$ - { key empty$ - { "to sort, need author, editor, or key in " cite$ * warning$ - "" - } - { key sortify } - if$ - } - { editor sort.format.names } - if$ - } - { author sort.format.names } - if$ -} - -FUNCTION 
{author.organization.sort} -{ author empty$ - { organization empty$ - { key empty$ - { "to sort, need author, organization, or key in " cite$ * warning$ - "" - } - { key sortify } - if$ - } - { "The " #4 organization chop.word sortify } - if$ - } - { author sort.format.names } - if$ -} - -FUNCTION {editor.organization.sort} -{ editor empty$ - { organization empty$ - { key empty$ - { "to sort, need editor, organization, or key in " cite$ * warning$ - "" - } - { key sortify } - if$ - } - { "The " #4 organization chop.word sortify } - if$ - } - { editor sort.format.names } - if$ -} - - -FUNCTION {presort} -{ calc.label - label sortify - " " - * - type$ "book" = - type$ "inbook" = - or - 'author.editor.sort - { type$ "proceedings" = - 'editor.organization.sort - { type$ "manual" = - 'author.organization.sort - 'author.sort - if$ - } - if$ - } - if$ - " " - * - year field.or.null sortify - * - " " - * - cite$ - * - #1 entry.max$ substring$ - 'sort.label := - sort.label * - #1 entry.max$ substring$ - 'sort.key$ := -} - -ITERATE {presort} - -SORT - -STRINGS { longest.label last.label next.extra } - -INTEGERS { longest.label.width last.extra.num number.label } - -FUNCTION {initialize.longest.label} -{ "" 'longest.label := - #0 int.to.chr$ 'last.label := - "" 'next.extra := - #0 'longest.label.width := - #0 'last.extra.num := - #0 'number.label := -} - -FUNCTION {forward.pass} -{ last.label label = - { last.extra.num #1 + 'last.extra.num := - last.extra.num int.to.chr$ 'extra.label := - } - { "a" chr.to.int$ 'last.extra.num := - "" 'extra.label := - label 'last.label := - } - if$ - number.label #1 + 'number.label := -} - -FUNCTION {reverse.pass} -{ next.extra "b" = - { "a" 'extra.label := } - 'skip$ - if$ - extra.label 'next.extra := - extra.label - duplicate$ empty$ - 'skip$ - { "{\natexlab{" swap$ * "}}" * } - if$ - 'extra.label := - label extra.label * 'label := -} - -EXECUTE {initialize.longest.label} - -ITERATE {forward.pass} - -REVERSE {reverse.pass} - -FUNCTION {bib.sort.order} -{ sort.label 'sort.key$ := -} - -ITERATE {bib.sort.order} - -SORT - -FUNCTION {begin.bib} -{ preamble$ empty$ - 'skip$ - { preamble$ write$ newline$ } - if$ - "\begin{thebibliography}{" number.label int.to.str$ * "}" * - write$ newline$ - "\providecommand{\natexlab}[1]{#1}" - write$ newline$ - "\providecommand{\url}[1]{\texttt{#1}}" - write$ newline$ - "\expandafter\ifx\csname urlstyle\endcsname\relax" - write$ newline$ - " \providecommand{\doi}[1]{doi: #1}\else" - write$ newline$ - " \providecommand{\doi}{doi: \begingroup \urlstyle{rm}\Url}\fi" - write$ newline$ -} - -EXECUTE {begin.bib} - -EXECUTE {init.state.consts} - -ITERATE {call.type$} - -FUNCTION {end.bib} -{ newline$ - "\end{thebibliography}" write$ newline$ -} - -EXECUTE {end.bib} \ No newline at end of file diff --git a/app/scripts/latex-to-mdx/input/hfstyle/template_content.tex b/app/scripts/latex-to-mdx/input/hfstyle/template_content.tex deleted file mode 100644 index 9a4228494cc47bb10a198678d72685aa5af98cb9..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/hfstyle/template_content.tex +++ /dev/null @@ -1,100 +0,0 @@ -% A few local macros that are used by the example content. 
-\newcommand{\expect}[2]{\mathds{E}_{{#1}} \left[ {#2} \right]} -\newcommand{\myvec}[1]{\boldsymbol{#1}} -\newcommand{\myvecsym}[1]{\boldsymbol{#1}} -\newcommand{\vx}{\myvec{x}} -\newcommand{\vy}{\myvec{y}} -\newcommand{\vz}{\myvec{z}} -\newcommand{\vtheta}{\myvecsym{\theta}} - -\section{Introduction} - -\kant[1] -\kant[2] -\kant[3] - -\section{Using Figures} -% -We can add figures in the usual way. Figure \ref{fig:image1}. -\begin{figure}[t] - \centering - \includegraphics[width=\columnwidth]{kurt-cotoaga-1210012-unsplash} - \caption{Image. This image comes from unsplash.com, which is a great website to get - free to use high quality images.} - \label{fig:image1} -\end{figure} - -\section{Latex Environments} -Using paragraph environment. -\paragraph{Opening Paragraph.} Paragraph is a way to have a bolded heading, and that can also -enter into the pdf bookmark structure. - -\section{Equations} -% -We can write equations this way: -\begin{align} -\log p(\vx) & = \log \int p_\theta(\vx,\vz) p(\vz) d\vz \nonumber \\ -& = \log \expect{p(\vz)}{p_\theta(\vx,\vz)} -\label{eq:marginalisation1} -\end{align} -We refer to the previous equation \eqref{eq:marginalisation1}. -Later let's compute the gradient $\nabla_\theta \log p(\vx)$. The commands -\verb|\vz|, \verb|\vx|, \verb|\expect| are locally-defined macros. -The file \texttt{defns.tex} provides a larger set of short macros for -common constructions, but some of them clash with existing packages. -\begin{align} -\log p(\vx) & = \nabla_{\vtheta} \sum_{i=1}^N \log p(y | x(\vtheta)) + \mathcal{R}(x) \nonumber \\ - & + \|\nabla_{\vtheta}\vx(\vtheta)\|^2_2 \\ - & y \in \mathbb{R}; \vx \in \mathbb{R}^D \qquad \text{using \texttt{\textbackslash mathbb}} \\ - & y \in \mathds{R}; \vx \in \mathds{R}^D \qquad \text{using \texttt{\textbackslash mathds}} -\label{eq:marginalisation2} -\end{align} - -\subsection{Tables} -Use \href{https://www.tablesgenerator.com/latex_tables}{\texttt{www.tablesgenerator.com/latex\_tables}} to help make tables. - -\begin{table}[tb] - \centering - \caption{Sizes of datasets. Testing with a much longer caption to see how it looks over - multiple lines. } - \begin{tabular}{lll} - \hline - Dataset & N & D \\ - \hline \hline - MNIST & 60,000 & $32\times32$ \\ - ImageNet & 1m & $64\times64$\\ - \hline - \end{tabular} -\end{table} - -\subsubsection{Using lists} -% -Itemize lists -\begin{itemize} - \item Item 1 - \item Item 2 - \item Item 3 -\end{itemize} - -\noindent Enumerate lists -\begin{enumerate} - \item Item 1 - \item Item 2 - \item Item 3 -\end{enumerate} - -\section{DeepMind Brand Colours} -The brand standard specifies a colour palette that is available using the package \texttt{dm-colors}, which is already included in this template. Colours include: \textcolor{dmblue400}{This} \textcolor{dmyellow500}{text} \textcolor{dmteal400}{is} \textcolor{dmpurple400}{rendered} \textcolor{dmred400}{using} \textcolor{dmorange400}{dmcolors}. - -\section{Including References and Bibliography} -\begin{figure*}[t] - \centering - \includegraphics[width=\columnwidth]{kurt-cotoaga-1210012-unsplash} - \includegraphics[width=\columnwidth]{kurt-cotoaga-1210012-unsplash} - \caption{Image. This image comes from unsplash.com, which is a great website to get - free to use high quality images.} - \label{fig:image2} -\end{figure*} -References can be formatted in two styles with the \texttt{citep} -command \citep{silver2016mastering} and with the \texttt{citet} -command \citet{silver2016mastering}. 
diff --git a/app/scripts/latex-to-mdx/input/main.bbl b/app/scripts/latex-to-mdx/input/main.bbl deleted file mode 100644 index d3ca7de4fe8d144fa32c845027bc0e29242a24ff..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/main.bbl +++ /dev/null @@ -1,49 +0,0 @@ -\begin{thebibliography}{8} -\providecommand{\natexlab}[1]{#1} -\providecommand{\url}[1]{\texttt{#1}} -\expandafter\ifx\csname urlstyle\endcsname\relax - \providecommand{\doi}[1]{doi: #1}\else - \providecommand{\doi}{doi: \begingroup \urlstyle{rm}\Url}\fi - -\bibitem[Lipman et~al.(2024)Lipman, Havasi, Holderrieth, Shaul, Le, Karrer, Chen, {Lopez-Paz}, {Ben-Hamu}, and Gat]{lipmanFlowMatchingGuide2024} -Yaron Lipman, Marton Havasi, Peter Holderrieth, Neta Shaul, Matt Le, Brian Karrer, Ricky T.~Q. Chen, David {Lopez-Paz}, Heli {Ben-Hamu}, and Itai Gat. -\newblock Flow {{Matching Guide}} and {{Code}}, December 2024. - -\bibitem[Nakkiran et~al.(2024)Nakkiran, Bradley, Zhou, and Advani]{nakkiranStepbyStepDiffusionElementary2024} -Preetum Nakkiran, Arwen Bradley, Hattie Zhou, and Madhu Advani. -\newblock Step-by-{{Step Diffusion}}: {{An Elementary Tutorial}}, June 2024. - -\bibitem[Prince(2023)]{prince2023understanding} -Simon~J.D. Prince. -\newblock \emph{Understanding Deep Learning}. -\newblock The MIT Press, 2023. - -\bibitem[{Shalev-Shwartz} and {Ben-David}(2014)]{shalev-shwartzUnderstandingMachineLearning2014} -Shai {Shalev-Shwartz} and Shai {Ben-David}. -\newblock \emph{Understanding {{Machine Learning}}: {{From Theory}} to {{Algorithms}}}. -\newblock Cambridge University Press, 1 edition, May 2014. -\newblock ISBN 978-1-107-05713-5 978-1-107-29801-9. -\newblock \doi{10.1017/CBO9781107298019}. - -\bibitem[Siciliano and Khatib(2016)]{sicilianoSpringerHandbookRobotics2016} -Bruno Siciliano and Oussama Khatib, editors. -\newblock \emph{Springer {{Handbook}} of {{Robotics}}}. -\newblock Springer {{Handbooks}}. Springer International Publishing, Cham, 2016. -\newblock ISBN 978-3-319-32550-7 978-3-319-32552-1. -\newblock \doi{10.1007/978-3-319-32552-1}. - -\bibitem[Sutton and Barto(2018)]{suttonReinforcementLearningIntroduction2018} -Richard~S. Sutton and Andrew~G. Barto. -\newblock \emph{Reinforcement Learning: An Introduction}. -\newblock Adaptive Computation and Machine Learning Series. The MIT Press, Cambridge, Massachusetts, second edition edition, 2018. -\newblock ISBN 978-0-262-03924-6. - -\bibitem[Tedrake({\natexlab{a}})]{tedrakeRoboticManipulationPerception} -Russ Tedrake. -\newblock Robotic {{Manipulation}}. {{Perception}}, {{Planning}} and {{Control}}., {\natexlab{a}}. - -\bibitem[Tedrake({\natexlab{b}})]{tedrakeUnderactuatedRoboticsAlgorithms} -Russ Tedrake. -\newblock Underactuated {{Robotics}}. {{Algorithms}} for {{Walking}}, {{Running}}, {{Swimming}}, {{Flying}}, and {{Manipulation}}, {\natexlab{b}}. 
- -\end{thebibliography} diff --git a/app/scripts/latex-to-mdx/input/main.bib b/app/scripts/latex-to-mdx/input/main.bib deleted file mode 100644 index f22b4946c7f0f99c7c703c00a9977447298a042c..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/main.bib +++ /dev/null @@ -1,2246 +0,0 @@ -@misc{agibot-world-contributorsAgiBotWorldColosseo2025, - title = {{{AgiBot World Colosseo}}: {{A Large-scale Manipulation Platform}} for {{Scalable}} and {{Intelligent Embodied Systems}}}, - shorttitle = {{{AgiBot World Colosseo}}}, - author = {{AgiBot-World-Contributors} and Bu, Qingwen and Cai, Jisong and Chen, Li and Cui, Xiuqi and Ding, Yan and Feng, Siyuan and Gao, Shenyuan and He, Xindong and Hu, Xuan and Huang, Xu and Jiang, Shu and Jiang, Yuxin and Jing, Cheng and Li, Hongyang and Li, Jialu and Liu, Chiming and Liu, Yi and Lu, Yuxiang and Luo, Jianlan and Luo, Ping and Mu, Yao and Niu, Yuehan and Pan, Yixuan and Pang, Jiangmiao and Qiao, Yu and Ren, Guanghui and Ruan, Cheng and Shan, Jiaqi and Shen, Yongjian and Shi, Chengshi and Shi, Mingkang and Shi, Modi and Sima, Chonghao and Song, Jianheng and Wang, Huijie and Wang, Wenhao and Wei, Dafeng and Xie, Chengen and Xu, Guo and Yan, Junchi and Yang, Cunbiao and Yang, Lei and Yang, Shukai and Yao, Maoqing and Zeng, Jia and Zhang, Chi and Zhang, Qinglin and Zhao, Bin and Zhao, Chengyue and Zhao, Jiaqi and Zhu, Jianchao}, - year = {2025}, - month = aug, - number = {arXiv:2503.06669}, - eprint = {2503.06669}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2503.06669}, - urldate = {2025-08-27}, - abstract = {We explore how scalable robot data can address real-world challenges for generalized robotic manipulation. Introducing AgiBot World, a large-scale platform comprising over 1 million trajectories across 217 tasks in five deployment scenarios, we achieve an order-of-magnitude increase in data scale compared to existing datasets. Accelerated by a standardized collection pipeline with human-in-the-loop verification, AgiBot World guarantees high-quality and diverse data distribution. It is extensible from grippers to dexterous hands and visuo-tactile sensors for fine-grained skill acquisition. Building on top of data, we introduce Genie Operator-1 (GO-1), a novel generalist policy that leverages latent action representations to maximize data utilization, demonstrating predictable performance scaling with increased data volume. Policies pre-trained on our dataset achieve an average performance improvement of 30\% over those trained on Open X-Embodiment, both in in-domain and out-of-distribution scenarios. GO-1 exhibits exceptional capability in real-world dexterous and long-horizon tasks, achieving over 60\% success rate on complex tasks and outperforming prior RDT approach by 32\%. By open-sourcing the dataset, tools, and models, we aim to democratize access to large-scale, high-quality robot data, advancing the pursuit of scalable and general-purpose intelligence.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Computer Vision and Pattern Recognition,Computer Science - Machine Learning,Computer Science - Robotics}, - file = {/Users/fracapuano/Zotero/storage/TGP4C7GA/AgiBot-World-Contributors et al. 
- 2025 - AgiBot World Colosseo A Large-scale Manipulation Platform for Scalable and Intelligent Embodied Sys.pdf;/Users/fracapuano/Zotero/storage/IC7BUHWR/2503.html} -} - -@article{agrawalComputationalSensorimotorLearning, - title = {Computational {{Sensorimotor Learning}}}, - author = {Agrawal, Pulkit}, - langid = {english}, - file = {/Users/fracapuano/Zotero/storage/KSDX9GA2/Agrawal - Computational Sensorimotor Learning.pdf} -} - -@misc{akkayaSolvingRubiksCube2019, - title = {Solving {{Rubik}}'s {{Cube}} with a {{Robot Hand}}}, - author = {Akkaya, Ilge and Andrychowicz, Marcin and Chociej, Maciek and Litwin, Mateusz and McGrew, Bob and Petron, Arthur and Paino, Alex and Plappert, Matthias and Powell, Glenn and Ribas, Raphael and Schneider, Jonas and Tezak, Nikolas and Tworek, Jerry and Welinder, Peter and Weng, Lilian and Yuan, Qiming and Zaremba, Wojciech and Zhang, Lei}, - year = {2019}, - month = oct, - number = {arXiv:1910.07113}, - eprint = {1910.07113}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.1910.07113}, - urldate = {2025-08-26}, - abstract = {We demonstrate that models trained only in simulation can be used to solve a manipulation problem of unprecedented complexity on a real robot. This is made possible by two key components: a novel algorithm, which we call automatic domain randomization (ADR) and a robot platform built for machine learning. ADR automatically generates a distribution over randomized environments of ever-increasing difficulty. Control policies and vision state estimators trained with ADR exhibit vastly improved sim2real transfer. For control policies, memory-augmented models trained on an ADR-generated distribution of environments show clear signs of emergent meta-learning at test time. The combination of ADR with our custom robot platform allows us to solve a Rubik's cube with a humanoid robot hand, which involves both control and state estimation problems. Videos summarizing our results are available: https://openai.com/blog/solving-rubiks-cube/}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Artificial Intelligence,Computer Science - Computer Vision and Pattern Recognition,Computer Science - Machine Learning,Computer Science - Robotics,Statistics - Machine Learning}, - file = {/Users/fracapuano/Zotero/storage/5HNZLG9D/OpenAI et al. - 2019 - Solving Rubik's Cube with a Robot Hand.pdf;/Users/fracapuano/Zotero/storage/WSM7BJ4I/1910.html} -} - -@misc{alayracFlamingoVisualLanguage2022, - title = {Flamingo: A {{Visual Language Model}} for {{Few-Shot Learning}}}, - shorttitle = {Flamingo}, - author = {Alayrac, Jean-Baptiste and Donahue, Jeff and Luc, Pauline and Miech, Antoine and Barr, Iain and Hasson, Yana and Lenc, Karel and Mensch, Arthur and Millican, Katie and Reynolds, Malcolm and Ring, Roman and Rutherford, Eliza and Cabi, Serkan and Han, Tengda and Gong, Zhitao and Samangooei, Sina and Monteiro, Marianne and Menick, Jacob and Borgeaud, Sebastian and Brock, Andrew and Nematzadeh, Aida and Sharifzadeh, Sahand and Binkowski, Mikolaj and Barreira, Ricardo and Vinyals, Oriol and Zisserman, Andrew and Simonyan, Karen}, - year = {2022}, - month = nov, - number = {arXiv:2204.14198}, - eprint = {2204.14198}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2204.14198}, - urldate = {2025-08-27}, - abstract = {Building models that can be rapidly adapted to novel tasks using only a handful of annotated examples is an open challenge for multimodal machine learning research. 
We introduce Flamingo, a family of Visual Language Models (VLM) with this ability. We propose key architectural innovations to: (i) bridge powerful pretrained vision-only and language-only models, (ii) handle sequences of arbitrarily interleaved visual and textual data, and (iii) seamlessly ingest images or videos as inputs. Thanks to their flexibility, Flamingo models can be trained on large-scale multimodal web corpora containing arbitrarily interleaved text and images, which is key to endow them with in-context few-shot learning capabilities. We perform a thorough evaluation of our models, exploring and measuring their ability to rapidly adapt to a variety of image and video tasks. These include open-ended tasks such as visual question-answering, where the model is prompted with a question which it has to answer; captioning tasks, which evaluate the ability to describe a scene or an event; and close-ended tasks such as multiple-choice visual question-answering. For tasks lying anywhere on this spectrum, a single Flamingo model can achieve a new state of the art with few-shot learning, simply by prompting the model with task-specific examples. On numerous benchmarks, Flamingo outperforms models fine-tuned on thousands of times more task-specific data.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Artificial Intelligence,Computer Science - Computer Vision and Pattern Recognition,Computer Science - Machine Learning}, - file = {/Users/fracapuano/Zotero/storage/QZ69HN5K/Alayrac et al. - 2022 - Flamingo a Visual Language Model for Few-Shot Learning.pdf;/Users/fracapuano/Zotero/storage/JMAD5HJY/2204.html} -} - -@article{aldacoALOHA2Enhanced, - title = {{{ALOHA}} 2: {{An Enhanced Low-Cost Hardware}} for {{Bimanual Teleoperation}}}, - author = {Aldaco, Jorge and Armstrong, Travis and Baruch, Robert and Bingham, Jeff and Chan, Sanky and Dwibedi, Debidatta and Finn, Chelsea and Florence, Pete and Goodrich, Spencer and Gramlich, Wayne and Herzog, Alexander and Hoech, Jonathan and Nguyen, Thinh and Storz, Ian and Tabanpour, Baruch and Tompson, Jonathan and Wahid, Ayzaan and Wahrburg, Ted and Xu, Sichun and Yaroshenko, Sergey and Zhao, Tony Z}, - langid = {english}, - file = {/Users/fracapuano/Zotero/storage/LDEJG62Q/Aldaco et al. - ALOHA 2 An Enhanced Low-Cost Hardware for Bimanual Teleoperation.pdf} -} - -@article{alizadehComprehensiveSurveySpace2024, - title = {A Comprehensive Survey of Space Robotic Manipulators for On-Orbit Servicing}, - author = {Alizadeh, Mohammad and Zhu, Zheng H.}, - year = {2024}, - month = oct, - journal = {Frontiers in Robotics and AI}, - volume = {11}, - publisher = {Frontiers}, - issn = {2296-9144}, - doi = {10.3389/frobt.2024.1470950}, - urldate = {2025-08-26}, - abstract = {On-Orbit Servicing (OOS) robots are transforming space exploration by enabling vital maintenance and repair of spacecraft directly in space. However, achieving precise and safe manipulation in microgravity necessitates overcoming significant challenges. This survey delves into four crucial areas essential for successful OOS manipulation: object state estimation, motion planning, and feedback control. Techniques from traditional vision to advanced X-ray and neural network methods are explored for object state estimation. Strategies for fuel-optimized trajectories, docking maneuvers, and collision avoidance are examined in motion planning. The survey also explores control methods for various scenarios, including cooperative manipulation and handling uncertainties, in feedback control. 
Additionally, this survey examines how Machine learning techniques can further propel OOS robots towards more complex and delicate tasks in space.}, - langid = {english}, - keywords = {control,machine learning,motion planning,on-orbit servicing,pose estimation,robotic manipulator,space robots}, - file = {/Users/fracapuano/Zotero/storage/VA36KZYY/Alizadeh and Zhu - 2024 - A comprehensive survey of space robotic manipulators for on-orbit servicing.pdf} -} - -@misc{allalSmolLM2WhenSmol2025, - title = {{{SmolLM2}}: {{When Smol Goes Big}} -- {{Data-Centric Training}} of a {{Small Language Model}}}, - shorttitle = {{{SmolLM2}}}, - author = {Allal, Loubna Ben and Lozhkov, Anton and Bakouch, Elie and Bl{\'a}zquez, Gabriel Mart{\'i}n and Penedo, Guilherme and Tunstall, Lewis and Marafioti, Andr{\'e}s and Kydl{\'i}{\v c}ek, Hynek and Lajar{\'i}n, Agust{\'i}n Piqueres and Srivastav, Vaibhav and Lochner, Joshua and Fahlgren, Caleb and Nguyen, Xuan-Son and Fourrier, Cl{\'e}mentine and Burtenshaw, Ben and Larcher, Hugo and Zhao, Haojun and Zakka, Cyril and Morlon, Mathieu and Raffel, Colin and von Werra, Leandro and Wolf, Thomas}, - year = {2025}, - month = feb, - number = {arXiv:2502.02737}, - eprint = {2502.02737}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2502.02737}, - urldate = {2025-09-09}, - abstract = {While large language models have facilitated breakthroughs in many applications of artificial intelligence, their inherent largeness makes them computationally expensive and challenging to deploy in resource-constrained settings. In this paper, we document the development of SmolLM2, a state-of-the-art "small" (1.7 billion parameter) language model (LM). To attain strong performance, we overtrain SmolLM2 on {\textasciitilde}11 trillion tokens of data using a multi-stage training process that mixes web text with specialized math, code, and instruction-following data. We additionally introduce new specialized datasets (FineMath, Stack-Edu, and SmolTalk) at stages where we found existing datasets to be problematically small or low-quality. To inform our design decisions, we perform both small-scale ablations as well as a manual refinement process that updates the dataset mixing rates at each stage based on the performance at the previous stage. Ultimately, we demonstrate that SmolLM2 outperforms other recent small LMs including Qwen2.5-1.5B and Llama3.2-1B. To facilitate future research on LM development as well as applications of small LMs, we release both SmolLM2 as well as all of the datasets we prepared in the course of this project.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Computation and Language}, - file = {/Users/fracapuano/Zotero/storage/I7XDMSV7/Allal et al. - 2025 - SmolLM2 When Smol Goes Big -- Data-Centric Training of a Small Language Model.pdf;/Users/fracapuano/Zotero/storage/6MLZI84T/2502.html} -} - -@misc{antonovaReinforcementLearningPivoting2017, - title = {Reinforcement {{Learning}} for {{Pivoting Task}}}, - author = {Antonova, Rika and Cruciani, Silvia and Smith, Christian and Kragic, Danica}, - year = {2017}, - month = mar, - number = {arXiv:1703.00472}, - eprint = {1703.00472}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.1703.00472}, - urldate = {2025-08-25}, - abstract = {In this work we propose an approach to learn a robust policy for solving the pivoting task. 
Recently, several model-free continuous control algorithms were shown to learn successful policies without prior knowledge of the dynamics of the task. However, obtaining successful policies required thousands to millions of training episodes, limiting the applicability of these approaches to real hardware. We developed a training procedure that allows us to use a simple custom simulator to learn policies robust to the mismatch of simulation vs robot. In our experiments, we demonstrate that the policy learned in the simulator is able to pivot the object to the desired target angle on the real robot. We also show generalization to an object with different inertia, shape, mass and friction properties than those used during training. This result is a step towards making model-free reinforcement learning available for solving robotics tasks via pre-training in simulators that offer only an imprecise match to the real-world dynamics.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Machine Learning,Computer Science - Robotics}, - file = {/Users/fracapuano/Zotero/storage/WRZCHVGB/Antonova et al. - 2017 - Reinforcement Learning for Pivoting Task.pdf;/Users/fracapuano/Zotero/storage/WJEJ2VGU/1703.html} -} - -@article{aractingiControllingSolo12Quadruped2023, - title = {Controlling the {{Solo12}} Quadruped Robot with Deep Reinforcement Learning}, - author = {Aractingi, Michel and L{\'e}ziart, Pierre-Alexandre and Flayols, Thomas and Perez, Julien and Silander, Tomi and Sou{\`e}res, Philippe}, - year = {2023}, - month = jul, - journal = {Scientific Reports}, - volume = {13}, - number = {1}, - pages = {11945}, - publisher = {Nature Publishing Group}, - issn = {2045-2322}, - doi = {10.1038/s41598-023-38259-7}, - urldate = {2025-08-27}, - abstract = {Quadruped robots require robust and general locomotion skills to exploit their mobility potential in complex and challenging environments. In this work, we present an implementation of a robust end-to-end learning-based controller on the Solo12 quadruped. Our method is based on deep reinforcement learning of joint impedance references. The resulting control policies follow a commanded velocity reference while being efficient in its energy consumption and easy to deploy. We detail the learning procedure and method for transfer on the real robot. We show elaborate experiments. Finally, we present experimental results of the learned locomotion on various grounds indoors and outdoors. These results show that the Solo12 robot is a suitable open-source platform for research combining learning and control because of the easiness in transferring and deploying learned controllers.}, - copyright = {2023 The Author(s)}, - langid = {english}, - keywords = {Computer science,Information technology}, - file = {/Users/fracapuano/Zotero/storage/84ZFT7RP/Aractingi et al. 
- 2023 - Controlling the Solo12 quadruped robot with deep reinforcement learning.pdf} -} - -@misc{bai2025qwen25vl, - title = {Qwen2.5-{{VL}} Technical Report}, - author = {Bai, Shuai and Chen, Keqin and Liu, Xuejing and Wang, Jialin and Ge, Wenbin and Song, Sibo and Dang, Kai and Wang, Peng and Wang, Shijie and Tang, Jun and Zhong, Humen and Zhu, Yuanzhi and Yang, Mingkun and Li, Zhaohai and Wan, Jianqiang and Wang, Pengfei and Ding, Wei and Fu, Zheren and Xu, Yiheng and Ye, Jiabo and Zhang, Xi and Xie, Tianbao and Cheng, Zesen and Zhang, Hang and Yang, Zhibo and Xu, Haiyang and Lin, Junyang}, - year = {2025}, - eprint = {2502.13923}, - primaryclass = {cs.CV}, - archiveprefix = {arXiv} -} - -@misc{ballEfficientOnlineReinforcement2023, - title = {Efficient {{Online Reinforcement Learning}} with {{Offline Data}}}, - author = {Ball, Philip J. and Smith, Laura and Kostrikov, Ilya and Levine, Sergey}, - year = {2023}, - month = may, - number = {arXiv:2302.02948}, - eprint = {2302.02948}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2302.02948}, - urldate = {2025-08-30}, - abstract = {Sample efficiency and exploration remain major challenges in online reinforcement learning (RL). A powerful approach that can be applied to address these issues is the inclusion of offline data, such as prior trajectories from a human expert or a sub-optimal exploration policy. Previous methods have relied on extensive modifications and additional complexity to ensure the effective use of this data. Instead, we ask: can we simply apply existing off-policy methods to leverage offline data when learning online? In this work, we demonstrate that the answer is yes; however, a set of minimal but important changes to existing off-policy RL algorithms are required to achieve reliable performance. We extensively ablate these design choices, demonstrating the key factors that most affect performance, and arrive at a set of recommendations that practitioners can readily apply, whether their data comprise a small number of expert demonstrations or large volumes of sub-optimal trajectories. We see that correct application of these simple recommendations can provide a \${\textbackslash}mathbf\{2.5{\textbackslash}times\}\$ improvement over existing approaches across a diverse set of competitive benchmarks, with no additional computational overhead. We have released our code at https://github.com/ikostrikov/rlpd.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Artificial Intelligence,Computer Science - Machine Learning}, - file = {/Users/fracapuano/Zotero/storage/MUKA5D2V/Ball et al. - 2023 - Efficient Online Reinforcement Learning with Offline Data.pdf;/Users/fracapuano/Zotero/storage/IKURHC3D/2302.html} -} - -@misc{bekrisStateRobotMotion2024, - title = {The {{State}} of {{Robot Motion Generation}}}, - author = {Bekris, Kostas E. and Doerr, Joe and Meng, Patrick and Tangirala, Sumanth}, - year = {2024}, - month = oct, - number = {arXiv:2410.12172}, - eprint = {2410.12172}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2410.12172}, - urldate = {2025-08-26}, - abstract = {This paper reviews the large spectrum of methods for generating robot motion proposed over the 50 years of robotics research culminating in recent developments. It crosses the boundaries of methodologies, typically not surveyed together, from those that operate over explicit models to those that learn implicit ones. 
The paper discusses the current state-of-the-art as well as properties of varying methodologies, highlighting opportunities for integration.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Artificial Intelligence,Computer Science - Machine Learning,Computer Science - Robotics}, - file = {/Users/fracapuano/Zotero/storage/DMJJZFDZ/Bekris et al. - 2024 - The State of Robot Motion Generation.pdf;/Users/fracapuano/Zotero/storage/TL42IRAN/2410.html} -} - -@article{bellemareAutonomousNavigationStratospheric2020, - title = {Autonomous Navigation of Stratospheric Balloons Using Reinforcement Learning}, - author = {Bellemare, Marc G. and Candido, Salvatore and Castro, Pablo Samuel and Gong, Jun and Machado, Marlos C. and Moitra, Subhodeep and Ponda, Sameera S. and Wang, Ziyu}, - year = {2020}, - month = dec, - journal = {Nature}, - volume = {588}, - number = {7836}, - pages = {77--82}, - publisher = {Nature Publishing Group}, - issn = {1476-4687}, - doi = {10.1038/s41586-020-2939-8}, - urldate = {2025-08-31}, - abstract = {Efficiently navigating a superpressure balloon in the stratosphere1 requires the integration of a multitude of cues, such as wind speed and solar elevation, and the process is complicated by forecast errors and sparse wind measurements. Coupled with the need to make decisions in real time, these factors rule out the use of conventional control techniques2,3. Here we describe the use of reinforcement learning4,5 to create a high-performing flight controller. Our algorithm uses data augmentation6,7 and a self-correcting design to overcome the key technical challenge of reinforcement learning from imperfect data, which has proved to be a major obstacle to its application to physical systems8. We deployed our controller to station Loon superpressure balloons at multiple locations across the globe, including a 39-day controlled experiment over the Pacific Ocean. Analyses show that the controller outperforms Loon's previous algorithm and is robust to the natural diversity in stratospheric winds. 
These results demonstrate that reinforcement learning is an effective solution to real-world autonomous control problems in which neither conventional methods nor human intervention suffice, offering clues about what may be needed to create artificially intelligent agents that continuously interact with real, dynamic environments.}, - copyright = {2020 The Author(s), under exclusive licence to Springer Nature Limited}, - langid = {english}, - keywords = {Aerospace engineering,Computer science} -} - -@article{bellmanMarkovianDecisionProcess1957, - title = {A {{Markovian Decision Process}}}, - author = {Bellman, Richard}, - year = {1957}, - journal = {Journal of Mathematics and Mechanics}, - volume = {6}, - number = {5}, - eprint = {24900506}, - eprinttype = {jstor}, - pages = {679--684}, - publisher = {Indiana University Mathematics Department}, - issn = {0095-9057}, - urldate = {2025-08-30} -} - -@misc{beyerPaliGemmaVersatile3B2024, - title = {{{PaliGemma}}: {{A}} Versatile {{3B VLM}} for Transfer}, - shorttitle = {{{PaliGemma}}}, - author = {Beyer, Lucas and Steiner, Andreas and Pinto, Andr{\'e} Susano and Kolesnikov, Alexander and Wang, Xiao and Salz, Daniel and Neumann, Maxim and Alabdulmohsin, Ibrahim and Tschannen, Michael and Bugliarello, Emanuele and Unterthiner, Thomas and Keysers, Daniel and Koppula, Skanda and Liu, Fangyu and Grycner, Adam and Gritsenko, Alexey and Houlsby, Neil and Kumar, Manoj and Rong, Keran and Eisenschlos, Julian and Kabra, Rishabh and Bauer, Matthias and Bo{\v s}njak, Matko and Chen, Xi and Minderer, Matthias and Voigtlaender, Paul and Bica, Ioana and Balazevic, Ivana and Puigcerver, Joan and Papalampidi, Pinelopi and Henaff, Olivier and Xiong, Xi and Soricut, Radu and Harmsen, Jeremiah and Zhai, Xiaohua}, - year = {2024}, - month = oct, - number = {arXiv:2407.07726}, - eprint = {2407.07726}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2407.07726}, - urldate = {2025-09-08}, - abstract = {PaliGemma is an open Vision-Language Model (VLM) that is based on the SigLIP-So400m vision encoder and the Gemma-2B language model. It is trained to be a versatile and broadly knowledgeable base model that is effective to transfer. It achieves strong performance on a wide variety of open-world tasks. We evaluate PaliGemma on almost 40 diverse tasks including standard VLM benchmarks, but also more specialized tasks such as remote-sensing and segmentation.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Artificial Intelligence,Computer Science - Computation and Language,Computer Science - Computer Vision and Pattern Recognition,Computer Science - Machine Learning}, - file = {/Users/fracapuano/Zotero/storage/IPDYNWC4/Beyer et al. 
- 2024 - PaliGemma A versatile 3B VLM for transfer.pdf;/Users/fracapuano/Zotero/storage/R7UVD9WC/2407.html} -} - -@misc{bjorckGR00TN1Open2025, - title = {{{GR00T N1}}: {{An Open Foundation Model}} for {{Generalist Humanoid Robots}}}, - shorttitle = {{{GR00T N1}}}, - author = {Bjorck, Johan and Casta{\~n}eda, Fernando and Cherniadev, Nikita and Da, Xingye and Ding, Runyu and Fan, Linxi "Jim" and Fang, Yu and Fox, Dieter and Hu, Fengyuan and Huang, Spencer and Jang, Joel and Jiang, Zhenyu and Kautz, Jan and Kundalia, Kaushil and Lao, Lawrence and Li, Zhiqi and Lin, Zongyu and Lin, Kevin and Liu, Guilin and Llontop, Edith and Magne, Loic and Mandlekar, Ajay and Narayan, Avnish and Nasiriany, Soroush and Reed, Scott and Tan, You Liang and Wang, Guanzhi and Wang, Zu and Wang, Jing and Wang, Qi and Xiang, Jiannan and Xie, Yuqi and Xu, Yinzhen and Xu, Zhenjia and Ye, Seonghyeon and Yu, Zhiding and Zhang, Ao and Zhang, Hao and Zhao, Yizhou and Zheng, Ruijie and Zhu, Yuke}, - year = {2025}, - month = mar, - number = {arXiv:2503.14734}, - eprint = {2503.14734}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2503.14734}, - urldate = {2025-08-26}, - abstract = {General-purpose robots need a versatile body and an intelligent mind. Recent advancements in humanoid robots have shown great promise as a hardware platform for building generalist autonomy in the human world. A robot foundation model, trained on massive and diverse data sources, is essential for enabling the robots to reason about novel situations, robustly handle real-world variability, and rapidly learn new tasks. To this end, we introduce GR00T N1, an open foundation model for humanoid robots. GR00T N1 is a Vision-Language-Action (VLA) model with a dual-system architecture. The vision-language module (System 2) interprets the environment through vision and language instructions. The subsequent diffusion transformer module (System 1) generates fluid motor actions in real time. Both modules are tightly coupled and jointly trained end-to-end. We train GR00T N1 with a heterogeneous mixture of real-robot trajectories, human videos, and synthetically generated datasets. We show that our generalist robot model GR00T N1 outperforms the state-of-the-art imitation learning baselines on standard simulation benchmarks across multiple robot embodiments. Furthermore, we deploy our model on the Fourier GR-1 humanoid robot for language-conditioned bimanual manipulation tasks, achieving strong performance with high data efficiency.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Artificial Intelligence,Computer Science - Machine Learning,Computer Science - Robotics}, - file = {/Users/fracapuano/Zotero/storage/BDNSKFA6/NVIDIA et al. 
- 2025 - GR00T N1 An Open Foundation Model for Generalist Humanoid Robots.pdf;/Users/fracapuano/Zotero/storage/FENU9PQR/2503.html} -} - -@misc{black$p_0$VisionLanguageActionFlow2024, - title = {\${$\pi\_$}0\$: {{A Vision-Language-Action Flow Model}} for {{General Robot Control}}}, - shorttitle = {\${$\pi\_$}0\$}, - author = {Black, Kevin and Brown, Noah and Driess, Danny and Esmail, Adnan and Equi, Michael and Finn, Chelsea and Fusai, Niccolo and Groom, Lachy and Hausman, Karol and Ichter, Brian and Jakubczak, Szymon and Jones, Tim and Ke, Liyiming and Levine, Sergey and {Li-Bell}, Adrian and Mothukuri, Mohith and Nair, Suraj and Pertsch, Karl and Shi, Lucy Xiaoyang and Tanner, James and Vuong, Quan and Walling, Anna and Wang, Haohuan and Zhilinsky, Ury}, - year = {2024}, - month = oct, - number = {arXiv:2410.24164}, - eprint = {2410.24164}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2410.24164}, - urldate = {2025-08-28}, - abstract = {Robot learning holds tremendous promise to unlock the full potential of flexible, general, and dexterous robot systems, as well as to address some of the deepest questions in artificial intelligence. However, bringing robot learning to the level of generality required for effective real-world systems faces major obstacles in terms of data, generalization, and robustness. In this paper, we discuss how generalist robot policies (i.e., robot foundation models) can address these challenges, and how we can design effective generalist robot policies for complex and highly dexterous tasks. We propose a novel flow matching architecture built on top of a pre-trained vision-language model (VLM) to inherit Internet-scale semantic knowledge. We then discuss how this model can be trained on a large and diverse dataset from multiple dexterous robot platforms, including single-arm robots, dual-arm robots, and mobile manipulators. We evaluate our model in terms of its ability to perform tasks in zero shot after pre-training, follow language instructions from people and from a high-level VLM policy, and its ability to acquire new skills via fine-tuning. Our results cover a wide variety of tasks, such as laundry folding, table cleaning, and assembling boxes.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Machine Learning,Computer Science - Robotics}, - file = {/Users/fracapuano/Zotero/storage/GUEM37NZ/Black et al. - 2024 - $π_0$ A Vision-Language-Action Flow Model for General Robot Control.pdf;/Users/fracapuano/Zotero/storage/FHYXZWF8/2410.html} -} - -@inproceedings{BLIP-2, - title = {{{BLIP-2}}: Bootstrapping Language-Image Pre-Training with Frozen Image Encoders and Large Language Models}, - booktitle = {Proceedings of the 40th International Conference on Machine Learning}, - author = {Li, Junnan and Li, Dongxu and Savarese, Silvio and Hoi, Steven}, - year = {2023}, - series = {{{ICML}}'23}, - publisher = {JMLR.org}, - address = {, Honolulu, Hawaii, USA,}, - abstract = {The cost of vision-and-language pre-training has become increasingly prohibitive due to end-to-end training of large-scale models. This paper proposes BLIP-2, a generic and efficient pretraining strategy that bootstraps vision-language pre-training from off-the-shelf frozen pretrained image encoders and frozen large language models. BLIP-2 bridges the modality gap with a lightweight Querying Transformer, which is pretrained in two stages. The first stage bootstraps vision-language representation learning from a frozen image encoder.
The second stage bootstraps vision-to-language generative learning from a frozen language model. BLIP-2 achieves state-of-the-art performance on various vision-language tasks, despite having significantly fewer trainable parameters than existing methods. For example, our model outperforms Flamingo80B by 8.7\% on zero-shot VQAv2 with 54x fewer trainable parameters. We also demonstrate the model's capabilities of zero-shot image-to-text generation that can follow natural language instructions.}, - articleno = {814} -} - -@misc{brohanRT1RoboticsTransformer2023, - title = {{{RT-1}}: {{Robotics Transformer}} for {{Real-World Control}} at {{Scale}}}, - shorttitle = {{{RT-1}}}, - author = {Brohan, Anthony and Brown, Noah and Carbajal, Justice and Chebotar, Yevgen and Dabis, Joseph and Finn, Chelsea and Gopalakrishnan, Keerthana and Hausman, Karol and Herzog, Alex and Hsu, Jasmine and Ibarz, Julian and Ichter, Brian and Irpan, Alex and Jackson, Tomas and Jesmonth, Sally and Joshi, Nikhil J. and Julian, Ryan and Kalashnikov, Dmitry and Kuang, Yuheng and Leal, Isabel and Lee, Kuang-Huei and Levine, Sergey and Lu, Yao and Malla, Utsav and Manjunath, Deeksha and Mordatch, Igor and Nachum, Ofir and Parada, Carolina and Peralta, Jodilyn and Perez, Emily and Pertsch, Karl and Quiambao, Jornell and Rao, Kanishka and Ryoo, Michael and Salazar, Grecia and Sanketi, Pannag and Sayed, Kevin and Singh, Jaspiar and Sontakke, Sumedh and Stone, Austin and Tan, Clayton and Tran, Huong and Vanhoucke, Vincent and Vega, Steve and Vuong, Quan and Xia, Fei and Xiao, Ted and Xu, Peng and Xu, Sichun and Yu, Tianhe and Zitkovich, Brianna}, - year = {2023}, - month = aug, - number = {arXiv:2212.06817}, - eprint = {2212.06817}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2212.06817}, - urldate = {2025-09-07}, - abstract = {By transferring knowledge from large, diverse, task-agnostic datasets, modern machine learning models can solve specific downstream tasks either zero-shot or with small task-specific datasets to a high level of performance. While this capability has been demonstrated in other fields such as computer vision, natural language processing or speech recognition, it remains to be shown in robotics, where the generalization capabilities of the models are particularly critical due to the difficulty of collecting real-world robotic data. We argue that one of the keys to the success of such general robotic models lies with open-ended task-agnostic training, combined with high-capacity architectures that can absorb all of the diverse, robotic data. In this paper, we present a model class, dubbed Robotics Transformer, that exhibits promising scalable model properties. We verify our conclusions in a study of different model classes and their ability to generalize as a function of the data size, model size, and data diversity based on a large-scale data collection on real robots performing real-world tasks. The project's website and videos can be found at robotics-transformer1.github.io}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Artificial Intelligence,Computer Science - Computation and Language,Computer Science - Computer Vision and Pattern Recognition,Computer Science - Machine Learning,Computer Science - Robotics}, - file = {/Users/fracapuano/Zotero/storage/TTBN3M5Y/Brohan et al. 
- 2023 - RT-1 Robotics Transformer for Real-World Control at Scale.pdf;/Users/fracapuano/Zotero/storage/DK3D593W/2212.html} -} - -@misc{brohanRT2VisionLanguageActionModels2023, - title = {{{RT-2}}: {{Vision-Language-Action Models Transfer Web Knowledge}} to {{Robotic Control}}}, - shorttitle = {{{RT-2}}}, - author = {Brohan, Anthony and Brown, Noah and Carbajal, Justice and Chebotar, Yevgen and Chen, Xi and Choromanski, Krzysztof and Ding, Tianli and Driess, Danny and Dubey, Avinava and Finn, Chelsea and Florence, Pete and Fu, Chuyuan and Arenas, Montse Gonzalez and Gopalakrishnan, Keerthana and Han, Kehang and Hausman, Karol and Herzog, Alexander and Hsu, Jasmine and Ichter, Brian and Irpan, Alex and Joshi, Nikhil and Julian, Ryan and Kalashnikov, Dmitry and Kuang, Yuheng and Leal, Isabel and Lee, Lisa and Lee, Tsang-Wei Edward and Levine, Sergey and Lu, Yao and Michalewski, Henryk and Mordatch, Igor and Pertsch, Karl and Rao, Kanishka and Reymann, Krista and Ryoo, Michael and Salazar, Grecia and Sanketi, Pannag and Sermanet, Pierre and Singh, Jaspiar and Singh, Anikait and Soricut, Radu and Tran, Huong and Vanhoucke, Vincent and Vuong, Quan and Wahid, Ayzaan and Welker, Stefan and Wohlhart, Paul and Wu, Jialin and Xia, Fei and Xiao, Ted and Xu, Peng and Xu, Sichun and Yu, Tianhe and Zitkovich, Brianna}, - year = {2023}, - month = jul, - number = {arXiv:2307.15818}, - eprint = {2307.15818}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2307.15818}, - urldate = {2025-09-07}, - abstract = {We study how vision-language models trained on Internet-scale data can be incorporated directly into end-to-end robotic control to boost generalization and enable emergent semantic reasoning. Our goal is to enable a single end-to-end trained model to both learn to map robot observations to actions and enjoy the benefits of large-scale pretraining on language and vision-language data from the web. To this end, we propose to co-fine-tune state-of-the-art vision-language models on both robotic trajectory data and Internet-scale vision-language tasks, such as visual question answering. In contrast to other approaches, we propose a simple, general recipe to achieve this goal: in order to fit both natural language responses and robotic actions into the same format, we express the actions as text tokens and incorporate them directly into the training set of the model in the same way as natural language tokens. We refer to such category of models as vision-language-action models (VLA) and instantiate an example of such a model, which we call RT-2. Our extensive evaluation (6k evaluation trials) shows that our approach leads to performant robotic policies and enables RT-2 to obtain a range of emergent capabilities from Internet-scale training. This includes significantly improved generalization to novel objects, the ability to interpret commands not present in the robot training data (such as placing an object onto a particular number or icon), and the ability to perform rudimentary reasoning in response to user commands (such as picking up the smallest or largest object, or the one closest to another object). 
We further show that incorporating chain of thought reasoning allows RT-2 to perform multi-stage semantic reasoning, for example figuring out which object to pick up for use as an improvised hammer (a rock), or which type of drink is best suited for someone who is tired (an energy drink).}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Computation and Language,Computer Science - Computer Vision and Pattern Recognition,Computer Science - Machine Learning,Computer Science - Robotics}, - file = {/Users/fracapuano/Zotero/storage/CZHMNYPG/Brohan et al. - 2023 - RT-2 Vision-Language-Action Models Transfer Web Knowledge to Robotic Control.pdf;/Users/fracapuano/Zotero/storage/WN2E7AZH/2307.html} -} - -@misc{brownLanguageModelsAre2020, - title = {Language {{Models}} Are {{Few-Shot Learners}}}, - author = {Brown, Tom B. and Mann, Benjamin and Ryder, Nick and Subbiah, Melanie and Kaplan, Jared and Dhariwal, Prafulla and Neelakantan, Arvind and Shyam, Pranav and Sastry, Girish and Askell, Amanda and Agarwal, Sandhini and {Herbert-Voss}, Ariel and Krueger, Gretchen and Henighan, Tom and Child, Rewon and Ramesh, Aditya and Ziegler, Daniel M. and Wu, Jeffrey and Winter, Clemens and Hesse, Christopher and Chen, Mark and Sigler, Eric and Litwin, Mateusz and Gray, Scott and Chess, Benjamin and Clark, Jack and Berner, Christopher and McCandlish, Sam and Radford, Alec and Sutskever, Ilya and Amodei, Dario}, - year = {2020}, - month = jul, - number = {arXiv:2005.14165}, - eprint = {2005.14165}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2005.14165}, - urldate = {2025-08-28}, - abstract = {Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task. While typically task-agnostic in architecture, this method still requires task-specific fine-tuning datasets of thousands or tens of thousands of examples. By contrast, humans can generally perform a new language task from only a few examples or from simple instructions - something which current NLP systems still largely struggle to do. Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches. Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model. GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic. At the same time, we also identify some datasets where GPT-3's few-shot learning still struggles, as well as some datasets where GPT-3 faces methodological issues related to training on large web corpora. Finally, we find that GPT-3 can generate samples of news articles which human evaluators have difficulty distinguishing from articles written by humans. 
We discuss broader societal impacts of this finding and of GPT-3 in general.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Computation and Language}, - file = {/Users/fracapuano/Zotero/storage/L6J45ZW7/Brown et al. - 2020 - Language Models are Few-Shot Learners.pdf;/Users/fracapuano/Zotero/storage/52DC5AT2/2005.html} -} - -@article{burridgeSequentialCompositionDynamically1999b, - title = {Sequential {{Composition}} of {{Dynamically Dexterous Robot Behaviors}}}, - author = {Burridge, R. R. and Rizzi, A. A. and Koditschek, D. E.}, - year = {1999}, - month = jun, - journal = {The International Journal of Robotics Research}, - volume = {18}, - number = {6}, - pages = {534--555}, - issn = {0278-3649, 1741-3176}, - doi = {10.1177/02783649922066385}, - urldate = {2025-08-26}, - abstract = {We report on our efforts to develop a sequential robot controllercomposition technique in the context of dexterous ``batting'' maneuvers. A robot with a flat paddle is required to strike repeatedly at a thrown ball until the ball is brought to rest on the paddle at a specified location. The robot's reachable workspace is blocked by an obstacle that disconnects the free space formed when the ball and paddle remain in contact, forcing the machine to ``let go'' for a time to bring the ball to the desired state. The controller compositions we create guarantee that a ball introduced in the ``safe workspace'' remains there and is ultimately brought to the goal. We report on experimental results from an implementation of these formal composition methods, and present descriptive statistics characterizing the experiments.}, - copyright = {https://journals.sagepub.com/page/policies/text-and-data-mining-license}, - langid = {english}, - file = {/Users/fracapuano/Zotero/storage/TFZQ6EHJ/Burridge et al. 
- 1999 - Sequential Composition of Dynamically Dexterous Robot Behaviors.pdf} -} - -@misc{cadene2024lerobot, - title = {{{LeRobot}}: {{State-of-the-art}} Machine Learning for Real-World Robotics in Pytorch}, - author = {Cadene, Remi and Alibert, Simon and Soare, Alexander and Gallouedec, Quentin and Zouitine, Adil and Palma, Steven and Kooijmans, Pepijn and Aractingi, Michel and Shukor, Mustafa and Aubakirova, Dana and Russi, Martino and Capuano, Francesco and Pascal, Caroline and Choghari, Jade and Moss, Jess and Wolf, Thomas}, - year = {2024} -} - -@misc{cadeneLeRobotStateoftheartMachine2024, - title = {{{LeRobot}}: {{State-of-the-art Machine Learning}} for {{Real-World Robotics}} in {{Pytorch}}}, - author = {Cadene, Remi and Alibert, Simon and Soare, Alexander and Gallouedec, Quentin and Zouitine, Adil and Palma, Steven and Kooijmans, Pepijn and Aractingi, Michel and Shukor, Mustafa and Aubakirova, Dana and Russi, Martino and Capuano, Francesco and Pascal, Caroline and Choghari, Jade and Moss, Jess and Wolf, Thomas}, - year = {2024} -} - -@misc{caronEmergingPropertiesSelfSupervised2021, - title = {Emerging {{Properties}} in {{Self-Supervised Vision Transformers}}}, - author = {Caron, Mathilde and Touvron, Hugo and Misra, Ishan and J{\'e}gou, Herv{\'e} and Mairal, Julien and Bojanowski, Piotr and Joulin, Armand}, - year = {2021}, - month = may, - number = {arXiv:2104.14294}, - eprint = {2104.14294}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2104.14294}, - urldate = {2025-09-07}, - abstract = {In this paper, we question if self-supervised learning provides new properties to Vision Transformer (ViT) that stand out compared to convolutional networks (convnets). Beyond the fact that adapting self-supervised methods to this architecture works particularly well, we make the following observations: first, self-supervised ViT features contain explicit information about the semantic segmentation of an image, which does not emerge as clearly with supervised ViTs, nor with convnets. Second, these features are also excellent k-NN classifiers, reaching 78.3\% top-1 on ImageNet with a small ViT. Our study also underlines the importance of momentum encoder, multi-crop training, and the use of small patches with ViTs. We implement our findings into a simple self-supervised method, called DINO, which we interpret as a form of self-distillation with no labels. We show the synergy between DINO and ViTs by achieving 80.1\% top-1 on ImageNet in linear evaluation with ViT-Base.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Computer Vision and Pattern Recognition}, - file = {/Users/fracapuano/Zotero/storage/AYIY6DTF/Caron et al.
- 2021 - Emerging Properties in Self-Supervised Vision Transformers.pdf;/Users/fracapuano/Zotero/storage/EKA7ZN2P/2104.html} -} - -@inproceedings{chebotar2019closing, - title = {Closing the Sim-to-Real Loop: {{Adapting}} Simulation Randomization with Real World Experience}, - booktitle = {2019 International Conference on Robotics and Automation ({{ICRA}})}, - author = {Chebotar, Yevgen and Handa, Ankur and Makoviychuk, Viktor and Macklin, Miles and Issac, Jan and Ratliff, Nathan and Fox, Dieter}, - year = {2019}, - pages = {8973--8979}, - publisher = {IEEE} -} - -@inproceedings{chebotarClosingSimtorealLoop2019, - title = {Closing the Sim-to-Real Loop: {{Adapting}} Simulation Randomization with Real World Experience}, - shorttitle = {Closing the Sim-to-Real Loop}, - booktitle = {2019 {{International Conference}} on {{Robotics}} and {{Automation}} ({{ICRA}})}, - author = {Chebotar, Yevgen and Handa, Ankur and Makoviychuk, Viktor and Macklin, Miles and Issac, Jan and Ratliff, Nathan and Fox, Dieter}, - year = {2019}, - pages = {8973--8979}, - publisher = {IEEE}, - urldate = {2025-08-31} -} - -@misc{chenPaLIXScalingMultilingual2023, - title = {{{PaLI-X}}: {{On Scaling}} up a {{Multilingual Vision}} and {{Language Model}}}, - shorttitle = {{{PaLI-X}}}, - author = {Chen, Xi and Djolonga, Josip and Padlewski, Piotr and Mustafa, Basil and Changpinyo, Soravit and Wu, Jialin and Ruiz, Carlos Riquelme and Goodman, Sebastian and Wang, Xiao and Tay, Yi and Shakeri, Siamak and Dehghani, Mostafa and Salz, Daniel and Lucic, Mario and Tschannen, Michael and Nagrani, Arsha and Hu, Hexiang and Joshi, Mandar and Pang, Bo and Montgomery, Ceslee and Pietrzyk, Paulina and Ritter, Marvin and Piergiovanni, A. J. and Minderer, Matthias and Pavetic, Filip and Waters, Austin and Li, Gang and Alabdulmohsin, Ibrahim and Beyer, Lucas and Amelot, Julien and Lee, Kenton and Steiner, Andreas Peter and Li, Yang and Keysers, Daniel and Arnab, Anurag and Xu, Yuanzhong and Rong, Keran and Kolesnikov, Alexander and Seyedhosseini, Mojtaba and Angelova, Anelia and Zhai, Xiaohua and Houlsby, Neil and Soricut, Radu}, - year = {2023}, - month = may, - number = {arXiv:2305.18565}, - eprint = {2305.18565}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2305.18565}, - urldate = {2025-09-07}, - abstract = {We present the training recipe and results of scaling up PaLI-X, a multilingual vision and language model, both in terms of size of the components and the breadth of its training task mixture. Our model achieves new levels of performance on a wide-range of varied and complex tasks, including multiple image-based captioning and question-answering tasks, image-based document understanding and few-shot (in-context) learning, as well as object detection, video question answering, and video captioning. PaLI-X advances the state-of-the-art on most vision-and-language benchmarks considered (25+ of them). Finally, we observe emerging capabilities, such as complex counting and multilingual object detection, tasks that are not explicitly in the training mix.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Computation and Language,Computer Science - Computer Vision and Pattern Recognition,Computer Science - Machine Learning}, - file = {/Users/fracapuano/Zotero/storage/UES2DMFM/Chen et al. 
- 2023 - PaLI-X On Scaling up a Multilingual Vision and Language Model.pdf;/Users/fracapuano/Zotero/storage/LEGNNSHS/2305.html} -} - -@misc{chiDiffusionPolicyVisuomotor2024, - title = {Diffusion {{Policy}}: {{Visuomotor Policy Learning}} via {{Action Diffusion}}}, - shorttitle = {Diffusion {{Policy}}}, - author = {Chi, Cheng and Xu, Zhenjia and Feng, Siyuan and Cousineau, Eric and Du, Yilun and Burchfiel, Benjamin and Tedrake, Russ and Song, Shuran}, - year = {2024}, - month = mar, - number = {arXiv:2303.04137}, - eprint = {2303.04137}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2303.04137}, - urldate = {2025-08-28}, - abstract = {This paper introduces Diffusion Policy, a new way of generating robot behavior by representing a robot's visuomotor policy as a conditional denoising diffusion process. We benchmark Diffusion Policy across 12 different tasks from 4 different robot manipulation benchmarks and find that it consistently outperforms existing state-of-the-art robot learning methods with an average improvement of 46.9\%. Diffusion Policy learns the gradient of the action-distribution score function and iteratively optimizes with respect to this gradient field during inference via a series of stochastic Langevin dynamics steps. We find that the diffusion formulation yields powerful advantages when used for robot policies, including gracefully handling multimodal action distributions, being suitable for high-dimensional action spaces, and exhibiting impressive training stability. To fully unlock the potential of diffusion models for visuomotor policy learning on physical robots, this paper presents a set of key technical contributions including the incorporation of receding horizon control, visual conditioning, and the time-series diffusion transformer. We hope this work will help motivate a new generation of policy learning techniques that are able to leverage the powerful generative modeling capabilities of diffusion models. Code, data, and training details is publicly available diffusion-policy.cs.columbia.edu}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Robotics}, - file = {/Users/fracapuano/Zotero/storage/7XRY3GJX/Chi et al. 
- 2024 - Diffusion Policy Visuomotor Policy Learning via Action Diffusion.pdf;/Users/fracapuano/Zotero/storage/BBBPKKMZ/2303.html} -} - -@misc{collaborationOpenXEmbodimentRobotic2025, - title = {Open {{X-Embodiment}}: {{Robotic Learning Datasets}} and {{RT-X Models}}}, - shorttitle = {Open {{X-Embodiment}}}, - author = {Collaboration, Open X.-Embodiment and O'Neill, Abby and Rehman, Abdul and Gupta, Abhinav and Maddukuri, Abhiram and Gupta, Abhishek and Padalkar, Abhishek and Lee, Abraham and Pooley, Acorn and Gupta, Agrim and Mandlekar, Ajay and Jain, Ajinkya and Tung, Albert and Bewley, Alex and Herzog, Alex and Irpan, Alex and Khazatsky, Alexander and Rai, Anant and Gupta, Anchit and Wang, Andrew and Kolobov, Andrey and Singh, Anikait and Garg, Animesh and Kembhavi, Aniruddha and Xie, Annie and Brohan, Anthony and Raffin, Antonin and Sharma, Archit and Yavary, Arefeh and Jain, Arhan and Balakrishna, Ashwin and Wahid, Ayzaan and {Burgess-Limerick}, Ben and Kim, Beomjoon and Sch{\"o}lkopf, Bernhard and Wulfe, Blake and Ichter, Brian and Lu, Cewu and Xu, Charles and Le, Charlotte and Finn, Chelsea and Wang, Chen and Xu, Chenfeng and Chi, Cheng and Huang, Chenguang and Chan, Christine and Agia, Christopher and Pan, Chuer and Fu, Chuyuan and Devin, Coline and Xu, Danfei and Morton, Daniel and Driess, Danny and Chen, Daphne and Pathak, Deepak and Shah, Dhruv and B{\"u}chler, Dieter and Jayaraman, Dinesh and Kalashnikov, Dmitry and Sadigh, Dorsa and Johns, Edward and Foster, Ethan and Liu, Fangchen and Ceola, Federico and Xia, Fei and Zhao, Feiyu and Frujeri, Felipe Vieira and Stulp, Freek and Zhou, Gaoyue and Sukhatme, Gaurav S. and Salhotra, Gautam and Yan, Ge and Feng, Gilbert and Schiavi, Giulio and Berseth, Glen and Kahn, Gregory and Yang, Guangwen and Wang, Guanzhi and Su, Hao and Fang, Hao-Shu and Shi, Haochen and Bao, Henghui and Amor, Heni Ben and Christensen, Henrik I. and Furuta, Hiroki and Bharadhwaj, Homanga and Walke, Homer and Fang, Hongjie and Ha, Huy and Mordatch, Igor and Radosavovic, Ilija and Leal, Isabel and Liang, Jacky and {Abou-Chakra}, Jad and Kim, Jaehyung and Drake, Jaimyn and Peters, Jan and Schneider, Jan and Hsu, Jasmine and Vakil, Jay and Bohg, Jeannette and Bingham, Jeffrey and Wu, Jeffrey and Gao, Jensen and Hu, Jiaheng and Wu, Jiajun and Wu, Jialin and Sun, Jiankai and Luo, Jianlan and Gu, Jiayuan and Tan, Jie and Oh, Jihoon and Wu, Jimmy and Lu, Jingpei and Yang, Jingyun and Malik, Jitendra and Silv{\'e}rio, Jo{\~a}o and Hejna, Joey and Booher, Jonathan and Tompson, Jonathan and Yang, Jonathan and Salvador, Jordi and Lim, Joseph J. and Han, Junhyek and Wang, Kaiyuan and Rao, Kanishka and Pertsch, Karl and Hausman, Karol and Go, Keegan and Gopalakrishnan, Keerthana and Goldberg, Ken and Byrne, Kendra and Oslund, Kenneth and Kawaharazuka, Kento and Black, Kevin and Lin, Kevin and Zhang, Kevin and Ehsani, Kiana and Lekkala, Kiran and Ellis, Kirsty and Rana, Krishan and Srinivasan, Krishnan and Fang, Kuan and Singh, Kunal Pratap and Zeng, Kuo-Hao and Hatch, Kyle and Hsu, Kyle and Itti, Laurent and Chen, Lawrence Yunliang and Pinto, Lerrel and {Fei-Fei}, Li and Tan, Liam and Fan, Linxi "Jim" and Ott, Lionel and Lee, Lisa and Weihs, Luca and Chen, Magnum and Lepert, Marion and Memmel, Marius and Tomizuka, Masayoshi and Itkina, Masha and Castro, Mateo Guaman and Spero, Max and Du, Maximilian and Ahn, Michael and Yip, Michael C. 
and Zhang, Mingtong and Ding, Mingyu and Heo, Minho and Srirama, Mohan Kumar and Sharma, Mohit and Kim, Moo Jin and Irshad, Muhammad Zubair and Kanazawa, Naoaki and Hansen, Nicklas and Heess, Nicolas and Joshi, Nikhil J. and Suenderhauf, Niko and Liu, Ning and Palo, Norman Di and Shafiullah, Nur Muhammad Mahi and Mees, Oier and Kroemer, Oliver and Bastani, Osbert and Sanketi, Pannag R. and Miller, Patrick "Tree" and Yin, Patrick and Wohlhart, Paul and Xu, Peng and Fagan, Peter David and Mitrano, Peter and Sermanet, Pierre and Abbeel, Pieter and Sundaresan, Priya and Chen, Qiuyu and Vuong, Quan and Rafailov, Rafael and Tian, Ran and Doshi, Ria and {Mart{\'i}n-Mart{\'i}n}, Roberto and Baijal, Rohan and Scalise, Rosario and Hendrix, Rose and Lin, Roy and Qian, Runjia and Zhang, Ruohan and Mendonca, Russell and Shah, Rutav and Hoque, Ryan and Julian, Ryan and Bustamante, Samuel and Kirmani, Sean and Levine, Sergey and Lin, Shan and Moore, Sherry and Bahl, Shikhar and Dass, Shivin and Sonawani, Shubham and Tulsiani, Shubham and Song, Shuran and Xu, Sichun and Haldar, Siddhant and Karamcheti, Siddharth and Adebola, Simeon and Guist, Simon and Nasiriany, Soroush and Schaal, Stefan and Welker, Stefan and Tian, Stephen and Ramamoorthy, Subramanian and Dasari, Sudeep and Belkhale, Suneel and Park, Sungjae and Nair, Suraj and Mirchandani, Suvir and Osa, Takayuki and Gupta, Tanmay and Harada, Tatsuya and Matsushima, Tatsuya and Xiao, Ted and Kollar, Thomas and Yu, Tianhe and Ding, Tianli and Davchev, Todor and Zhao, Tony Z. and Armstrong, Travis and Darrell, Trevor and Chung, Trinity and Jain, Vidhi and Kumar, Vikash and Vanhoucke, Vincent and Guizilini, Vitor and Zhan, Wei and Zhou, Wenxuan and Burgard, Wolfram and Chen, Xi and Chen, Xiangyu and Wang, Xiaolong and Zhu, Xinghao and Geng, Xinyang and Liu, Xiyuan and Liangwei, Xu and Li, Xuanlin and Pang, Yansong and Lu, Yao and Ma, Yecheng Jason and Kim, Yejin and Chebotar, Yevgen and Zhou, Yifan and Zhu, Yifeng and Wu, Yilin and Xu, Ying and Wang, Yixuan and Bisk, Yonatan and Dou, Yongqiang and Cho, Yoonyoung and Lee, Youngwoon and Cui, Yuchen and Cao, Yue and Wu, Yueh-Hua and Tang, Yujin and Zhu, Yuke and Zhang, Yunchu and Jiang, Yunfan and Li, Yunshuang and Li, Yunzhu and Iwasawa, Yusuke and Matsuo, Yutaka and Ma, Zehan and Xu, Zhuo and Cui, Zichen Jeff and Zhang, Zichen and Fu, Zipeng and Lin, Zipeng}, - year = {2025}, - month = may, - number = {arXiv:2310.08864}, - eprint = {2310.08864}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2310.08864}, - urldate = {2025-09-08}, - abstract = {Large, high-capacity models trained on diverse datasets have shown remarkable successes on efficiently tackling downstream applications. In domains from NLP to Computer Vision, this has led to a consolidation of pretrained models, with general pretrained backbones serving as a starting point for many applications. Can such a consolidation happen in robotics? Conventionally, robotic learning methods train a separate model for every application, every robot, and even every environment. Can we instead train generalist X-robot policy that can be adapted efficiently to new robots, tasks, and environments? In this paper, we provide datasets in standardized data formats and models to make it possible to explore this possibility in the context of robotic manipulation, alongside experimental results that provide an example of effective X-robot policies. 
We assemble a dataset from 22 different robots collected through a collaboration between 21 institutions, demonstrating 527 skills (160266 tasks). We show that a high-capacity model trained on this data, which we call RT-X, exhibits positive transfer and improves the capabilities of multiple robots by leveraging experience from other platforms. More details can be found on the project website https://robotics-transformer-x.github.io.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Robotics}, - file = {/Users/fracapuano/Zotero/storage/2U73MMVN/Collaboration et al. - 2025 - Open X-Embodiment Robotic Learning Datasets and RT-X Models.pdf;/Users/fracapuano/Zotero/storage/PX7IHY32/2310.html} -} - -@book{connellRobotLearning1993, - title = {Robot {{Learning}}}, - editor = {Connell, Jonathan H. and Mahadevan, Sridhar}, - year = {1993}, - publisher = {Springer US}, - address = {Boston, MA}, - doi = {10.1007/978-1-4615-3184-5}, - urldate = {2025-08-28}, - copyright = {http://www.springer.com/tdm}, - isbn = {978-1-4613-6396-5 978-1-4615-3184-5}, - keywords = {algorithms,artificial intelligence,artificial life,autonom,autonomous robot,genetic algorithms,intelligence,learning,Navigation,programming,proving,robot,uncertainty} -} - -@article{degraveMagneticControlTokamak2022, - title = {Magnetic Control of Tokamak Plasmas through Deep Reinforcement Learning}, - author = {Degrave, Jonas and Felici, Federico and Buchli, Jonas and Neunert, Michael and Tracey, Brendan and Carpanese, Francesco and Ewalds, Timo and Hafner, Roland and Abdolmaleki, Abbas and {de las Casas}, Diego and Donner, Craig and Fritz, Leslie and Galperti, Cristian and Huber, Andrea and Keeling, James and Tsimpoukelli, Maria and Kay, Jackie and Merle, Antoine and Moret, Jean-Marc and Noury, Seb and Pesamosca, Federico and Pfau, David and Sauter, Olivier and Sommariva, Cristian and Coda, Stefano and Duval, Basil and Fasoli, Ambrogio and Kohli, Pushmeet and Kavukcuoglu, Koray and Hassabis, Demis and Riedmiller, Martin}, - year = {2022}, - month = feb, - journal = {Nature}, - volume = {602}, - number = {7897}, - pages = {414--419}, - publisher = {Nature Publishing Group}, - issn = {1476-4687}, - doi = {10.1038/s41586-021-04301-9}, - urldate = {2025-08-31}, - abstract = {Nuclear fusion using magnetic confinement, in particular in the tokamak configuration, is a promising path towards sustainable energy. A core challenge is to shape and maintain a high-temperature plasma within the tokamak vessel. This requires high-dimensional, high-frequency, closed-loop control using magnetic actuator coils, further complicated by the diverse requirements across a wide range of plasma configurations. In this work, we introduce a previously undescribed architecture for tokamak magnetic controller design that autonomously learns to command the full set of control coils. This architecture meets control objectives specified at a high level, at the same time satisfying physical and operational constraints. This approach has unprecedented flexibility and generality in problem specification and yields a notable reduction in design effort to produce new plasma configurations. We successfully produce and control a diverse set of plasma configurations on the Tokamak {\`a} Configuration Variable1,2, including elongated, conventional shapes, as well as advanced configurations, such as negative triangularity and `snowflake' configurations. Our approach achieves accurate tracking of the location, current and shape for these configurations. 
We also demonstrate sustained `droplets' on TCV, in which two separate plasmas are maintained simultaneously within the vessel. This represents a notable advance for tokamak feedback control, showing the potential of reinforcement learning to accelerate research in the fusion domain, and is one of the most challenging real-world systems to which reinforcement learning has been applied.}, - copyright = {2022 The Author(s)}, - langid = {english}, - keywords = {Computer science,Magnetically confined plasmas,Nuclear fusion and fission}, - file = {/Users/fracapuano/Zotero/storage/EZ4EAU84/Degrave et al. - 2022 - Magnetic control of tokamak plasmas through deep reinforcement learning.pdf} -} - -@misc{devlinBERTPretrainingDeep2019, - title = {{{BERT}}: {{Pre-training}} of {{Deep Bidirectional Transformers}} for {{Language Understanding}}}, - shorttitle = {{{BERT}}}, - author = {Devlin, Jacob and Chang, Ming-Wei and Lee, Kenton and Toutanova, Kristina}, - year = {2019}, - month = may, - number = {arXiv:1810.04805}, - eprint = {1810.04805}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.1810.04805}, - urldate = {2025-09-08}, - abstract = {We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications. BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE score to 80.5\% (7.7\% point absolute improvement), MultiNLI accuracy to 86.7\% (4.6\% absolute improvement), SQuAD v1.1 question answering Test F1 to 93.2 (1.5 point absolute improvement) and SQuAD v2.0 Test F1 to 83.1 (5.1 point absolute improvement).}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Computation and Language}, - file = {/Users/fracapuano/Zotero/storage/AJ3SRLHF/Devlin et al. - 2019 - BERT Pre-training of Deep Bidirectional Transformers for Language Understanding.pdf;/Users/fracapuano/Zotero/storage/LNIKJNIW/1810.html} -} - -@misc{driessKnowledgeInsulatingVisionLanguageAction2025, - title = {Knowledge {{Insulating Vision-Language-Action Models}}: {{Train Fast}}, {{Run Fast}}, {{Generalize Better}}}, - shorttitle = {Knowledge {{Insulating Vision-Language-Action Models}}}, - author = {Driess, Danny and Springenberg, Jost Tobias and Ichter, Brian and Yu, Lili and {Li-Bell}, Adrian and Pertsch, Karl and Ren, Allen Z. and Walke, Homer and Vuong, Quan and Shi, Lucy Xiaoyang and Levine, Sergey}, - year = {2025}, - month = may, - number = {arXiv:2505.23705}, - eprint = {2505.23705}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2505.23705}, - urldate = {2025-09-09}, - abstract = {Vision-language-action (VLA) models provide a powerful approach to training control policies for physical systems, such as robots, by combining end-to-end learning with transfer of semantic knowledge from web-scale vision-language model (VLM) training. 
However, the constraints of real-time control are often at odds with the design of VLMs: the most powerful VLMs have tens or hundreds of billions of parameters, presenting an obstacle to real-time inference, and operate on discrete tokens rather than the continuous-valued outputs that are required for controlling robots. To address this challenge, recent VLA models have used specialized modules for efficient continuous control, such as action experts or continuous output heads, which typically require adding new untrained parameters to the pretrained VLM backbone. While these modules improve real-time and control capabilities, it remains an open question whether they preserve or degrade the semantic knowledge contained in the pretrained VLM, and what effect they have on the VLA training dynamics. In this paper, we study this question in the context of VLAs that include a continuous diffusion or flow matching action expert, showing that naively including such experts significantly harms both training speed and knowledge transfer. We provide an extensive analysis of various design choices, their impact on performance and knowledge transfer, and propose a technique for insulating the VLM backbone during VLA training that mitigates this issue. Videos are available at https://pi.website/research/knowledge\_insulation.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Machine Learning,Computer Science - Robotics}, - file = {/Users/fracapuano/Zotero/storage/QHTS9JIC/Driess et al. - 2025 - Knowledge Insulating Vision-Language-Action Models Train Fast, Run Fast, Generalize Better.pdf;/Users/fracapuano/Zotero/storage/3U9FCXRB/2505.html} -} - -@misc{driessPaLMEEmbodiedMultimodal2023, - title = {{{PaLM-E}}: {{An Embodied Multimodal Language Model}}}, - shorttitle = {{{PaLM-E}}}, - author = {Driess, Danny and Xia, Fei and Sajjadi, Mehdi S. M. and Lynch, Corey and Chowdhery, Aakanksha and Ichter, Brian and Wahid, Ayzaan and Tompson, Jonathan and Vuong, Quan and Yu, Tianhe and Huang, Wenlong and Chebotar, Yevgen and Sermanet, Pierre and Duckworth, Daniel and Levine, Sergey and Vanhoucke, Vincent and Hausman, Karol and Toussaint, Marc and Greff, Klaus and Zeng, Andy and Mordatch, Igor and Florence, Pete}, - year = {2023}, - month = mar, - number = {arXiv:2303.03378}, - eprint = {2303.03378}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2303.03378}, - urldate = {2025-09-07}, - abstract = {Large language models excel at a wide range of complex tasks. However, enabling general inference in the real world, e.g., for robotics problems, raises the challenge of grounding. We propose embodied language models to directly incorporate real-world continuous sensor modalities into language models and thereby establish the link between words and percepts. Input to our embodied language model are multi-modal sentences that interleave visual, continuous state estimation, and textual input encodings. We train these encodings end-to-end, in conjunction with a pre-trained large language model, for multiple embodied tasks including sequential robotic manipulation planning, visual question answering, and captioning. Our evaluations show that PaLM-E, a single large embodied multimodal model, can address a variety of embodied reasoning tasks, from a variety of observation modalities, on multiple embodiments, and further, exhibits positive transfer: the model benefits from diverse joint training across internet-scale language, vision, and visual-language domains. 
Our largest model, PaLM-E-562B with 562B parameters, in addition to being trained on robotics tasks, is a visual-language generalist with state-of-the-art performance on OK-VQA, and retains generalist language capabilities with increasing scale.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Artificial Intelligence,Computer Science - Machine Learning,Computer Science - Robotics}, - file = {/Users/fracapuano/Zotero/storage/PQSPI784/Driess et al. - 2023 - PaLM-E An Embodied Multimodal Language Model.pdf;/Users/fracapuano/Zotero/storage/K3PJVSGB/2303.html} -} - -@misc{esserScalingRectifiedFlow2024, - title = {Scaling {{Rectified Flow Transformers}} for {{High-Resolution Image Synthesis}}}, - author = {Esser, Patrick and Kulal, Sumith and Blattmann, Andreas and Entezari, Rahim and M{\"u}ller, Jonas and Saini, Harry and Levi, Yam and Lorenz, Dominik and Sauer, Axel and Boesel, Frederic and Podell, Dustin and Dockhorn, Tim and English, Zion and Lacey, Kyle and Goodwin, Alex and Marek, Yannik and Rombach, Robin}, - year = {2024}, - month = mar, - number = {arXiv:2403.03206}, - eprint = {2403.03206}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2403.03206}, - urldate = {2025-09-07}, - abstract = {Diffusion models create data from noise by inverting the forward paths of data towards noise and have emerged as a powerful generative modeling technique for high-dimensional, perceptual data such as images and videos. Rectified flow is a recent generative model formulation that connects data and noise in a straight line. Despite its better theoretical properties and conceptual simplicity, it is not yet decisively established as standard practice. In this work, we improve existing noise sampling techniques for training rectified flow models by biasing them towards perceptually relevant scales. Through a large-scale study, we demonstrate the superior performance of this approach compared to established diffusion formulations for high-resolution text-to-image synthesis. Additionally, we present a novel transformer-based architecture for text-to-image generation that uses separate weights for the two modalities and enables a bidirectional flow of information between image and text tokens, improving text comprehension, typography, and human preference ratings. We demonstrate that this architecture follows predictable scaling trends and correlates lower validation loss to improved text-to-image synthesis as measured by various metrics and human evaluations. Our largest models outperform state-of-the-art models, and we will make our experimental data, code, and model weights publicly available.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Computer Vision and Pattern Recognition}, - file = {/Users/fracapuano/Zotero/storage/23TGK9JM/Esser et al. - 2024 - Scaling Rectified Flow Transformers for High-Resolution Image Synthesis.pdf;/Users/fracapuano/Zotero/storage/W2CRYPZY/2403.html} -} - -@misc{fedusReviewSparseExpert2022, - title = {A {{Review}} of {{Sparse Expert Models}} in {{Deep Learning}}}, - author = {Fedus, William and Dean, Jeff and Zoph, Barret}, - year = {2022}, - month = sep, - number = {arXiv:2209.01667}, - eprint = {2209.01667}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2209.01667}, - urldate = {2025-09-08}, - abstract = {Sparse expert models are a thirty-year old concept re-emerging as a popular architecture in deep learning. 
This class of architecture encompasses Mixture-of-Experts, Switch Transformers, Routing Networks, BASE layers, and others, all with the unifying idea that each example is acted on by a subset of the parameters. By doing so, the degree of sparsity decouples the parameter count from the compute per example allowing for extremely large, but efficient models. The resulting models have demonstrated significant improvements across diverse domains such as natural language processing, computer vision, and speech recognition. We review the concept of sparse expert models, provide a basic description of the common algorithms, contextualize the advances in the deep learning era, and conclude by highlighting areas for future work.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Computation and Language,Computer Science - Machine Learning}, - file = {/Users/fracapuano/Zotero/storage/MZXG2WMJ/Fedus et al. - 2022 - A Review of Sparse Expert Models in Deep Learning.pdf;/Users/fracapuano/Zotero/storage/GLZINJYC/2209.html} -} - -@misc{finiMultimodalAutoregressivePretraining2024, - title = {Multimodal {{Autoregressive Pre-training}} of {{Large Vision Encoders}}}, - author = {Fini, Enrico and Shukor, Mustafa and Li, Xiujun and Dufter, Philipp and Klein, Michal and Haldimann, David and Aitharaju, Sai and da Costa, Victor Guilherme Turrisi and B{\'e}thune, Louis and Gan, Zhe and Toshev, Alexander T. and Eichner, Marcin and Nabi, Moin and Yang, Yinfei and Susskind, Joshua M. and {El-Nouby}, Alaaeldin}, - year = {2024}, - month = nov, - number = {arXiv:2411.14402}, - eprint = {2411.14402}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2411.14402}, - urldate = {2025-09-09}, - abstract = {We introduce a novel method for pre-training of large-scale vision encoders. Building on recent advancements in autoregressive pre-training of vision models, we extend this framework to a multimodal setting, i.e., images and text. In this paper, we present AIMV2, a family of generalist vision encoders characterized by a straightforward pre-training process, scalability, and remarkable performance across a range of downstream tasks. This is achieved by pairing the vision encoder with a multimodal decoder that autoregressively generates raw image patches and text tokens. Our encoders excel not only in multimodal evaluations but also in vision benchmarks such as localization, grounding, and classification. Notably, our AIMV2-3B encoder achieves 89.5\% accuracy on ImageNet-1k with a frozen trunk. Furthermore, AIMV2 consistently outperforms state-of-the-art contrastive models (e.g., CLIP, SigLIP) in multimodal image understanding across diverse settings.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Computer Vision and Pattern Recognition,Computer Science - Machine Learning}, - file = {/Users/fracapuano/Zotero/storage/ULTX55I6/Fini et al. - 2024 - Multimodal Autoregressive Pre-training of Large Vision Encoders.pdf;/Users/fracapuano/Zotero/storage/SUG2W6A9/2411.html} -} - -@inproceedings{florenceImplicitBehavioralCloning2022, - title = {Implicit {{Behavioral Cloning}}}, - booktitle = {Proceedings of the 5th {{Conference}} on {{Robot Learning}}}, - author = {Florence, Pete and Lynch, Corey and Zeng, Andy and Ramirez, Oscar A. 
and Wahid, Ayzaan and Downs, Laura and Wong, Adrian and Lee, Johnny and Mordatch, Igor and Tompson, Jonathan}, - year = {2022}, - month = jan, - pages = {158--168}, - publisher = {PMLR}, - issn = {2640-3498}, - urldate = {2025-09-01}, - abstract = {We find that across a wide range of robot policy learning scenarios, treating supervised policy learning with an implicit model generally performs better, on average, than commonly used explicit models. We present extensive experiments on this finding, and we provide both intuitive insight and theoretical arguments distinguishing the properties of implicit models compared to their explicit counterparts, particularly with respect to approximating complex, potentially discontinuous and multi-valued (set-valued) functions. On robotic policy learning tasks we show that implicit behavior-cloning policies with energy-based models (EBM) often outperform common explicit (Mean Square Error, or Mixture Density) behavior-cloning policies, including on tasks with high-dimensional action spaces and visual image inputs. We find these policies provide competitive results or outperform state-of-the-art offline reinforcement learning methods on the challenging human-expert tasks from the D4RL benchmark suite, despite using no reward information. In the real world, robots with implicit policies can learn complex and remarkably subtle behaviors on contact-rich tasks from human demonstrations, including tasks with high combinatorial complexity and tasks requiring 1mm precision.}, - langid = {english}, - file = {/Users/fracapuano/Zotero/storage/Q8I5E862/Florence et al. - 2022 - Implicit Behavioral Cloning.pdf} -} - -@misc{FROMAGe, - title = {Grounding Language Models to Images for Multimodal Inputs and Outputs}, - author = {Koh, Jing Yu and Salakhutdinov, Ruslan and Fried, Daniel}, - year = {2023} -} - -@article{fujitaDevelopmentRobotsNuclear2020, - title = {Development of {{Robots}} for {{Nuclear Power Plants}} and {{Their Application}} to {{New Fields}}}, - author = {Fujita, Jun and Soda, Daisuke and Murata, Chotaro and Tsuhari, Hiroyuki}, - year = {2020}, - volume = {57}, - number = {4}, - langid = {english}, - file = {/Users/fracapuano/Zotero/storage/K349QTEG/Fujita et al. 
- 2020 - Development of Robots for Nuclear Power Plants and Their Application to New Fields.pdf} -} - -@misc{grattafioriLlama3Herd2024, - title = {The {{Llama}} 3 {{Herd}} of {{Models}}}, - author = {Grattafiori, Aaron and Dubey, Abhimanyu and Jauhri, Abhinav and Pandey, Abhinav and Kadian, Abhishek and {Al-Dahle}, Ahmad and Letman, Aiesha and Mathur, Akhil and Schelten, Alan and Vaughan, Alex and Yang, Amy and Fan, Angela and Goyal, Anirudh and Hartshorn, Anthony and Yang, Aobo and Mitra, Archi and Sravankumar, Archie and Korenev, Artem and Hinsvark, Arthur and Rao, Arun and Zhang, Aston and Rodriguez, Aurelien and Gregerson, Austen and Spataru, Ava and Roziere, Baptiste and Biron, Bethany and Tang, Binh and Chern, Bobbie and Caucheteux, Charlotte and Nayak, Chaya and Bi, Chloe and Marra, Chris and McConnell, Chris and Keller, Christian and Touret, Christophe and Wu, Chunyang and Wong, Corinne and Ferrer, Cristian Canton and Nikolaidis, Cyrus and Allonsius, Damien and Song, Daniel and Pintz, Danielle and Livshits, Danny and Wyatt, Danny and Esiobu, David and Choudhary, Dhruv and Mahajan, Dhruv and {Garcia-Olano}, Diego and Perino, Diego and Hupkes, Dieuwke and Lakomkin, Egor and AlBadawy, Ehab and Lobanova, Elina and Dinan, Emily and Smith, Eric Michael and Radenovic, Filip and Guzm{\'a}n, Francisco and Zhang, Frank and Synnaeve, Gabriel and Lee, Gabrielle and Anderson, Georgia Lewis and Thattai, Govind and Nail, Graeme and Mialon, Gregoire and Pang, Guan and Cucurell, Guillem and Nguyen, Hailey and Korevaar, Hannah and Xu, Hu and Touvron, Hugo and Zarov, Iliyan and Ibarra, Imanol Arrieta and Kloumann, Isabel and Misra, Ishan and Evtimov, Ivan and Zhang, Jack and Copet, Jade and Lee, Jaewon and Geffert, Jan and Vranes, Jana and Park, Jason and Mahadeokar, Jay and Shah, Jeet and van der Linde, Jelmer and Billock, Jennifer and Hong, Jenny and Lee, Jenya and Fu, Jeremy and Chi, Jianfeng and Huang, Jianyu and Liu, Jiawen and Wang, Jie and Yu, Jiecao and Bitton, Joanna and Spisak, Joe and Park, Jongsoo and Rocca, Joseph and Johnstun, Joshua and Saxe, Joshua and Jia, Junteng and Alwala, Kalyan Vasuden and Prasad, Karthik and Upasani, Kartikeya and Plawiak, Kate and Li, Ke and Heafield, Kenneth and Stone, Kevin and {El-Arini}, Khalid and Iyer, Krithika and Malik, Kshitiz and Chiu, Kuenley and Bhalla, Kunal and Lakhotia, Kushal and {Rantala-Yeary}, Lauren and van der Maaten, Laurens and Chen, Lawrence and Tan, Liang and Jenkins, Liz and Martin, Louis and Madaan, Lovish and Malo, Lubo and Blecher, Lukas and Landzaat, Lukas and de Oliveira, Luke and Muzzi, Madeline and Pasupuleti, Mahesh and Singh, Mannat and Paluri, Manohar and Kardas, Marcin and Tsimpoukelli, Maria and Oldham, Mathew and Rita, Mathieu and Pavlova, Maya and Kambadur, Melanie and Lewis, Mike and Si, Min and Singh, Mitesh Kumar and Hassan, Mona and Goyal, Naman and Torabi, Narjes and Bashlykov, Nikolay and Bogoychev, Nikolay and Chatterji, Niladri and Zhang, Ning and Duchenne, Olivier and {\c C}elebi, Onur and Alrassy, Patrick and Zhang, Pengchuan and Li, Pengwei and Vasic, Petar and Weng, Peter and Bhargava, Prajjwal and Dubal, Pratik and Krishnan, Praveen and Koura, Punit Singh and Xu, Puxin and He, Qing and Dong, Qingxiao and Srinivasan, Ragavan and Ganapathy, Raj and Calderer, Ramon and Cabral, Ricardo Silveira and Stojnic, Robert and Raileanu, Roberta and Maheswari, Rohan and Girdhar, Rohit and Patel, Rohit and Sauvestre, Romain and Polidoro, Ronnie and Sumbaly, Roshan and Taylor, Ross and Silva, Ruan and Hou, Rui and Wang, Rui 
and Hosseini, Saghar and Chennabasappa, Sahana and Singh, Sanjay and Bell, Sean and Kim, Seohyun Sonia and Edunov, Sergey and Nie, Shaoliang and Narang, Sharan and Raparthy, Sharath and Shen, Sheng and Wan, Shengye and Bhosale, Shruti and Zhang, Shun and Vandenhende, Simon and Batra, Soumya and Whitman, Spencer and Sootla, Sten and Collot, Stephane and Gururangan, Suchin and Borodinsky, Sydney and Herman, Tamar and Fowler, Tara and Sheasha, Tarek and Georgiou, Thomas and Scialom, Thomas and Speckbacher, Tobias and Mihaylov, Todor and Xiao, Tong and Karn, Ujjwal and Goswami, Vedanuj and Gupta, Vibhor and Ramanathan, Vignesh and Kerkez, Viktor and Gonguet, Vincent and Do, Virginie and Vogeti, Vish and Albiero, V{\'i}tor and Petrovic, Vladan and Chu, Weiwei and Xiong, Wenhan and Fu, Wenyin and Meers, Whitney and Martinet, Xavier and Wang, Xiaodong and Wang, Xiaofang and Tan, Xiaoqing Ellen and Xia, Xide and Xie, Xinfeng and Jia, Xuchao and Wang, Xuewei and Goldschlag, Yaelle and Gaur, Yashesh and Babaei, Yasmine and Wen, Yi and Song, Yiwen and Zhang, Yuchen and Li, Yue and Mao, Yuning and Coudert, Zacharie Delpierre and Yan, Zheng and Chen, Zhengxing and Papakipos, Zoe and Singh, Aaditya and Srivastava, Aayushi and Jain, Abha and Kelsey, Adam and Shajnfeld, Adam and Gangidi, Adithya and Victoria, Adolfo and Goldstand, Ahuva and Menon, Ajay and Sharma, Ajay and Boesenberg, Alex and Baevski, Alexei and Feinstein, Allie and Kallet, Amanda and Sangani, Amit and Teo, Amos and Yunus, Anam and Lupu, Andrei and Alvarado, Andres and Caples, Andrew and Gu, Andrew and Ho, Andrew and Poulton, Andrew and Ryan, Andrew and Ramchandani, Ankit and Dong, Annie and Franco, Annie and Goyal, Anuj and Saraf, Aparajita and Chowdhury, Arkabandhu and Gabriel, Ashley and Bharambe, Ashwin and Eisenman, Assaf and Yazdan, Azadeh and James, Beau and Maurer, Ben and Leonhardi, Benjamin and Huang, Bernie and Loyd, Beth and Paola, Beto De and Paranjape, Bhargavi and Liu, Bing and Wu, Bo and Ni, Boyu and Hancock, Braden and Wasti, Bram and Spence, Brandon and Stojkovic, Brani and Gamido, Brian and Montalvo, Britt and Parker, Carl and Burton, Carly and Mejia, Catalina and Liu, Ce and Wang, Changhan and Kim, Changkyu and Zhou, Chao and Hu, Chester and Chu, Ching-Hsiang and Cai, Chris and Tindal, Chris and Feichtenhofer, Christoph and Gao, Cynthia and Civin, Damon and Beaty, Dana and Kreymer, Daniel and Li, Daniel and Adkins, David and Xu, David and Testuggine, Davide and David, Delia and Parikh, Devi and Liskovich, Diana and Foss, Didem and Wang, Dingkang and Le, Duc and Holland, Dustin and Dowling, Edward and Jamil, Eissa and Montgomery, Elaine and Presani, Eleonora and Hahn, Emily and Wood, Emily and Le, Eric-Tuan and Brinkman, Erik and Arcaute, Esteban and Dunbar, Evan and Smothers, Evan and Sun, Fei and Kreuk, Felix and Tian, Feng and Kokkinos, Filippos and Ozgenel, Firat and Caggioni, Francesco and Kanayet, Frank and Seide, Frank and Florez, Gabriela Medina and Schwarz, Gabriella and Badeer, Gada and Swee, Georgia and Halpern, Gil and Herman, Grant and Sizov, Grigory and Guangyi and Zhang and Lakshminarayanan, Guna and Inan, Hakan and Shojanazeri, Hamid and Zou, Han and Wang, Hannah and Zha, Hanwen and Habeeb, Haroun and Rudolph, Harrison and Suk, Helen and Aspegren, Henry and Goldman, Hunter and Zhan, Hongyuan and Damlaj, Ibrahim and Molybog, Igor and Tufanov, Igor and Leontiadis, Ilias and Veliche, Irina-Elena and Gat, Itai and Weissman, Jake and Geboski, James and Kohli, James and Lam, Janice and Asher, Japhet and Gaya, 
Jean-Baptiste and Marcus, Jeff and Tang, Jeff and Chan, Jennifer and Zhen, Jenny and Reizenstein, Jeremy and Teboul, Jeremy and Zhong, Jessica and Jin, Jian and Yang, Jingyi and Cummings, Joe and Carvill, Jon and Shepard, Jon and McPhie, Jonathan and Torres, Jonathan and Ginsburg, Josh and Wang, Junjie and Wu, Kai and U, Kam Hou and Saxena, Karan and Khandelwal, Kartikay and Zand, Katayoun and Matosich, Kathy and Veeraraghavan, Kaushik and Michelena, Kelly and Li, Keqian and Jagadeesh, Kiran and Huang, Kun and Chawla, Kunal and Huang, Kyle and Chen, Lailin and Garg, Lakshya and A, Lavender and Silva, Leandro and Bell, Lee and Zhang, Lei and Guo, Liangpeng and Yu, Licheng and Moshkovich, Liron and Wehrstedt, Luca and Khabsa, Madian and Avalani, Manav and Bhatt, Manish and Mankus, Martynas and Hasson, Matan and Lennie, Matthew and Reso, Matthias and Groshev, Maxim and Naumov, Maxim and Lathi, Maya and Keneally, Meghan and Liu, Miao and Seltzer, Michael L. and Valko, Michal and Restrepo, Michelle and Patel, Mihir and Vyatskov, Mik and Samvelyan, Mikayel and Clark, Mike and Macey, Mike and Wang, Mike and Hermoso, Miquel Jubert and Metanat, Mo and Rastegari, Mohammad and Bansal, Munish and Santhanam, Nandhini and Parks, Natascha and White, Natasha and Bawa, Navyata and Singhal, Nayan and Egebo, Nick and Usunier, Nicolas and Mehta, Nikhil and Laptev, Nikolay Pavlovich and Dong, Ning and Cheng, Norman and Chernoguz, Oleg and Hart, Olivia and Salpekar, Omkar and Kalinli, Ozlem and Kent, Parkin and Parekh, Parth and Saab, Paul and Balaji, Pavan and Rittner, Pedro and Bontrager, Philip and Roux, Pierre and Dollar, Piotr and Zvyagina, Polina and Ratanchandani, Prashant and Yuvraj, Pritish and Liang, Qian and Alao, Rachad and Rodriguez, Rachel and Ayub, Rafi and Murthy, Raghotham and Nayani, Raghu and Mitra, Rahul and Parthasarathy, Rangaprabhu and Li, Raymond and Hogan, Rebekkah and Battey, Robin and Wang, Rocky and Howes, Russ and Rinott, Ruty and Mehta, Sachin and Siby, Sachin and Bondu, Sai Jayesh and Datta, Samyak and Chugh, Sara and Hunt, Sara and Dhillon, Sargun and Sidorov, Sasha and Pan, Satadru and Mahajan, Saurabh and Verma, Saurabh and Yamamoto, Seiji and Ramaswamy, Sharadh and Lindsay, Shaun and Lindsay, Shaun and Feng, Sheng and Lin, Shenghao and Zha, Shengxin Cindy and Patil, Shishir and Shankar, Shiva and Zhang, Shuqiang and Zhang, Shuqiang and Wang, Sinong and Agarwal, Sneha and Sajuyigbe, Soji and Chintala, Soumith and Max, Stephanie and Chen, Stephen and Kehoe, Steve and Satterfield, Steve and Govindaprasad, Sudarshan and Gupta, Sumit and Deng, Summer and Cho, Sungmin and Virk, Sunny and Subramanian, Suraj and Choudhury, Sy and Goldman, Sydney and Remez, Tal and Glaser, Tamar and Best, Tamara and Koehler, Thilo and Robinson, Thomas and Li, Tianhe and Zhang, Tianjun and Matthews, Tim and Chou, Timothy and Shaked, Tzook and Vontimitta, Varun and Ajayi, Victoria and Montanez, Victoria and Mohan, Vijai and Kumar, Vinay Satish and Mangla, Vishal and Ionescu, Vlad and Poenaru, Vlad and Mihailescu, Vlad Tiberiu and Ivanov, Vladimir and Li, Wei and Wang, Wenchen and Jiang, Wenwen and Bouaziz, Wes and Constable, Will and Tang, Xiaocheng and Wu, Xiaojian and Wang, Xiaolan and Wu, Xilun and Gao, Xinbo and Kleinman, Yaniv and Chen, Yanjun and Hu, Ye and Jia, Ye and Qi, Ye and Li, Yenda and Zhang, Yilin and Zhang, Ying and Adi, Yossi and Nam, Youngjin and Yu and Wang and Zhao, Yu and Hao, Yuchen and Qian, Yundi and Li, Yunlu and He, Yuzi and Rait, Zach and DeVito, Zachary and Rosnbrick, Zef and 
Wen, Zhaoduo and Yang, Zhenyu and Zhao, Zhiwei and Ma, Zhiyu}, - year = {2024}, - month = nov, - number = {arXiv:2407.21783}, - eprint = {2407.21783}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2407.21783}, - urldate = {2025-09-09}, - abstract = {Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive empirical evaluation of Llama 3. We find that Llama 3 delivers comparable quality to leading language models such as GPT-4 on a plethora of tasks. We publicly release Llama 3, including pre-trained and post-trained versions of the 405B parameter language model and our Llama Guard 3 model for input and output safety. The paper also presents the results of experiments in which we integrate image, video, and speech capabilities into Llama 3 via a compositional approach. We observe this approach performs competitively with the state-of-the-art on image, video, and speech recognition tasks. The resulting models are not yet being broadly released as they are still under development.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Artificial Intelligence,Computer Science - Computation and Language,Computer Science - Computer Vision and Pattern Recognition}, - file = {/Users/fracapuano/Zotero/storage/88PJ48EN/Grattafiori et al. - 2024 - The Llama 3 Herd of Models.pdf;/Users/fracapuano/Zotero/storage/2LLAWX8L/2407.html} -} - -@inproceedings{griffinWalkingStabilizationUsing2017, - title = {Walking {{Stabilization Using Step Timing}} and {{Location Adjustment}} on the {{Humanoid Robot}}, {{Atlas}}}, - booktitle = {2017 {{IEEE}}/{{RSJ International Conference}} on {{Intelligent Robots}} and {{Systems}} ({{IROS}})}, - author = {Griffin, Robert J. and Wiedebach, Georg and Bertrand, Sylvain and Leonessa, Alexander and Pratt, Jerry}, - year = {2017}, - month = sep, - eprint = {1703.00477}, - primaryclass = {cs}, - pages = {667--673}, - doi = {10.1109/IROS.2017.8202223}, - urldate = {2025-08-26}, - abstract = {While humans are highly capable of recovering from external disturbances and uncertainties that result in large tracking errors, humanoid robots have yet to reliably mimic this level of robustness. Essential to this is the ability to combine traditional "ankle strategy" balancing with step timing and location adjustment techniques. In doing so, the robot is able to step quickly to the necessary location to continue walking. In this work, we present both a new swing speed up algorithm to adjust the step timing, allowing the robot to set the foot down more quickly to recover from errors in the direction of the current capture point dynamics, and a new algorithm to adjust the desired footstep, expanding the base of support to utilize the center of pressure (CoP)-based ankle strategy for balance. We then utilize the desired centroidal moment pivot (CMP) to calculate the momentum rate of change for our inverse-dynamics based whole-body controller. We present simulation and experimental results using this work, and discuss performance limitations and potential improvements.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Robotics}, - file = {/Users/fracapuano/Zotero/storage/SSNAZ6U4/Griffin et al. 
- 2017 - Walking Stabilization Using Step Timing and Location Adjustment on the Humanoid Robot, Atlas.pdf;/Users/fracapuano/Zotero/storage/VP885PA9/1703.html} -} - -@misc{haarnojaReinforcementLearningDeep2017, - title = {Reinforcement {{Learning}} with {{Deep Energy-Based Policies}}}, - author = {Haarnoja, Tuomas and Tang, Haoran and Abbeel, Pieter and Levine, Sergey}, - year = {2017}, - month = jul, - number = {arXiv:1702.08165}, - eprint = {1702.08165}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.1702.08165}, - urldate = {2025-08-31}, - abstract = {We propose a method for learning expressive energy-based policies for continuous states and actions, which has been feasible only in tabular domains before. We apply our method to learning maximum entropy policies, resulting into a new algorithm, called soft Q-learning, that expresses the optimal policy via a Boltzmann distribution. We use the recently proposed amortized Stein variational gradient descent to learn a stochastic sampling network that approximates samples from this distribution. The benefits of the proposed algorithm include improved exploration and compositionality that allows transferring skills between tasks, which we confirm in simulated experiments with swimming and walking robots. We also draw a connection to actor-critic methods, which can be viewed performing approximate inference on the corresponding energy-based model.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Artificial Intelligence,Computer Science - Machine Learning}, - file = {/Users/fracapuano/Zotero/storage/PXCR4TCT/Haarnoja et al. - 2017 - Reinforcement Learning with Deep Energy-Based Policies.pdf;/Users/fracapuano/Zotero/storage/VUXXX9B7/1702.html} -} - 
-@inproceedings{haarnojaReinforcementLearningDeep2017b, - title = {Reinforcement {{Learning}} with {{Deep Energy-Based Policies}}}, - booktitle = {Proceedings of the 34th {{International Conference}} on {{Machine Learning}}}, - author = {Haarnoja, Tuomas and Tang, Haoran and Abbeel, Pieter and Levine, Sergey}, - year = {2017}, - month = jul, - pages = {1352--1361}, - publisher = {PMLR}, - issn = {2640-3498}, - urldate = {2025-08-31}, - abstract = {We propose a method for learning expressive energy-based policies for continuous states and actions, which has been feasible only in tabular domains before. We apply our method to learning maximum entropy policies, resulting into a new algorithm, called soft Q-learning, that expresses the optimal policy via a Boltzmann distribution. We use the recently proposed amortized Stein variational gradient descent to learn a stochastic sampling network that approximates samples from this distribution. The benefits of the proposed algorithm include improved exploration and compositionality that allows transferring skills between tasks, which we confirm in simulated experiments with swimming and walking robots. We also draw a connection to actor-critic methods, which can be viewed performing approximate inference on the corresponding energy-based model.}, - langid = {english}, - file = {/Users/fracapuano/Zotero/storage/C59BJ4GU/Haarnoja et al. - 2017 - Reinforcement Learning with Deep Energy-Based Policies.pdf} -} - -@misc{haarnojaSoftActorCriticOffPolicy2018, - title = {Soft {{Actor-Critic}}: {{Off-Policy Maximum Entropy Deep Reinforcement Learning}} with a {{Stochastic Actor}}}, - shorttitle = {Soft {{Actor-Critic}}}, - author = {Haarnoja, Tuomas and Zhou, Aurick and Abbeel, Pieter and Levine, Sergey}, - year = {2018}, - month = aug, - number = {arXiv:1801.01290}, - eprint = {1801.01290}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.1801.01290}, - urldate = {2025-08-29}, - abstract = {Model-free deep reinforcement learning (RL) algorithms have been demonstrated on a range of challenging decision making and control tasks. However, these methods typically suffer from two major challenges: very high sample complexity and brittle convergence properties, which necessitate meticulous hyperparameter tuning. Both of these challenges severely limit the applicability of such methods to complex, real-world domains. In this paper, we propose soft actor-critic, an off-policy actor-critic deep RL algorithm based on the maximum entropy reinforcement learning framework. In this framework, the actor aims to maximize expected reward while also maximizing entropy. That is, to succeed at the task while acting as randomly as possible. Prior deep RL methods based on this framework have been formulated as Q-learning methods. By combining off-policy updates with a stable stochastic actor-critic formulation, our method achieves state-of-the-art performance on a range of continuous control benchmark tasks, outperforming prior on-policy and off-policy methods. 
Furthermore, we demonstrate that, in contrast to other off-policy algorithms, our approach is very stable, achieving very similar performance across different random seeds.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Artificial Intelligence,Computer Science - Machine Learning,Statistics - Machine Learning}, - file = {/Users/fracapuano/Zotero/storage/HG6UQIRM/Haarnoja et al. - 2018 - Soft Actor-Critic Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor.pdf;/Users/fracapuano/Zotero/storage/RKG3J7MX/1801.html} -} - -@misc{hansenTemporalDifferenceLearning2022, - title = {Temporal {{Difference Learning}} for {{Model Predictive Control}}}, - author = {Hansen, Nicklas and Wang, Xiaolong and Su, Hao}, - year = {2022}, - month = jul, - number = {arXiv:2203.04955}, - eprint = {2203.04955}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2203.04955}, - urldate = {2025-08-25}, - abstract = {Data-driven model predictive control has two key advantages over model-free methods: a potential for improved sample efficiency through model learning, and better performance as computational budget for planning increases. However, it is both costly to plan over long horizons and challenging to obtain an accurate model of the environment. In this work, we combine the strengths of model-free and model-based methods. We use a learned task-oriented latent dynamics model for local trajectory optimization over a short horizon, and use a learned terminal value function to estimate long-term return, both of which are learned jointly by temporal difference learning. Our method, TD-MPC, achieves superior sample efficiency and asymptotic performance over prior work on both state and image-based continuous control tasks from DMControl and Meta-World. Code and video results are available at https://nicklashansen.github.io/td-mpc.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Machine Learning,Computer Science - Robotics}, - file = {/Users/fracapuano/Zotero/storage/TZF8LCDG/Hansen et al. - 2022 - Temporal Difference Learning for Model Predictive Control.pdf;/Users/fracapuano/Zotero/storage/WU2WWWQE/2203.html} -} - -@misc{heessEmergenceLocomotionBehaviours2017, - title = {Emergence of {{Locomotion Behaviours}} in {{Rich Environments}}}, - author = {Heess, Nicolas and TB, Dhruva and Sriram, Srinivasan and Lemmon, Jay and Merel, Josh and Wayne, Greg and Tassa, Yuval and Erez, Tom and Wang, Ziyu and Eslami, S. M. Ali and Riedmiller, Martin and Silver, David}, - year = {2017}, - month = jul, - number = {arXiv:1707.02286}, - eprint = {1707.02286}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.1707.02286}, - urldate = {2025-09-02}, - abstract = {The reinforcement learning paradigm allows, in principle, for complex behaviours to be learned directly from simple reward signals. In practice, however, it is common to carefully hand-design the reward function to encourage a particular solution, or to derive it from demonstration data. In this paper explore how a rich environment can help to promote the learning of complex behavior. Specifically, we train agents in diverse environmental contexts, and find that this encourages the emergence of robust behaviours that perform well across a suite of tasks. We demonstrate this principle for locomotion -- behaviours that are known for their sensitivity to the choice of reward. 
We train several simulated bodies on a diverse set of challenging terrains and obstacles, using a simple reward function based on forward progress. Using a novel scalable variant of policy gradient reinforcement learning, our agents learn to run, jump, crouch and turn as required by the environment without explicit reward-based guidance. A visual depiction of highlights of the learned behavior can be viewed following https://youtu.be/hx\_bgoTF7bs .}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Artificial Intelligence}, - file = {/Users/fracapuano/Zotero/storage/9DZ8XEVY/Heess et al. - 2017 - Emergence of Locomotion Behaviours in Rich Environments.pdf;/Users/fracapuano/Zotero/storage/JUB2Q3WH/1707.html} -} - -@inproceedings{higgins2017beta, - title = {Beta-Vae: {{Learning}} Basic Visual Concepts with a Constrained Variational Framework}, - booktitle = {International Conference on Learning Representations}, - author = {Higgins, Irina and Matthey, Loic and Pal, Arka and Burgess, Christopher and Glorot, Xavier and Botvinick, Matthew and Mohamed, Shakir and Lerchner, Alexander}, - year = {2017} -} - -@misc{hoDenoisingDiffusionProbabilistic2020, - title = {Denoising {{Diffusion Probabilistic Models}}}, - author = {Ho, Jonathan and Jain, Ajay and Abbeel, Pieter}, - year = {2020}, - month = dec, - number = {arXiv:2006.11239}, - eprint = {2006.11239}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2006.11239}, - urldate = {2025-09-03}, - abstract = {We present high quality image synthesis results using diffusion probabilistic models, a class of latent variable models inspired by considerations from nonequilibrium thermodynamics. Our best results are obtained by training on a weighted variational bound designed according to a novel connection between diffusion probabilistic models and denoising score matching with Langevin dynamics, and our models naturally admit a progressive lossy decompression scheme that can be interpreted as a generalization of autoregressive decoding. On the unconditional CIFAR10 dataset, we obtain an Inception score of 9.46 and a state-of-the-art FID score of 3.17. On 256x256 LSUN, we obtain sample quality similar to ProgressiveGAN. Our implementation is available at https://github.com/hojonathanho/diffusion}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Machine Learning,Statistics - Machine Learning}, - file = {/Users/fracapuano/Zotero/storage/DE655AYQ/Ho et al. - 2020 - Denoising Diffusion Probabilistic Models.pdf;/Users/fracapuano/Zotero/storage/NVIS47ZH/2006.html} -} - -@article{hwangboLearningAgileDynamic2019, - title = {Learning Agile and Dynamic Motor Skills for Legged Robots}, - author = {Hwangbo, Jemin and Lee, Joonho and Dosovitskiy, Alexey and Bellicoso, Dario and Tsounis, Vassilios and Koltun, Vladlen and Hutter, Marco}, - year = {2019}, - month = jan, - journal = {Science Robotics}, - volume = {4}, - number = {26}, - pages = {eaau5872}, - publisher = {American Association for the Advancement of Science}, - doi = {10.1126/scirobotics.aau5872}, - urldate = {2025-08-27}, - abstract = {Legged robots pose one of the greatest challenges in robotics. Dynamic and agile maneuvers of animals cannot be imitated by existing methods that are crafted by humans. A compelling alternative is reinforcement learning, which requires minimal craftsmanship and promotes the natural evolution of a control policy. 
However, so far, reinforcement learning research for legged robots is mainly limited to simulation, and only few and comparably simple examples have been deployed on real systems. The primary reason is that training with real robots, particularly with dynamically balancing systems, is complicated and expensive. In the present work, we introduce a method for training a neural network policy in simulation and transferring it to a state-of-the-art legged system, thereby leveraging fast, automated, and cost-effective data generation schemes. The approach is applied to the ANYmal robot, a sophisticated medium-dog--sized quadrupedal system. Using policies trained in simulation, the quadrupedal machine achieves locomotion skills that go beyond what had been achieved with prior methods: ANYmal is capable of precisely and energy-efficiently following high-level body velocity commands, running faster than before, and recovering from falling even in complex configurations.}, - file = {/Users/fracapuano/Zotero/storage/9V3X2F7R/Hwangbo et al. - 2019 - Learning agile and dynamic motor skills for legged robots.pdf} -} - -@inproceedings{ImageNet_VSS09, - title = {Construction and Analysis of a Large Scale Image Ontology}, - author = {Deng, J. and Li, K. and Do, M. and Su, H. and {Fei-Fei}, L.}, - year = {2009}, - publisher = {Vision Sciences Society} -} - -@inproceedings{InstructBLIP, - title = {{{InstructBLIP}}: {{Towards}} General-Purpose Vision-Language Models with Instruction Tuning}, - booktitle = {Thirty-Seventh Conference on Neural Information Processing Systems}, - author = {Dai, Wenliang and Li, Junnan and Li, Dongxu and Tiong, Anthony and Zhao, Junqi and Wang, Weisheng and Li, Boyang and Fung, Pascale and Hoi, Steven}, - year = {2023} -} - -@misc{jangBCZZeroShotTask2022, - title = {{{BC-Z}}: {{Zero-Shot Task Generalization}} with {{Robotic Imitation Learning}}}, - shorttitle = {{{BC-Z}}}, - author = {Jang, Eric and Irpan, Alex and Khansari, Mohi and Kappler, Daniel and Ebert, Frederik and Lynch, Corey and Levine, Sergey and Finn, Chelsea}, - year = {2022}, - month = feb, - number = {arXiv:2202.02005}, - eprint = {2202.02005}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2202.02005}, - urldate = {2025-09-01}, - abstract = {In this paper, we study the problem of enabling a vision-based robotic manipulation system to generalize to novel tasks, a long-standing challenge in robot learning. We approach the challenge from an imitation learning perspective, aiming to study how scaling and broadening the data collected can facilitate such generalization. To that end, we develop an interactive and flexible imitation learning system that can learn from both demonstrations and interventions and can be conditioned on different forms of information that convey the task, including pre-trained embeddings of natural language or videos of humans performing the task. When scaling data collection on a real robot to more than 100 distinct tasks, we find that this system can perform 24 unseen manipulation tasks with an average success rate of 44\%, without any robot demonstrations for those tasks.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Machine Learning,Computer Science - Robotics}, - file = {/Users/fracapuano/Zotero/storage/YDG2WMDC/Jang et al. 
- 2022 - BC-Z Zero-Shot Task Generalization with Robotic Imitation Learning.pdf;/Users/fracapuano/Zotero/storage/ZZ47RG6V/2202.html} -} - -@misc{jannerPlanningDiffusionFlexible2022, - title = {Planning with {{Diffusion}} for {{Flexible Behavior Synthesis}}}, - author = {Janner, Michael and Du, Yilun and Tenenbaum, Joshua B. and Levine, Sergey}, - year = {2022}, - month = dec, - number = {arXiv:2205.09991}, - eprint = {2205.09991}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2205.09991}, - urldate = {2025-09-03}, - abstract = {Model-based reinforcement learning methods often use learning only for the purpose of estimating an approximate dynamics model, offloading the rest of the decision-making work to classical trajectory optimizers. While conceptually simple, this combination has a number of empirical shortcomings, suggesting that learned models may not be well-suited to standard trajectory optimization. In this paper, we consider what it would look like to fold as much of the trajectory optimization pipeline as possible into the modeling problem, such that sampling from the model and planning with it become nearly identical. The core of our technical approach lies in a diffusion probabilistic model that plans by iteratively denoising trajectories. We show how classifier-guided sampling and image inpainting can be reinterpreted as coherent planning strategies, explore the unusual and useful properties of diffusion-based planning methods, and demonstrate the effectiveness of our framework in control settings that emphasize long-horizon decision-making and test-time flexibility.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Artificial Intelligence,Computer Science - Machine Learning}, - file = {/Users/fracapuano/Zotero/storage/6S28T733/Janner et al. - 2022 - Planning with Diffusion for Flexible Behavior Synthesis.pdf;/Users/fracapuano/Zotero/storage/DRH9ZWCG/2205.html} -} - -@misc{jiangMistral7B2023, - title = {Mistral {{7B}}}, - author = {Jiang, Albert Q. and Sablayrolles, Alexandre and Mensch, Arthur and Bamford, Chris and Chaplot, Devendra Singh and de las Casas, Diego and Bressand, Florian and Lengyel, Gianna and Lample, Guillaume and Saulnier, Lucile and Lavaud, L{\'e}lio Renard and Lachaux, Marie-Anne and Stock, Pierre and Scao, Teven Le and Lavril, Thibaut and Wang, Thomas and Lacroix, Timoth{\'e}e and Sayed, William El}, - year = {2023}, - month = oct, - number = {arXiv:2310.06825}, - eprint = {2310.06825}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2310.06825}, - urldate = {2025-09-09}, - abstract = {We introduce Mistral 7B v0.1, a 7-billion-parameter language model engineered for superior performance and efficiency. Mistral 7B outperforms Llama 2 13B across all evaluated benchmarks, and Llama 1 34B in reasoning, mathematics, and code generation. Our model leverages grouped-query attention (GQA) for faster inference, coupled with sliding window attention (SWA) to effectively handle sequences of arbitrary length with a reduced inference cost. We also provide a model fine-tuned to follow instructions, Mistral 7B -- Instruct, that surpasses the Llama 2 13B -- Chat model both on human and automated benchmarks. Our models are released under the Apache 2.0 license.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Artificial Intelligence,Computer Science - Computation and Language,Computer Science - Machine Learning}, - file = {/Users/fracapuano/Zotero/storage/JJX9Q8J9/Jiang et al. 
- 2023 - Mistral 7B.pdf;/Users/fracapuano/Zotero/storage/WTMQBRW3/2310.html} -} - -@misc{jiDribbleBotDynamicLegged2023, - title = {{{DribbleBot}}: {{Dynamic Legged Manipulation}} in the {{Wild}}}, - shorttitle = {{{DribbleBot}}}, - author = {Ji, Yandong and Margolis, Gabriel B. and Agrawal, Pulkit}, - year = {2023}, - month = apr, - number = {arXiv:2304.01159}, - eprint = {2304.01159}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2304.01159}, - urldate = {2025-08-26}, - abstract = {DribbleBot (Dexterous Ball Manipulation with a Legged Robot) is a legged robotic system that can dribble a soccer ball under the same real-world conditions as humans (i.e., in-the-wild). We adopt the paradigm of training policies in simulation using reinforcement learning and transferring them into the real world. We overcome critical challenges of accounting for variable ball motion dynamics on different terrains and perceiving the ball using body-mounted cameras under the constraints of onboard computing. Our results provide evidence that current quadruped platforms are well-suited for studying dynamic whole-body control problems involving simultaneous locomotion and manipulation directly from sensory observations.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Artificial Intelligence,Computer Science - Machine Learning,Computer Science - Robotics}, - file = {/Users/fracapuano/Zotero/storage/ABSRE4C4/Ji et al. - 2023 - DribbleBot Dynamic Legged Manipulation in the Wild.pdf;/Users/fracapuano/Zotero/storage/ADI4QNCY/2304.html} -} - -@misc{kakaobrain2022coyo700m, - title = {{{COYO-700M}}: {{Image-text}} Pair Dataset}, - author = {Byeon, Minwoo and Park, Beomhee and Kim, Haecheon and Lee, Sungjun and Baek, Woonhyuk and Kim, Saehoon}, - year = {2022} -} - -@misc{kaplanScalingLawsNeural2020, - title = {Scaling {{Laws}} for {{Neural Language Models}}}, - author = {Kaplan, Jared and McCandlish, Sam and Henighan, Tom and Brown, Tom B. and Chess, Benjamin and Child, Rewon and Gray, Scott and Radford, Alec and Wu, Jeffrey and Amodei, Dario}, - year = {2020}, - month = jan, - number = {arXiv:2001.08361}, - eprint = {2001.08361}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2001.08361}, - urldate = {2025-09-07}, - abstract = {We study empirical scaling laws for language model performance on the cross-entropy loss. The loss scales as a power-law with model size, dataset size, and the amount of compute used for training, with some trends spanning more than seven orders of magnitude. Other architectural details such as network width or depth have minimal effects within a wide range. Simple equations govern the dependence of overfitting on model/dataset size and the dependence of training speed on model size. These relationships allow us to determine the optimal allocation of a fixed compute budget. Larger models are significantly more sample-efficient, such that optimally compute-efficient training involves training very large models on a relatively modest amount of data and stopping significantly before convergence.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Machine Learning,Statistics - Machine Learning}, - file = {/Users/fracapuano/Zotero/storage/MI5AGWBH/Kaplan et al. 
- 2020 - Scaling Laws for Neural Language Models.pdf;/Users/fracapuano/Zotero/storage/SBZT8DDY/2001.html} -} - -@misc{keGraspingChopsticksCombating2020, - title = {Grasping with {{Chopsticks}}: {{Combating Covariate Shift}} in {{Model-free Imitation Learning}} for {{Fine Manipulation}}}, - shorttitle = {Grasping with {{Chopsticks}}}, - author = {Ke, Liyiming and Wang, Jingqiang and Bhattacharjee, Tapomayukh and Boots, Byron and Srinivasa, Siddhartha}, - year = {2020}, - month = nov, - number = {arXiv:2011.06719}, - eprint = {2011.06719}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2011.06719}, - urldate = {2025-09-01}, - abstract = {Billions of people use chopsticks, a simple yet versatile tool, for fine manipulation of everyday objects. The small, curved, and slippery tips of chopsticks pose a challenge for picking up small objects, making them a suitably complex test case. This paper leverages human demonstrations to develop an autonomous chopsticks-equipped robotic manipulator. Due to the lack of accurate models for fine manipulation, we explore model-free imitation learning, which traditionally suffers from the covariate shift phenomenon that causes poor generalization. We propose two approaches to reduce covariate shift, neither of which requires access to an interactive expert or a model, unlike previous approaches. First, we alleviate single-step prediction errors by applying an invariant operator to increase the data support at critical steps for grasping. Second, we generate synthetic corrective labels by adding bounded noise and combining parametric and non-parametric methods to prevent error accumulation. We demonstrate our methods on a real chopstick-equipped robot that we built, and observe the agent's success rate increase from 37.3\% to 80\%, which is comparable to the human expert performance of 82.6\%.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Machine Learning,Computer Science - Robotics}, - file = {/Users/fracapuano/Zotero/storage/ZUPECLSW/Ke et al. - 2020 - Grasping with Chopsticks Combating Covariate Shift in Model-free Imitation Learning for Fine Manipu.pdf;/Users/fracapuano/Zotero/storage/X7PX638S/2011.html} -} - -@article{khatibRealTimeObstancleAvoidance1986, - title = {Real-{{Time Obstancle Avoidance}} for {{Manipulators}} and {{Mobile Robots}}}, - author = {Khatib, Oussama}, - year = {1986}, - journal = {The International Journal of Robotics Research}, - volume = {5} -} - -@misc{khazatskyDROIDLargeScaleInTheWild2025, - title = {{{DROID}}: {{A Large-Scale In-The-Wild Robot Manipulation Dataset}}}, - shorttitle = {{{DROID}}}, - author = {Khazatsky, Alexander and Pertsch, Karl and Nair, Suraj and Balakrishna, Ashwin and Dasari, Sudeep and Karamcheti, Siddharth and Nasiriany, Soroush and Srirama, Mohan Kumar and Chen, Lawrence Yunliang and Ellis, Kirsty and Fagan, Peter David and Hejna, Joey and Itkina, Masha and Lepert, Marion and Ma, Yecheng Jason and Miller, Patrick Tree and Wu, Jimmy and Belkhale, Suneel and Dass, Shivin and Ha, Huy and Jain, Arhan and Lee, Abraham and Lee, Youngwoon and Memmel, Marius and Park, Sungjae and Radosavovic, Ilija and Wang, Kaiyuan and Zhan, Albert and Black, Kevin and Chi, Cheng and Hatch, Kyle Beltran and Lin, Shan and Lu, Jingpei and Mercat, Jean and Rehman, Abdul and Sanketi, Pannag R. and Sharma, Archit and Simpson, Cody and Vuong, Quan and Walke, Homer Rich and Wulfe, Blake and Xiao, Ted and Yang, Jonathan Heewon and Yavary, Arefeh and Zhao, Tony Z. 
and Agia, Christopher and Baijal, Rohan and Castro, Mateo Guaman and Chen, Daphne and Chen, Qiuyu and Chung, Trinity and Drake, Jaimyn and Foster, Ethan Paul and Gao, Jensen and Guizilini, Vitor and Herrera, David Antonio and Heo, Minho and Hsu, Kyle and Hu, Jiaheng and Irshad, Muhammad Zubair and Jackson, Donovon and Le, Charlotte and Li, Yunshuang and Lin, Kevin and Lin, Roy and Ma, Zehan and Maddukuri, Abhiram and Mirchandani, Suvir and Morton, Daniel and Nguyen, Tony and O'Neill, Abigail and Scalise, Rosario and Seale, Derick and Son, Victor and Tian, Stephen and Tran, Emi and Wang, Andrew E. and Wu, Yilin and Xie, Annie and Yang, Jingyun and Yin, Patrick and Zhang, Yunchu and Bastani, Osbert and Berseth, Glen and Bohg, Jeannette and Goldberg, Ken and Gupta, Abhinav and Gupta, Abhishek and Jayaraman, Dinesh and Lim, Joseph J. and Malik, Jitendra and {Mart{\'i}n-Mart{\'i}n}, Roberto and Ramamoorthy, Subramanian and Sadigh, Dorsa and Song, Shuran and Wu, Jiajun and Yip, Michael C. and Zhu, Yuke and Kollar, Thomas and Levine, Sergey and Finn, Chelsea}, - year = {2025}, - month = apr, - number = {arXiv:2403.12945}, - eprint = {2403.12945}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2403.12945}, - urldate = {2025-09-08}, - abstract = {The creation of large, diverse, high-quality robot manipulation datasets is an important stepping stone on the path toward more capable and robust robotic manipulation policies. However, creating such datasets is challenging: collecting robot manipulation data in diverse environments poses logistical and safety challenges and requires substantial investments in hardware and human labour. As a result, even the most general robot manipulation policies today are mostly trained on data collected in a small number of environments with limited scene and task diversity. In this work, we introduce DROID (Distributed Robot Interaction Dataset), a diverse robot manipulation dataset with 76k demonstration trajectories or 350 hours of interaction data, collected across 564 scenes and 84 tasks by 50 data collectors in North America, Asia, and Europe over the course of 12 months. We demonstrate that training with DROID leads to policies with higher performance and improved generalization ability. We open source the full dataset, policy learning code, and a detailed guide for reproducing our robot hardware setup.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Robotics}, - file = {/Users/fracapuano/Zotero/storage/XZ5Y4HZS/Khazatsky et al. 
- 2025 - DROID A Large-Scale In-The-Wild Robot Manipulation Dataset.pdf;/Users/fracapuano/Zotero/storage/N2Z72XLK/2403.html} -} - -@misc{kimOpenVLAOpenSourceVisionLanguageAction2024, - title = {{{OpenVLA}}: {{An Open-Source Vision-Language-Action Model}}}, - shorttitle = {{{OpenVLA}}}, - author = {Kim, Moo Jin and Pertsch, Karl and Karamcheti, Siddharth and Xiao, Ted and Balakrishna, Ashwin and Nair, Suraj and Rafailov, Rafael and Foster, Ethan and Lam, Grace and Sanketi, Pannag and Vuong, Quan and Kollar, Thomas and Burchfiel, Benjamin and Tedrake, Russ and Sadigh, Dorsa and Levine, Sergey and Liang, Percy and Finn, Chelsea}, - year = {2024}, - month = sep, - number = {arXiv:2406.09246}, - eprint = {2406.09246}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2406.09246}, - urldate = {2025-09-08}, - abstract = {Large policies pretrained on a combination of Internet-scale vision-language data and diverse robot demonstrations have the potential to change how we teach robots new skills: rather than training new behaviors from scratch, we can fine-tune such vision-language-action (VLA) models to obtain robust, generalizable policies for visuomotor control. Yet, widespread adoption of VLAs for robotics has been challenging as 1) existing VLAs are largely closed and inaccessible to the public, and 2) prior work fails to explore methods for efficiently fine-tuning VLAs for new tasks, a key component for adoption. Addressing these challenges, we introduce OpenVLA, a 7B-parameter open-source VLA trained on a diverse collection of 970k real-world robot demonstrations. OpenVLA builds on a Llama 2 language model combined with a visual encoder that fuses pretrained features from DINOv2 and SigLIP. As a product of the added data diversity and new model components, OpenVLA demonstrates strong results for generalist manipulation, outperforming closed models such as RT-2-X (55B) by 16.5\% in absolute task success rate across 29 tasks and multiple robot embodiments, with 7x fewer parameters. We further show that we can effectively fine-tune OpenVLA for new settings, with especially strong generalization results in multi-task environments involving multiple objects and strong language grounding abilities, and outperform expressive from-scratch imitation learning methods such as Diffusion Policy by 20.4\%. We also explore compute efficiency; as a separate contribution, we show that OpenVLA can be fine-tuned on consumer GPUs via modern low-rank adaptation methods and served efficiently via quantization without a hit to downstream success rate. Finally, we release model checkpoints, fine-tuning notebooks, and our PyTorch codebase with built-in support for training VLAs at scale on Open X-Embodiment datasets.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Machine Learning,Computer Science - Robotics}, - file = {/Users/fracapuano/Zotero/storage/XR2SX8WG/Kim et al. - 2024 - OpenVLA An Open-Source Vision-Language-Action Model.pdf;/Users/fracapuano/Zotero/storage/63Q96WRV/2406.html} -} - -@misc{kingmaAutoEncodingVariationalBayes2022, - title = {Auto-{{Encoding Variational Bayes}}}, - author = {Kingma, Diederik P. 
and Welling, Max}, - year = {2022}, - month = dec, - number = {arXiv:1312.6114}, - eprint = {1312.6114}, - primaryclass = {stat}, - publisher = {arXiv}, - doi = {10.48550/arXiv.1312.6114}, - urldate = {2025-09-02}, - abstract = {How can we perform efficient inference and learning in directed probabilistic models, in the presence of continuous latent variables with intractable posterior distributions, and large datasets? We introduce a stochastic variational inference and learning algorithm that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case. Our contributions are two-fold. First, we show that a reparameterization of the variational lower bound yields a lower bound estimator that can be straightforwardly optimized using standard stochastic gradient methods. Second, we show that for i.i.d. datasets with continuous latent variables per datapoint, posterior inference can be made especially efficient by fitting an approximate inference model (also called a recognition model) to the intractable posterior using the proposed lower bound estimator. Theoretical advantages are reflected in experimental results.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Machine Learning,Statistics - Machine Learning}, - file = {/Users/fracapuano/Zotero/storage/IT7VNQ4U/Kingma and Welling - 2022 - Auto-Encoding Variational Bayes.pdf;/Users/fracapuano/Zotero/storage/HQT22HP5/1312.html} -} - -@misc{knightStandardOpenSO100, - title = {Standard {{Open SO-100}} \& {{SO-101 Arms}}}, - author = {Knight, Rob and Kooijmans, Pepijn and Wolf, Thomas and Alibert, Simon and Aractingi, Michel and Aubakirova, Dana and Zouitine, Adil and Martino, Russi and Palma, Steven and Pascal, Caroline and Cadene, Remi} -} - -@article{koberReinforcementLearningRobotics, - title = {Reinforcement {{Learning}} in {{Robotics}}: {{A Survey}}}, - author = {Kober, Jens and Bagnell, J Andrew and Peters, Jan}, - langid = {english}, - file = {/Users/fracapuano/Zotero/storage/72PRHGKL/Kober et al. - Reinforcement Learning in Robotics A Survey.pdf} -} - -@inproceedings{kong2024audioflam, - title = {Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities}, - booktitle = {International Conference on Machine Learning}, - author = {Kong, Zhifeng and Goel, Arushi and Badlani, Rohan and Ping, Wei and Valle, Rafael and Catanzaro, Bryan}, - year = {2024}, - pages = {25125--25148}, - publisher = {PMLR} -} - -@misc{kumarRMARapidMotor2021, - title = {{{RMA}}: {{Rapid Motor Adaptation}} for {{Legged Robots}}}, - shorttitle = {{{RMA}}}, - author = {Kumar, Ashish and Fu, Zipeng and Pathak, Deepak and Malik, Jitendra}, - year = {2021}, - month = jul, - number = {arXiv:2107.04034}, - eprint = {2107.04034}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2107.04034}, - urldate = {2025-08-27}, - abstract = {Successful real-world deployment of legged robots would require them to adapt in real-time to unseen scenarios like changing terrains, changing payloads, wear and tear. This paper presents Rapid Motor Adaptation (RMA) algorithm to solve this problem of real-time online adaptation in quadruped robots. RMA consists of two components: a base policy and an adaptation module. The combination of these components enables the robot to adapt to novel situations in fractions of a second. 
RMA is trained completely in simulation without using any domain knowledge like reference trajectories or predefined foot trajectory generators and is deployed on the A1 robot without any fine-tuning. We train RMA on a varied terrain generator using bioenergetics-inspired rewards and deploy it on a variety of difficult terrains including rocky, slippery, deformable surfaces in environments with grass, long vegetation, concrete, pebbles, stairs, sand, etc. RMA shows state-of-the-art performance across diverse real-world as well as simulation experiments. Video results at https://ashish-kmr.github.io/rma-legged-robots/}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Artificial Intelligence,Computer Science - Computer Vision and Pattern Recognition,Computer Science - Machine Learning,Computer Science - Robotics}, - file = {/Users/fracapuano/Zotero/storage/TMYICHS6/Kumar et al. - 2021 - RMA Rapid Motor Adaptation for Legged Robots.pdf;/Users/fracapuano/Zotero/storage/TFY2EU8I/2107.html} -} - -@misc{laiActionChunkingConditional2025, - title = {Action Chunking as Conditional Policy Compression}, - author = {Lai, Lucy and Huang, Ann and Gershman, Samuel}, - year = {2025}, - month = jun, - publisher = {OSF}, - doi = {10.31234/osf.io/z8yrv_v2}, - urldate = {2025-09-02}, - abstract = {Many skills in our everyday lives are learned by sequencing actions towards a desired goal. The action sequence can become a ``chunk'' when individual actions are grouped together and executed as one unit, making them more efficient to store and execute. While chunking has been studied extensively across various domains, a puzzle remains as to why and under what conditions action chunking occurs. To tackle these questions, we develop a model of conditional policy compression---the reduction in cognitive cost by conditioning on an additional source of information---to explain the origin of chunking. We argue that chunking is a result of optimizing the trade-off between reward and conditional policy complexity. Chunking compresses policies when there is temporal structure in the environment that can be leveraged for action selection, reducing the amount of memory necessary to encode the policy. We experimentally confirm our model's predictions, showing that chunking reduces conditional policy complexity and reaction times. Chunking also increases with working memory load, consistent with the hypothesis that the degree of policy compression scales with the scarcity of cognitive resources. Finally, chunking also reduces overall working memory load, freeing cognitive resources for the benefit of other, not-chunked information.}, - archiveprefix = {OSF}, - langid = {american}, - keywords = {action selection,chunking,habits,reinforcement learning,resource-rationality,working memory} -} - -@article{laiActionChunkingConditional2025a, - title = {Action Chunking as Conditional Policy Compression}, - author = {Lai, Lucy and Huang, Ann Z. X. and Gershman, Samuel J.}, - year = {2025}, - month = nov, - journal = {Cognition}, - volume = {264}, - pages = {106201}, - issn = {1873-7838}, - doi = {10.1016/j.cognition.2025.106201}, - abstract = {Many skills in our everyday lives are learned by sequencing actions towards a desired goal. The action sequence can become a "chunk" when individual actions are grouped together and executed as one unit, making them more efficient to store and execute. 
While chunking has been studied extensively across various domains, a puzzle remains as to why and under what conditions action chunking occurs. To tackle these questions, we develop a model of conditional policy compression-the reduction in cognitive cost by conditioning on an additional source of information-to explain the origin of chunking. We argue that chunking is a result of optimizing the trade-off between reward and conditional policy complexity. Chunking compresses policies when there is temporal structure in the environment that can be leveraged for action selection, reducing the amount of memory necessary to encode the policy. We experimentally confirm our model's predictions, showing that chunking reduces conditional policy complexity and reaction times. Chunking also increases with working memory load, consistent with the hypothesis that the degree of policy compression scales with the scarcity of cognitive resources. Finally, chunking also reduces overall working memory load, freeing cognitive resources for the benefit of other, not-chunked information.}, - langid = {english}, - pmid = {40602234}, - keywords = {Action selection,Adult,Chunking,Cognition,Decision making,Female,Humans,Information bottleneck,Male,Memory Short-Term,Models Psychological,Psychomotor Performance,Reaction Time,Reinforcement learning,Resource rationality,Reward,Young Adult} -} - -@article{LAION-COCO, - title = {Laion Coco: 600m Synthetic Captions from Laion2b-En}, - author = {Schuhmann, C and K{\"o}pf, A and Vencu, R and Coombes, T and Beaumont, R}, - year = {2022}, - journal = {URL https://laion.ai/blog/laion-coco} -} - -@misc{laurenconWhatMattersWhen2024, - title = {What Matters When Building Vision-Language Models?}, - author = {Lauren{\c c}on, Hugo and Tronchon, L{\'e}o and Cord, Matthieu and Sanh, Victor}, - year = {2024}, - month = may, - number = {arXiv:2405.02246}, - eprint = {2405.02246}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2405.02246}, - urldate = {2025-09-09}, - abstract = {The growing interest in vision-language models (VLMs) has been driven by improvements in large language models and vision transformers. Despite the abundance of literature on this subject, we observe that critical decisions regarding the design of VLMs are often not justified. We argue that these unsupported decisions impede progress in the field by making it difficult to identify which choices improve model performance. To address this issue, we conduct extensive experiments around pre-trained models, architecture choice, data, and training methods. Our consolidation of findings includes the development of Idefics2, an efficient foundational VLM of 8 billion parameters. Idefics2 achieves state-of-the-art performance within its size category across various multimodal benchmarks, and is often on par with models four times its size. We release the model (base, instructed, and chat) along with the datasets created for its training.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Artificial Intelligence,Computer Science - Computer Vision and Pattern Recognition}, - file = {/Users/fracapuano/Zotero/storage/8H6NRPU7/Laurenรงon et al. - 2024 - What matters when building vision-language models.pdf;/Users/fracapuano/Zotero/storage/H3NETYXA/2405.html} -} - -@misc{leeBehaviorGenerationLatent2024, - title = {Behavior {{Generation}} with {{Latent Actions}}}, - author = {Lee, Seungjae and Wang, Yibin and Etukuru, Haritheja and Kim, H. 
Jin and Shafiullah, Nur Muhammad Mahi and Pinto, Lerrel}, - year = {2024}, - month = jun, - number = {arXiv:2403.03181}, - eprint = {2403.03181}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2403.03181}, - urldate = {2025-08-28}, - abstract = {Generative modeling of complex behaviors from labeled datasets has been a longstanding problem in decision making. Unlike language or image generation, decision making requires modeling actions - continuous-valued vectors that are multimodal in their distribution, potentially drawn from uncurated sources, where generation errors can compound in sequential prediction. A recent class of models called Behavior Transformers (BeT) addresses this by discretizing actions using k-means clustering to capture different modes. However, k-means struggles to scale for high-dimensional action spaces or long sequences, and lacks gradient information, and thus BeT suffers in modeling long-range actions. In this work, we present Vector-Quantized Behavior Transformer (VQ-BeT), a versatile model for behavior generation that handles multimodal action prediction, conditional generation, and partial observations. VQ-BeT augments BeT by tokenizing continuous actions with a hierarchical vector quantization module. Across seven environments including simulated manipulation, autonomous driving, and robotics, VQ-BeT improves on state-of-the-art models such as BeT and Diffusion Policies. Importantly, we demonstrate VQ-BeT's improved ability to capture behavior modes while accelerating inference speed 5x over Diffusion Policies. Videos and code can be found https://sjlee.cc/vq-bet}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Artificial Intelligence,Computer Science - Machine Learning,Computer Science - Robotics}, - file = {/Users/fracapuano/Zotero/storage/IA93ENCH/Lee et al. - 2024 - Behavior Generation with Latent Actions.pdf;/Users/fracapuano/Zotero/storage/KBVF7GQL/2403.html} -} - -@article{leeLearningQuadrupedalLocomotion2020, - title = {Learning {{Quadrupedal Locomotion}} over {{Challenging Terrain}}}, - author = {Lee, Joonho and Hwangbo, Jemin and Wellhausen, Lorenz and Koltun, Vladlen and Hutter, Marco}, - year = {2020}, - month = oct, - journal = {Science Robotics}, - volume = {5}, - number = {47}, - eprint = {2010.11251}, - primaryclass = {cs}, - pages = {eabc5986}, - issn = {2470-9476}, - doi = {10.1126/scirobotics.abc5986}, - urldate = {2025-08-26}, - abstract = {Some of the most challenging environments on our planet are accessible to quadrupedal animals but remain out of reach for autonomous machines. Legged locomotion can dramatically expand the operational domains of robotics. However, conventional controllers for legged locomotion are based on elaborate state machines that explicitly trigger the execution of motion primitives and reflexes. These designs have escalated in complexity while falling short of the generality and robustness of animal locomotion. Here we present a radically robust controller for legged locomotion in challenging natural environments. We present a novel solution to incorporating proprioceptive feedback in locomotion control and demonstrate remarkable zero-shot generalization from simulation to natural environments. The controller is trained by reinforcement learning in simulation. It is based on a neural network that acts on a stream of proprioceptive signals. 
The trained controller has taken two generations of quadrupedal ANYmal robots to a variety of natural environments that are beyond the reach of prior published work in legged locomotion. The controller retains its robustness under conditions that have never been encountered during training: deformable terrain such as mud and snow, dynamic footholds such as rubble, and overground impediments such as thick vegetation and gushing water. The presented work opens new frontiers for robotics and indicates that radical robustness in natural environments can be achieved by training in much simpler domains.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Machine Learning,Computer Science - Robotics,Computer Science - Systems and Control,Electrical Engineering and Systems Science - Systems and Control}, - file = {/Users/fracapuano/Zotero/storage/8B9EF2CE/Lee et al. - 2020 - Learning Quadrupedal Locomotion over Challenging Terrain.pdf} -} - -@misc{lillicrapContinuousControlDeep2019, - title = {Continuous Control with Deep Reinforcement Learning}, - author = {Lillicrap, Timothy P. and Hunt, Jonathan J. and Pritzel, Alexander and Heess, Nicolas and Erez, Tom and Tassa, Yuval and Silver, David and Wierstra, Daan}, - year = {2019}, - month = jul, - number = {arXiv:1509.02971}, - eprint = {1509.02971}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.1509.02971}, - urldate = {2025-08-31}, - abstract = {We adapt the ideas underlying the success of Deep Q-Learning to the continuous action domain. We present an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces. Using the same learning algorithm, network architecture and hyper-parameters, our algorithm robustly solves more than 20 simulated physics tasks, including classic problems such as cartpole swing-up, dexterous manipulation, legged locomotion and car driving. Our algorithm is able to find policies whose performance is competitive with those found by a planning algorithm with full access to the dynamics of the domain and its derivatives. We further demonstrate that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Machine Learning,Statistics - Machine Learning}, - file = {/Users/fracapuano/Zotero/storage/2VN6TMVK/Lillicrap et al. - 2019 - Continuous control with deep reinforcement learning.pdf;/Users/fracapuano/Zotero/storage/4FQ4W5VE/1509.html} -} - -@misc{lillicrapContinuousControlDeep2019a, - title = {Continuous Control with Deep Reinforcement Learning}, - author = {Lillicrap, Timothy P. and Hunt, Jonathan J. and Pritzel, Alexander and Heess, Nicolas and Erez, Tom and Tassa, Yuval and Silver, David and Wierstra, Daan}, - year = {2019}, - month = jul, - number = {arXiv:1509.02971}, - eprint = {1509.02971}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.1509.02971}, - urldate = {2025-08-31}, - abstract = {We adapt the ideas underlying the success of Deep Q-Learning to the continuous action domain. We present an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces. Using the same learning algorithm, network architecture and hyper-parameters, our algorithm robustly solves more than 20 simulated physics tasks, including classic problems such as cartpole swing-up, dexterous manipulation, legged locomotion and car driving. 
Our algorithm is able to find policies whose performance is competitive with those found by a planning algorithm with full access to the dynamics of the domain and its derivatives. We further demonstrate that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Machine Learning,Statistics - Machine Learning}, - file = {/Users/fracapuano/Zotero/storage/HYMPB9F5/Lillicrap et al. - 2019 - Continuous control with deep reinforcement learning.pdf;/Users/fracapuano/Zotero/storage/EKCXMJNQ/1509.html} -} - -@misc{linVILAPretrainingVisual2024, - title = {{{VILA}}: {{On Pre-training}} for {{Visual Language Models}}}, - shorttitle = {{{VILA}}}, - author = {Lin, Ji and Yin, Hongxu and Ping, Wei and Lu, Yao and Molchanov, Pavlo and Tao, Andrew and Mao, Huizi and Kautz, Jan and Shoeybi, Mohammad and Han, Song}, - year = {2024}, - month = may, - number = {arXiv:2312.07533}, - eprint = {2312.07533}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2312.07533}, - urldate = {2025-09-09}, - abstract = {Visual language models (VLMs) rapidly progressed with the recent success of large language models. There have been growing efforts on visual instruction tuning to extend the LLM with visual inputs, but lacks an in-depth study of the visual language pre-training process, where the model learns to perform joint modeling on both modalities. In this work, we examine the design options for VLM pre-training by augmenting LLM towards VLM through step-by-step controllable comparisons. We introduce three main findings: (1) freezing LLMs during pre-training can achieve decent zero-shot performance, but lack in-context learning capability, which requires unfreezing the LLM; (2) interleaved pre-training data is beneficial whereas image-text pairs alone are not optimal; (3) re-blending text-only instruction data to image-text data during instruction fine-tuning not only remedies the degradation of text-only tasks, but also boosts VLM task accuracy. With an enhanced pre-training recipe we build VILA, a Visual Language model family that consistently outperforms the state-of-the-art models, e.g., LLaVA-1.5, across main benchmarks without bells and whistles. Multi-modal pre-training also helps unveil appealing properties of VILA, including multi-image reasoning, enhanced in-context learning, and better world knowledge.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Computer Vision and Pattern Recognition}, - file = {/Users/fracapuano/Zotero/storage/DNA6AFRL/Lin et al. - 2024 - VILA On Pre-training for Visual Language Models.pdf;/Users/fracapuano/Zotero/storage/K32IJ2A3/2312.html} -} - -@misc{lipmanFlowMatchingGenerative2023, - title = {Flow {{Matching}} for {{Generative Modeling}}}, - author = {Lipman, Yaron and Chen, Ricky T. Q. and {Ben-Hamu}, Heli and Nickel, Maximilian and Le, Matt}, - year = {2023}, - month = feb, - number = {arXiv:2210.02747}, - eprint = {2210.02747}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2210.02747}, - urldate = {2025-09-07}, - abstract = {We introduce a new paradigm for generative modeling built on Continuous Normalizing Flows (CNFs), allowing us to train CNFs at unprecedented scale. Specifically, we present the notion of Flow Matching (FM), a simulation-free approach for training CNFs based on regressing vector fields of fixed conditional probability paths. 
Flow Matching is compatible with a general family of Gaussian probability paths for transforming between noise and data samples -- which subsumes existing diffusion paths as specific instances. Interestingly, we find that employing FM with diffusion paths results in a more robust and stable alternative for training diffusion models. Furthermore, Flow Matching opens the door to training CNFs with other, non-diffusion probability paths. An instance of particular interest is using Optimal Transport (OT) displacement interpolation to define the conditional probability paths. These paths are more efficient than diffusion paths, provide faster training and sampling, and result in better generalization. Training CNFs using Flow Matching on ImageNet leads to consistently better performance than alternative diffusion-based methods in terms of both likelihood and sample quality, and allows fast and reliable sample generation using off-the-shelf numerical ODE solvers.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Artificial Intelligence,Computer Science - Machine Learning,Statistics - Machine Learning}, - file = {/Users/fracapuano/Zotero/storage/YFZTRGJ3/Lipman et al. - 2023 - Flow Matching for Generative Modeling.pdf;/Users/fracapuano/Zotero/storage/QUKPDHWR/2210.html} -} - -@misc{lipmanFlowMatchingGuide2024, - title = {Flow {{Matching Guide}} and {{Code}}}, - author = {Lipman, Yaron and Havasi, Marton and Holderrieth, Peter and Shaul, Neta and Le, Matt and Karrer, Brian and Chen, Ricky T. Q. and {Lopez-Paz}, David and {Ben-Hamu}, Heli and Gat, Itai}, - year = {2024}, - month = dec, - number = {arXiv:2412.06264}, - eprint = {2412.06264}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2412.06264}, - urldate = {2025-09-09}, - abstract = {Flow Matching (FM) is a recent framework for generative modeling that has achieved state-of-the-art performance across various domains, including image, video, audio, speech, and biological structures. This guide offers a comprehensive and self-contained review of FM, covering its mathematical foundations, design choices, and extensions. By also providing a PyTorch package featuring relevant examples (e.g., image and text generation), this work aims to serve as a resource for both novice and experienced researchers interested in understanding, applying and further developing FM.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Machine Learning}, - file = {/Users/fracapuano/Zotero/storage/6MGQ5AZ2/Lipman et al. 
- 2024 - Flow Matching Guide and Code.pdf;/Users/fracapuano/Zotero/storage/IKHZ75PU/2412.html} -} - -@article{liu2024kangaroo, - title = {Kangaroo: {{A}} Powerful Video-Language Model Supporting Long-Context Video Input}, - author = {Liu, Jiajun and Wang, Yibing and Ma, Hanghang and Wu, Xiaoping and Ma, Xiaoqi and Wei, Xiaoming and Jiao, Jianbin and Wu, Enhua and Hu, Jie}, - year = {2024}, - journal = {arXiv preprint arXiv:2408.15542}, - eprint = {2408.15542}, - archiveprefix = {arXiv} -} - -@inproceedings{LLaVA-1.5, - title = {Improved Baselines with Visual Instruction Tuning}, - booktitle = {{{NeurIPS}} 2023 Workshop on Instruction Tuning and Instruction Following}, - author = {Liu, Haotian and Li, Chunyuan and Li, Yuheng and Lee, Yong Jae}, - year = {2023} -} - -@misc{luoPreciseDexterousRobotic2024, - title = {Precise and {{Dexterous Robotic Manipulation}} via {{Human-in-the-Loop Reinforcement Learning}}}, - author = {Luo, Jianlan and Xu, Charles and Wu, Jeffrey and Levine, Sergey}, - year = {2024}, - month = oct, - number = {arXiv:2410.21845}, - eprint = {2410.21845}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2410.21845}, - urldate = {2025-08-28}, - abstract = {Reinforcement learning (RL) holds great promise for enabling autonomous acquisition of complex robotic manipulation skills, but realizing this potential in real-world settings has been challenging. We present a human-in-the-loop vision-based RL system that demonstrates impressive performance on a diverse set of dexterous manipulation tasks, including dynamic manipulation, precision assembly, and dual-arm coordination. Our approach integrates demonstrations and human corrections, efficient RL algorithms, and other system-level design choices to learn policies that achieve near-perfect success rates and fast cycle times within just 1 to 2.5 hours of training. We show that our method significantly outperforms imitation learning baselines and prior RL approaches, with an average 2x improvement in success rate and 1.8x faster execution. Through extensive experiments and analysis, we provide insights into the effectiveness of our approach, demonstrating how it learns robust, adaptive policies for both reactive and predictive control strategies. Our results suggest that RL can indeed learn a wide range of complex vision-based manipulation policies directly in the real world within practical training times. We hope this work will inspire a new generation of learned robotic manipulation techniques, benefiting both industrial applications and research advancements. Videos and code are available at our project website https://hil-serl.github.io/.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Artificial Intelligence,Computer Science - Robotics}, - file = {/Users/fracapuano/Zotero/storage/LEL37N2D/Luo et al. 
- 2022 - Denoising Diffusion Implicit Models.pdf;/Users/fracapuano/Zotero/storage/GE2U4XU7/2010.html} -} - -@article{SpinningUp2018, - title = {Spinning up in Deep Reinforcement Learning}, - author = {Achiam, Joshua}, - year = {2018} -} - -@misc{SuttonBartoBook, - title = {Sutton \& {{Barto Book}}: {{Reinforcement Learning}}: {{An Introduction}}}, - urldate = {2025-08-28}, - howpublished = {http://incompleteideas.net/book/the-book-2nd.html}, - file = {/Users/fracapuano/Zotero/storage/A3QZFGPB/the-book-2nd.html} -} - -@inproceedings{suttonPolicyGradientMethods1999, - title = {Policy {{Gradient Methods}} for {{Reinforcement Learning}} with {{Function Approximation}}}, - booktitle = {Advances in {{Neural Information Processing Systems}}}, - author = {Sutton, Richard S and McAllester, David and Singh, Satinder and Mansour, Yishay}, - year = {1999}, - volume = {12}, - publisher = {MIT Press}, - urldate = {2025-08-31}, - abstract = {Function approximation is essential to reinforcement learning, but the standard approach of approximating a value function and determining a policy from it has so far proven theoretically intractable. In this paper we explore an alternative approach in which the policy is explicitly represented by its own function approximator, independent of the value function, and is updated according to the gradient of expected reward with respect to the policy parameters. Williams's REINFORCE method and actor-critic methods are examples of this approach. Our main new result is to show that the gradient can be written in a form suitable for estimation from experience aided by an approximate action-value or advantage function. Using this result, we prove for the first time that a version of policy iteration with arbitrary differentiable function approximation is convergent to a locally optimal policy.}, - file = {/Users/fracapuano/Zotero/storage/4EKJMS5H/Sutton et al. - 1999 - Policy Gradient Methods for Reinforcement Learning with Function Approximation.pdf} -} - -@inproceedings{suttonPolicyGradientMethods1999a, - title = {Policy {{Gradient Methods}} for {{Reinforcement Learning}} with {{Function Approximation}}}, - booktitle = {Advances in {{Neural Information Processing Systems}}}, - author = {Sutton, Richard S and McAllester, David and Singh, Satinder and Mansour, Yishay}, - year = {1999}, - volume = {12}, - publisher = {MIT Press}, - urldate = {2025-08-31}, - abstract = {Function approximation is essential to reinforcement learning, but the standard approach of approximating a value function and determining a policy from it has so far proven theoretically intractable. In this paper we explore an alternative approach in which the policy is explicitly represented by its own function approximator, independent of the value function, and is updated according to the gradient of expected reward with respect to the policy parameters. Williams's REINFORCE method and actor-critic methods are examples of this approach. Our main new result is to show that the gradient can be written in a form suitable for estimation from experience aided by an approximate action-value or advantage function. Using this result, we prove for the first time that a version of policy iteration with arbitrary differentiable function approximation is convergent to a locally optimal policy.}, - file = {/Users/fracapuano/Zotero/storage/JNPS7AMN/Sutton et al. 
- 1999 - Policy Gradient Methods for Reinforcement Learning with Function Approximation.pdf} -} - -@book{suttonReinforcementLearningIntroduction2018, - title = {Reinforcement Learning: An Introduction}, - shorttitle = {Reinforcement Learning}, - author = {Sutton, Richard S. and Barto, Andrew G.}, - year = {2018}, - series = {Adaptive Computation and Machine Learning Series}, - edition = {Second edition}, - publisher = {The MIT Press}, - address = {Cambridge, Massachusetts}, - abstract = {"Reinforcement learning, one of the most active research areas in artificial intelligence, is a computational approach to learning whereby an agent tries to maximize the total amount of reward it receives while interacting with a complex, uncertain environment. In Reinforcement Learning, Richard Sutton and Andrew Barto provide a clear and simple account of the field's key ideas and algorithms."--}, - isbn = {978-0-262-03924-6}, - langid = {english}, - lccn = {Q325.6 .R45 2018}, - keywords = {Reinforcement learning}, - file = {/Users/fracapuano/Zotero/storage/CJB8FNNL/Sutton and Barto - 2018 - Reinforcement learning an introduction.pdf} -} - -@misc{tancikFourierFeaturesLet2020, - title = {Fourier {{Features Let Networks Learn High Frequency Functions}} in {{Low Dimensional Domains}}}, - author = {Tancik, Matthew and Srinivasan, Pratul P. and Mildenhall, Ben and {Fridovich-Keil}, Sara and Raghavan, Nithin and Singhal, Utkarsh and Ramamoorthi, Ravi and Barron, Jonathan T. and Ng, Ren}, - year = {2020}, - month = jun, - number = {arXiv:2006.10739}, - eprint = {2006.10739}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2006.10739}, - urldate = {2025-09-06}, - abstract = {We show that passing input points through a simple Fourier feature mapping enables a multilayer perceptron (MLP) to learn high-frequency functions in low-dimensional problem domains. These results shed light on recent advances in computer vision and graphics that achieve state-of-the-art results by using MLPs to represent complex 3D objects and scenes. Using tools from the neural tangent kernel (NTK) literature, we show that a standard MLP fails to learn high frequencies both in theory and in practice. To overcome this spectral bias, we use a Fourier feature mapping to transform the effective NTK into a stationary kernel with a tunable bandwidth. We suggest an approach for selecting problem-specific Fourier features that greatly improves the performance of MLPs for low-dimensional regression tasks relevant to the computer vision and graphics communities.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Computer Vision and Pattern Recognition,Computer Science - Machine Learning}, - file = {/Users/fracapuano/Zotero/storage/AYWWN7ME/Tancik et al. 
- 2020 - Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains.pdf;/Users/fracapuano/Zotero/storage/68Q4Y4LM/2006.html} -} - -@misc{tangDeepReinforcementLearning2024, - title = {Deep {{Reinforcement Learning}} for {{Robotics}}: {{A Survey}} of {{Real-World Successes}}}, - shorttitle = {Deep {{Reinforcement Learning}} for {{Robotics}}}, - author = {Tang, Chen and Abbatematteo, Ben and Hu, Jiaheng and Chandra, Rohan and {Mart{\'i}n-Mart{\'i}n}, Roberto and Stone, Peter}, - year = {2024}, - month = sep, - number = {arXiv:2408.03539}, - eprint = {2408.03539}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2408.03539}, - urldate = {2025-08-29}, - abstract = {Reinforcement learning (RL), particularly its combination with deep neural networks referred to as deep RL (DRL), has shown tremendous promise across a wide range of applications, suggesting its potential for enabling the development of sophisticated robotic behaviors. Robotics problems, however, pose fundamental difficulties for the application of RL, stemming from the complexity and cost of interacting with the physical world. This article provides a modern survey of DRL for robotics, with a particular focus on evaluating the real-world successes achieved with DRL in realizing several key robotic competencies. Our analysis aims to identify the key factors underlying those exciting successes, reveal underexplored areas, and provide an overall characterization of the status of DRL in robotics. We highlight several important avenues for future work, emphasizing the need for stable and sample-efficient real-world RL paradigms, holistic approaches for discovering and integrating various competencies to tackle complex long-horizon, open-world tasks, and principled development and evaluation procedures. This survey is designed to offer insights for both RL practitioners and roboticists toward harnessing RL's power to create generally capable real-world robotic systems.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Machine Learning,Computer Science - Robotics}, - file = {/Users/fracapuano/Zotero/storage/ZTX4VSMA/Tang et al. - 2024 - Deep Reinforcement Learning for Robotics A Survey of Real-World Successes.pdf;/Users/fracapuano/Zotero/storage/WDVGKFL3/2408.html} -} - -@article{tangDeepReinforcementLearning2025, - title = {Deep {{Reinforcement Learning}} for {{Robotics}}: {{A Survey}} of {{Real-World Successes}}}, - shorttitle = {Deep {{Reinforcement Learning}} for {{Robotics}}}, - author = {Tang, Chen and Abbatematteo, Ben and Hu, Jiaheng and Chandra, Rohan and {Mart{\'i}n-Mart{\'i}n}, Roberto and Stone, Peter}, - year = {2025}, - month = may, - journal = {Annual Review of Control, Robotics, and Autonomous Systems}, - volume = {8}, - number = {Volume 8, 2025}, - pages = {153--188}, - publisher = {Annual Reviews}, - issn = {2573-5144}, - doi = {10.1146/annurev-control-030323-022510}, - urldate = {2025-08-29}, - abstract = {Reinforcement learning (RL), particularly its combination with deep neural networks, referred to as deep RL (DRL), has shown tremendous promise across a wide range of applications, suggesting its potential for enabling the development of sophisticated robotic behaviors. Robotics problems, however, pose fundamental difficulties for the application of RL, stemming from the complexity and cost of interacting with the physical world. 
This article provides a modern survey of DRL for robotics, with a particular focus on evaluating the real-world successes achieved with DRL in realizing several key robotic competencies. Our analysis aims to identify the key factors underlying those exciting successes, reveal underexplored areas, and provide an overall characterization of the status of DRL in robotics. We highlight several important avenues for future work, emphasizing the need for stable and sample-efficient real-world RL paradigms; holistic approaches for discovering and integrating various competencies to tackle complex long-horizon, open-world tasks; and principled development and evaluation procedures. This survey is designed to offer insights for both RL practitioners and roboticists toward harnessing RL\'s power to create generally capable real-world robotic systems.}, - langid = {english}, - file = {/Users/fracapuano/Zotero/storage/CCNUWJ73/Tang et al. - 2025 - Deep Reinforcement Learning for Robotics A Survey of Real-World Successes.pdf;/Users/fracapuano/Zotero/storage/UVIIIEXP/Tang et al. - 2025 - Deep Reinforcement Learning for Robotics A Survey of Real-World Successes.pdf;/Users/fracapuano/Zotero/storage/EUKPASJ2/annurev-control-030323-022510.html} -} - -@article{tangPerceptionNavigationAutonomous2023, - title = {Perception and {{Navigation}} in {{Autonomous Systems}} in the {{Era}} of {{Learning}}: {{A Survey}}}, - shorttitle = {Perception and {{Navigation}} in {{Autonomous Systems}} in the {{Era}} of {{Learning}}}, - author = {Tang, Yang and Zhao, Chaoqiang and Wang, Jianrui and Zhang, Chongzhen and Sun, Qiyu and Zheng, Weixing and Du, Wenli and Qian, Feng and Kurths, Juergen}, - year = {2023}, - month = dec, - journal = {IEEE Transactions on Neural Networks and Learning Systems}, - volume = {34}, - number = {12}, - eprint = {2001.02319}, - primaryclass = {cs}, - pages = {9604--9624}, - issn = {2162-237X, 2162-2388}, - doi = {10.1109/TNNLS.2022.3167688}, - urldate = {2025-08-27}, - abstract = {Autonomous systems possess the features of inferring their own state, understanding their surroundings, and performing autonomous navigation. With the applications of learning systems, like deep learning and reinforcement learning, the visual-based self-state estimation, environment perception and navigation capabilities of autonomous systems have been efficiently addressed, and many new learning-based algorithms have surfaced with respect to autonomous visual perception and navigation. In this review, we focus on the applications of learning-based monocular approaches in ego-motion perception, environment perception and navigation in autonomous systems, which is different from previous reviews that discussed traditional methods. First, we delineate the shortcomings of existing classical visual simultaneous localization and mapping (vSLAM) solutions, which demonstrate the necessity to integrate deep learning techniques. Second, we review the visual-based environmental perception and understanding methods based on deep learning, including deep learning-based monocular depth estimation, monocular ego-motion prediction, image enhancement, object detection, semantic segmentation, and their combinations with traditional vSLAM frameworks. Then, we focus on the visual navigation based on learning systems, mainly including reinforcement learning and deep reinforcement learning. 
Finally, we examine several challenges and promising directions discussed and concluded in related research of learning systems in the era of computer science and robotics.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Computer Vision and Pattern Recognition}, - file = {/Users/fracapuano/Zotero/storage/D3YRY6XE/Tang et al. - 2023 - Perception and Navigation in Autonomous Systems in the Era of Learning A Survey.pdf;/Users/fracapuano/Zotero/storage/SAYN9GG9/2001.html} -} - -@misc{teamGemma2Improving2024, - title = {Gemma 2: {{Improving Open Language Models}} at a {{Practical Size}}}, - shorttitle = {Gemma 2}, - author = {Team, Gemma and Riviere, Morgane and Pathak, Shreya and Sessa, Pier Giuseppe and Hardin, Cassidy and Bhupatiraju, Surya and Hussenot, L{\'e}onard and Mesnard, Thomas and Shahriari, Bobak and Ram{\'e}, Alexandre and Ferret, Johan and Liu, Peter and Tafti, Pouya and Friesen, Abe and Casbon, Michelle and Ramos, Sabela and Kumar, Ravin and Lan, Charline Le and Jerome, Sammy and Tsitsulin, Anton and Vieillard, Nino and Stanczyk, Piotr and Girgin, Sertan and Momchev, Nikola and Hoffman, Matt and Thakoor, Shantanu and Grill, Jean-Bastien and Neyshabur, Behnam and Bachem, Olivier and Walton, Alanna and Severyn, Aliaksei and Parrish, Alicia and Ahmad, Aliya and Hutchison, Allen and Abdagic, Alvin and Carl, Amanda and Shen, Amy and Brock, Andy and Coenen, Andy and Laforge, Anthony and Paterson, Antonia and Bastian, Ben and Piot, Bilal and Wu, Bo and Royal, Brandon and Chen, Charlie and Kumar, Chintu and Perry, Chris and Welty, Chris and {Choquette-Choo}, Christopher A. and Sinopalnikov, Danila and Weinberger, David and Vijaykumar, Dimple and Rogozi{\'n}ska, Dominika and Herbison, Dustin and Bandy, Elisa and Wang, Emma and Noland, Eric and Moreira, Erica and Senter, Evan and Eltyshev, Evgenii and Visin, Francesco and Rasskin, Gabriel and Wei, Gary and Cameron, Glenn and Martins, Gus and Hashemi, Hadi and {Klimczak-Pluci{\'n}ska}, Hanna and Batra, Harleen and Dhand, Harsh and Nardini, Ivan and Mein, Jacinda and Zhou, Jack and Svensson, James and Stanway, Jeff and Chan, Jetha and Zhou, Jin Peng and Carrasqueira, Joana and Iljazi, Joana and Becker, Jocelyn and Fernandez, Joe and van Amersfoort, Joost and Gordon, Josh and Lipschultz, Josh and Newlan, Josh and Ji, Ju-yeong and Mohamed, Kareem and Badola, Kartikeya and Black, Kat and Millican, Katie and McDonell, Keelin and Nguyen, Kelvin and Sodhia, Kiranbir and Greene, Kish and Sjoesund, Lars Lowe and Usui, Lauren and Sifre, Laurent and Heuermann, Lena and Lago, Leticia and McNealus, Lilly and Soares, Livio Baldini and Kilpatrick, Logan and Dixon, Lucas and Martins, Luciano and Reid, Machel and Singh, Manvinder and Iverson, Mark and G{\"o}rner, Martin and Velloso, Mat and Wirth, Mateo and Davidow, Matt and Miller, Matt and Rahtz, Matthew and Watson, Matthew and Risdal, Meg and Kazemi, Mehran and Moynihan, Michael and Zhang, Ming and Kahng, Minsuk and Park, Minwoo and Rahman, Mofi and Khatwani, Mohit and Dao, Natalie and Bardoliwalla, Nenshad and Devanathan, Nesh and Dumai, Neta and Chauhan, Nilay and Wahltinez, Oscar and Botarda, Pankil and Barnes, Parker and Barham, Paul and Michel, Paul and Jin, Pengchong and Georgiev, Petko and Culliton, Phil and Kuppala, Pradeep and Comanescu, Ramona and Merhej, Ramona and Jana, Reena and Rokni, Reza Ardeshir and Agarwal, Rishabh and Mullins, Ryan and Saadat, Samaneh and Carthy, Sara Mc and Perrin, Sarah and Arnold, S{\'e}bastien M. R. 
and Krause, Sebastian and Dai, Shengyang and Garg, Shruti and Sheth, Shruti and Ronstrom, Sue and Chan, Susan and Jordan, Timothy and Yu, Ting and Eccles, Tom and Hennigan, Tom and Kocisky, Tomas and Doshi, Tulsee and Jain, Vihan and Yadav, Vikas and Meshram, Vilobh and Dharmadhikari, Vishal and Barkley, Warren and Wei, Wei and Ye, Wenming and Han, Woohyun and Kwon, Woosuk and Xu, Xiang and Shen, Zhe and Gong, Zhitao and Wei, Zichuan and Cotruta, Victor and Kirk, Phoebe and Rao, Anand and Giang, Minh and Peran, Ludovic and Warkentin, Tris and Collins, Eli and Barral, Joelle and Ghahramani, Zoubin and Hadsell, Raia and Sculley, D. and Banks, Jeanine and Dragan, Anca and Petrov, Slav and Vinyals, Oriol and Dean, Jeff and Hassabis, Demis and Kavukcuoglu, Koray and Farabet, Clement and Buchatskaya, Elena and Borgeaud, Sebastian and Fiedel, Noah and Joulin, Armand and Kenealy, Kathleen and Dadashi, Robert and Andreev, Alek}, - year = {2024}, - month = aug, - number = {arXiv:2408.00118}, - eprint = {2408.00118}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2408.00118}, - urldate = {2025-09-08}, - abstract = {In this work, we introduce Gemma 2, a new addition to the Gemma family of lightweight, state-of-the-art open models, ranging in scale from 2 billion to 27 billion parameters. In this new version, we apply several known technical modifications to the Transformer architecture, such as interleaving local-global attentions (Beltagy et al., 2020a) and group-query attention (Ainslie et al., 2023). We also train the 2B and 9B models with knowledge distillation (Hinton et al., 2015) instead of next token prediction. The resulting models deliver the best performance for their size, and even offer competitive alternatives to models that are 2-3 times bigger. We release all our models to the community.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Artificial Intelligence,Computer Science - Computation and Language}, - file = {/Users/fracapuano/Zotero/storage/NTLZNFPL/Team et al. - 2024 - Gemma 2 Improving Open Language Models at a Practical Size.pdf;/Users/fracapuano/Zotero/storage/GKX7JFK3/2408.html} -} - -@misc{tedrakeRoboticManipulationPerception, - title = {Robotic {{Manipulation}}. {{Perception}}, {{Planning}} and {{Control}}.}, - author = {Tedrake, Russ} -} - -@misc{tedrakeUnderactuatedRoboticsAlgorithms, - title = {Underactuated {{Robotics}}. {{Algorithms}} for {{Walking}}, {{Running}}, {{Swimming}}, {{Flying}}, and {{Manipulation}}}, - author = {Tedrake, Russ} -} - -@article{thrunPROBABILISTICROBOTICS, - title = {{{PROBABILISTIC ROBOTICS}}}, - author = {Thrun, Sebastian and Burgard, Wolfram and Fox, Dieter}, - langid = {english}, - file = {/Users/fracapuano/Zotero/storage/UKNC34V7/Thrun et al. - PROBABILISTIC ROBOTICS.pdf} -} - -@misc{tiboniDomainRandomizationEntropy2024, - title = {Domain {{Randomization}} via {{Entropy Maximization}}}, - author = {Tiboni, Gabriele and Klink, Pascal and Peters, Jan and Tommasi, Tatiana and D'Eramo, Carlo and Chalvatzaki, Georgia}, - year = {2024}, - month = mar, - number = {arXiv:2311.01885}, - eprint = {2311.01885}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2311.01885}, - urldate = {2025-08-30}, - abstract = {Varying dynamics parameters in simulation is a popular Domain Randomization (DR) approach for overcoming the reality gap in Reinforcement Learning (RL). 
Nevertheless, DR heavily hinges on the choice of the sampling distribution of the dynamics parameters, since high variability is crucial to regularize the agent's behavior but notoriously leads to overly conservative policies when randomizing excessively. In this paper, we propose a novel approach to address sim-to-real transfer, which automatically shapes dynamics distributions during training in simulation without requiring real-world data. We introduce DOmain RAndomization via Entropy MaximizatiON (DORAEMON), a constrained optimization problem that directly maximizes the entropy of the training distribution while retaining generalization capabilities. In achieving this, DORAEMON gradually increases the diversity of sampled dynamics parameters as long as the probability of success of the current policy is sufficiently high. We empirically validate the consistent benefits of DORAEMON in obtaining highly adaptive and generalizable policies, i.e. solving the task at hand across the widest range of dynamics parameters, as opposed to representative baselines from the DR literature. Notably, we also demonstrate the Sim2Real applicability of DORAEMON through its successful zero-shot transfer in a robotic manipulation setup under unknown real-world parameters.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Machine Learning,Computer Science - Robotics}, - file = {/Users/fracapuano/Zotero/storage/T5KH6GM9/Tiboni et al. - 2024 - Domain Randomization via Entropy Maximization.pdf;/Users/fracapuano/Zotero/storage/KRE436NC/2311.html} -} - -@misc{tiboniDROPOSimtoRealTransfer2023, - title = {{{DROPO}}: {{Sim-to-Real Transfer}} with {{Offline Domain Randomization}}}, - shorttitle = {{{DROPO}}}, - author = {Tiboni, Gabriele and Arndt, Karol and Kyrki, Ville}, - year = {2023}, - month = jan, - number = {arXiv:2201.08434}, - eprint = {2201.08434}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2201.08434}, - urldate = {2025-08-31}, - abstract = {In recent years, domain randomization over dynamics parameters has gained a lot of traction as a method for sim-to-real transfer of reinforcement learning policies in robotic manipulation; however, finding optimal randomization distributions can be difficult. In this paper, we introduce DROPO, a novel method for estimating domain randomization distributions for safe sim-to-real transfer. Unlike prior work, DROPO only requires a limited, precollected offline dataset of trajectories, and explicitly models parameter uncertainty to match real data using a likelihood-based approach. We demonstrate that DROPO is capable of recovering dynamic parameter distributions in simulation and finding a distribution capable of compensating for an unmodeled phenomenon. We also evaluate the method in two zero-shot sim-to-real transfer scenarios, showing successful domain transfer and improved performance over prior methods.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Machine Learning,Computer Science - Robotics}, - file = {/Users/fracapuano/Zotero/storage/Q875LPZF/Tiboni et al. 
- 2023 - DROPO Sim-to-Real Transfer with Offline Domain Randomization.pdf;/Users/fracapuano/Zotero/storage/2NQ4L37P/2201.html} -} - -@misc{tobinDomainRandomizationTransferring2017, - title = {Domain {{Randomization}} for {{Transferring Deep Neural Networks}} from {{Simulation}} to the {{Real World}}}, - author = {Tobin, Josh and Fong, Rachel and Ray, Alex and Schneider, Jonas and Zaremba, Wojciech and Abbeel, Pieter}, - year = {2017}, - month = mar, - number = {arXiv:1703.06907}, - eprint = {1703.06907}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.1703.06907}, - urldate = {2025-08-30}, - abstract = {Bridging the 'reality gap' that separates simulated robotics from experiments on hardware could accelerate robotic research through improved data availability. This paper explores domain randomization, a simple technique for training models on simulated images that transfer to real images by randomizing rendering in the simulator. With enough variability in the simulator, the real world may appear to the model as just another variation. We focus on the task of object localization, which is a stepping stone to general robotic manipulation skills. We find that it is possible to train a real-world object detector that is accurate to \$1.5\$cm and robust to distractors and partial occlusions using only data from a simulator with non-realistic random textures. To demonstrate the capabilities of our detectors, we show they can be used to perform grasping in a cluttered environment. To our knowledge, this is the first successful transfer of a deep neural network trained only on simulated RGB images (without pre-training on real images) to the real world for the purpose of robotic control.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Machine Learning,Computer Science - Robotics}, - file = {/Users/fracapuano/Zotero/storage/TYJZAD9R/Tobin et al. 
- 2017 - Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World.pdf;/Users/fracapuano/Zotero/storage/C9QS7DES/1703.html} -} - -@article{tong2024cambrian, - title = {Cambrian-1: {{A}} Fully Open, Vision-Centric Exploration of Multimodal Llms}, - author = {Tong, Peter and Brown, Ellis and Wu, Penghao and Woo, Sanghyun and IYER, Adithya Jairam Vedagiri and Akula, Sai Charitha and Yang, Shusheng and Yang, Jihan and Middepogu, Manoj and Wang, Ziteng and others}, - year = {2024}, - journal = {Advances in Neural Information Processing Systems}, - volume = {37}, - pages = {87310--87356} -} - -@misc{touvronLlama2Open2023, - title = {Llama 2: {{Open Foundation}} and {{Fine-Tuned Chat Models}}}, - shorttitle = {Llama 2}, - author = {Touvron, Hugo and Martin, Louis and Stone, Kevin and Albert, Peter and Almahairi, Amjad and Babaei, Yasmine and Bashlykov, Nikolay and Batra, Soumya and Bhargava, Prajjwal and Bhosale, Shruti and Bikel, Dan and Blecher, Lukas and Ferrer, Cristian Canton and Chen, Moya and Cucurull, Guillem and Esiobu, David and Fernandes, Jude and Fu, Jeremy and Fu, Wenyin and Fuller, Brian and Gao, Cynthia and Goswami, Vedanuj and Goyal, Naman and Hartshorn, Anthony and Hosseini, Saghar and Hou, Rui and Inan, Hakan and Kardas, Marcin and Kerkez, Viktor and Khabsa, Madian and Kloumann, Isabel and Korenev, Artem and Koura, Punit Singh and Lachaux, Marie-Anne and Lavril, Thibaut and Lee, Jenya and Liskovich, Diana and Lu, Yinghai and Mao, Yuning and Martinet, Xavier and Mihaylov, Todor and Mishra, Pushkar and Molybog, Igor and Nie, Yixin and Poulton, Andrew and Reizenstein, Jeremy and Rungta, Rashi and Saladi, Kalyan and Schelten, Alan and Silva, Ruan and Smith, Eric Michael and Subramanian, Ranjan and Tan, Xiaoqing Ellen and Tang, Binh and Taylor, Ross and Williams, Adina and Kuan, Jian Xiang and Xu, Puxin and Yan, Zheng and Zarov, Iliyan and Zhang, Yuchen and Fan, Angela and Kambadur, Melanie and Narang, Sharan and Rodriguez, Aurelien and Stojnic, Robert and Edunov, Sergey and Scialom, Thomas}, - year = {2023}, - month = jul, - number = {arXiv:2307.09288}, - eprint = {2307.09288}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2307.09288}, - urldate = {2025-09-08}, - abstract = {In this work, we develop and release Llama 2, a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters. Our fine-tuned LLMs, called Llama 2-Chat, are optimized for dialogue use cases. Our models outperform open-source chat models on most benchmarks we tested, and based on our human evaluations for helpfulness and safety, may be a suitable substitute for closed-source models. We provide a detailed description of our approach to fine-tuning and safety improvements of Llama 2-Chat in order to enable the community to build on our work and contribute to the responsible development of LLMs.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Artificial Intelligence,Computer Science - Computation and Language}, - file = {/Users/fracapuano/Zotero/storage/VKQFSEUF/Touvron et al. 
- 2023 - Llama 2 Open Foundation and Fine-Tuned Chat Models.pdf;/Users/fracapuano/Zotero/storage/N6MFUQCF/2307.html} -} - -@article{tsimpoukelli2021multimodalfrozen, - title = {Multimodal Few-Shot Learning with Frozen Language Models}, - author = {Tsimpoukelli, Maria and Menick, Jacob L and Cabi, Serkan and Eslami, {\relax SM} and Vinyals, Oriol and Hill, Felix}, - year = {2021}, - journal = {Advances in Neural Information Processing Systems}, - volume = {34}, - pages = {200--212} -} - -@article{vallaeys2024improveddepalm, - title = {Improved Baselines for Data-Efficient Perceptual Augmentation of Llms}, - author = {Vallaeys, Th{\'e}ophane and Shukor, Mustafa and Cord, Matthieu and Verbeek, Jakob}, - year = {2024}, - journal = {arXiv preprint arXiv:2403.13499}, - eprint = {2403.13499}, - archiveprefix = {arXiv} -} - -@article{wang2025internvideo2, - title = {{{InternVideo2}}. 5: {{Empowering}} Video Mllms with Long and Rich Context Modeling}, - author = {Wang, Yi and Li, Xinhao and Yan, Ziang and He, Yinan and Yu, Jiashuo and Zeng, Xiangyu and Wang, Chenting and Ma, Changlian and Huang, Haian and Gao, Jianfei and others}, - year = {2025}, - journal = {arXiv preprint arXiv:2501.12386}, - eprint = {2501.12386}, - archiveprefix = {arXiv} -} - -@misc{zhaiSigmoidLossLanguage2023, - title = {Sigmoid {{Loss}} for {{Language Image Pre-Training}}}, - author = {Zhai, Xiaohua and Mustafa, Basil and Kolesnikov, Alexander and Beyer, Lucas}, - year = {2023}, - month = sep, - number = {arXiv:2303.15343}, - eprint = {2303.15343}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2303.15343}, - urldate = {2025-09-09}, - abstract = {We propose a simple pairwise Sigmoid loss for Language-Image Pre-training (SigLIP). Unlike standard contrastive learning with softmax normalization, the sigmoid loss operates solely on image-text pairs and does not require a global view of the pairwise similarities for normalization. The sigmoid loss simultaneously allows further scaling up the batch size, while also performing better at smaller batch sizes. Combined with Locked-image Tuning, with only four TPUv4 chips, we train a SigLiT model that achieves 84.5\% ImageNet zero-shot accuracy in two days. The disentanglement of the batch size from the loss further allows us to study the impact of examples vs pairs and negative to positive ratio. Finally, we push the batch size to the extreme, up to one million, and find that the benefits of growing batch size quickly diminish, with a more reasonable batch size of 32k being sufficient. We release our models at https://github.com/google-research/big\_vision and hope our research motivates further explorations in improving the quality and efficiency of language-image pre-training.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Artificial Intelligence,Computer Science - Computer Vision and Pattern Recognition}, - file = {/Users/fracapuano/Zotero/storage/Z39H5W8R/Zhai et al. 
- 2023 - Sigmoid Loss for Language Image Pre-Training.pdf;/Users/fracapuano/Zotero/storage/IYX9QALK/2303.html} -} - -@article{zhang2025videollama, - title = {{{VideoLLaMA}} 3: {{Frontier}} Multimodal Foundation Models for Image and Video Understanding}, - author = {Zhang, Boqiang and Li, Kehan and Cheng, Zesen and Hu, Zhiqiang and Yuan, Yuqian and Chen, Guanzheng and Leng, Sicong and Jiang, Yuming and Zhang, Hang and Li, Xin and others}, - year = {2025}, - journal = {arXiv preprint arXiv:2501.13106}, - eprint = {2501.13106}, - archiveprefix = {arXiv} -} - -@misc{zhangWoCoCoLearningWholeBody2024, - title = {{{WoCoCo}}: {{Learning Whole-Body Humanoid Control}} with {{Sequential Contacts}}}, - shorttitle = {{{WoCoCo}}}, - author = {Zhang, Chong and Xiao, Wenli and He, Tairan and Shi, Guanya}, - year = {2024}, - month = nov, - number = {arXiv:2406.06005}, - eprint = {2406.06005}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2406.06005}, - urldate = {2025-08-26}, - abstract = {Humanoid activities involving sequential contacts are crucial for complex robotic interactions and operations in the real world and are traditionally solved by model-based motion planning, which is time-consuming and often relies on simplified dynamics models. Although model-free reinforcement learning (RL) has become a powerful tool for versatile and robust whole-body humanoid control, it still requires tedious task-specific tuning and state machine design and suffers from long-horizon exploration issues in tasks involving contact sequences. In this work, we propose WoCoCo (Whole-Body Control with Sequential Contacts), a unified framework to learn whole-body humanoid control with sequential contacts by naturally decomposing the tasks into separate contact stages. Such decomposition facilitates simple and general policy learning pipelines through task-agnostic reward and sim-to-real designs, requiring only one or two task-related terms to be specified for each task. We demonstrated that end-to-end RL-based controllers trained with WoCoCo enable four challenging whole-body humanoid tasks involving diverse contact sequences in the real world without any motion priors: 1) versatile parkour jumping, 2) box loco-manipulation, 3) dynamic clap-and-tap dancing, and 4) cliffside climbing. We further show that WoCoCo is a general framework beyond humanoid by applying it in 22-DoF dinosaur robot loco-manipulation tasks.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Graphics,Computer Science - Robotics,Computer Science - Systems and Control,Electrical Engineering and Systems Science - Systems and Control}, - file = {/Users/fracapuano/Zotero/storage/2SYII7A2/Zhang et al. - 2024 - WoCoCo Learning Whole-Body Humanoid Control with Sequential Contacts.pdf;/Users/fracapuano/Zotero/storage/C6ZJPZEV/2406.html} -} - -@misc{zhaoLearningFineGrainedBimanual2023, - title = {Learning {{Fine-Grained Bimanual Manipulation}} with {{Low-Cost Hardware}}}, - author = {Zhao, Tony Z. and Kumar, Vikash and Levine, Sergey and Finn, Chelsea}, - year = {2023}, - month = apr, - number = {arXiv:2304.13705}, - eprint = {2304.13705}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2304.13705}, - urldate = {2025-08-26}, - abstract = {Fine manipulation tasks, such as threading cable ties or slotting a battery, are notoriously difficult for robots because they require precision, careful coordination of contact forces, and closed-loop visual feedback. 
Performing these tasks typically requires high-end robots, accurate sensors, or careful calibration, which can be expensive and difficult to set up. Can learning enable low-cost and imprecise hardware to perform these fine manipulation tasks? We present a low-cost system that performs end-to-end imitation learning directly from real demonstrations, collected with a custom teleoperation interface. Imitation learning, however, presents its own challenges, particularly in high-precision domains: errors in the policy can compound over time, and human demonstrations can be non-stationary. To address these challenges, we develop a simple yet novel algorithm, Action Chunking with Transformers (ACT), which learns a generative model over action sequences. ACT allows the robot to learn 6 difficult tasks in the real world, such as opening a translucent condiment cup and slotting a battery with 80-90\% success, with only 10 minutes worth of demonstrations. Project website: https://tonyzhaozh.github.io/aloha/}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Machine Learning,Computer Science - Robotics}, - file = {/Users/fracapuano/Zotero/storage/4P7GCF3I/Zhao et al. - 2023 - Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware.pdf;/Users/fracapuano/Zotero/storage/3BC9S3Z2/2304.html} -} - -@misc{zhongPracticalBlockwiseNeural2018, - title = {Practical {{Block-wise Neural Network Architecture Generation}}}, - author = {Zhong, Zhao and Yan, Junjie and Wu, Wei and Shao, Jing and Liu, Cheng-Lin}, - year = {2018}, - month = may, - number = {arXiv:1708.05552}, - eprint = {1708.05552}, - primaryclass = {cs}, - publisher = {arXiv}, - urldate = {2023-05-05}, - abstract = {Convolutional neural networks have gained a remarkable success in computer vision. However, most usable network architectures are hand-crafted and usually require expertise and elaborate design. In this paper, we provide a block-wise network generation pipeline called BlockQNN which automatically builds high-performance networks using the Q-Learning paradigm with epsilon-greedy exploration strategy. The optimal network block is constructed by the learning agent which is trained sequentially to choose component layers. We stack the block to construct the whole auto-generated network. To accelerate the generation process, we also propose a distributed asynchronous framework and an early stop strategy. The block-wise generation brings unique advantages: (1) it performs competitive results in comparison to the hand-crafted state-of-the-art networks on image classification, additionally, the best network generated by BlockQNN achieves 3.54\% top-1 error rate on CIFAR-10 which beats all existing auto-generate networks. (2) in the meanwhile, it offers tremendous reduction of the search space in designing networks which only spends 3 days with 32 GPUs, and (3) moreover, it has strong generalizability that the network built on CIFAR also performs well on a larger-scale ImageNet dataset.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Computer Vision and Pattern Recognition,Computer Science - Machine Learning}, - file = {/Users/fracapuano/Zotero/storage/7ZJWPCRW/Zhong et al. - 2018 - Practical Block-wise Neural Network Architecture G.pdf;/Users/fracapuano/Zotero/storage/ZI2R395F/Zhong et al. 
- 2018 - Practical Block-wise Neural Network Architecture G.html} -} - -@inproceedings{zhu2024minigpt, - title = {{{MiniGPT-4}}: {{Enhancing}} Vision-Language Understanding with Advanced Large Language Models}, - booktitle = {The Twelfth International Conference on Learning Representations}, - author = {Zhu, Deyao and Chen, Jun and Shen, Xiaoqian and Li, Xiang and Elhoseiny, Mohamed}, - year = {2024} -} - -@misc{zotero-item-169, - type = {Misc} -} diff --git a/app/scripts/latex-to-mdx/input/main.dvi b/app/scripts/latex-to-mdx/input/main.dvi deleted file mode 100644 index b3715804068d8cde9ebbc5da4fd8d28de3851a50..0000000000000000000000000000000000000000 Binary files a/app/scripts/latex-to-mdx/input/main.dvi and /dev/null differ diff --git a/app/scripts/latex-to-mdx/input/main.tex b/app/scripts/latex-to-mdx/input/main.tex deleted file mode 100644 index e5bf82a8a6ef91117d93e8a1ccb06d4ab1e8e40a..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/main.tex +++ /dev/null @@ -1,247 +0,0 @@ -\documentclass[table]{hfstyle/hf} - -% Basic packages -\usepackage[utf8]{inputenc} -\usepackage[T1]{fontenc} -\usepackage{graphicx} -\usepackage{booktabs} -\usepackage{url} -\usepackage{lineno} -\usepackage{enumitem} -\usepackage{listings} - -% Math and symbols -\usepackage{amsmath} -\usepackage{amsfonts} -\usepackage{amssymb} -\usepackage{nicefrac} -\usepackage{siunitx} - -% Tables and figures -\usepackage{multirow} -\usepackage{bigdelim} -\usepackage{longtable} -\usepackage{tabularray} -\usepackage{wrapfig} -\usepackage{caption} -\usepackage{subcaption} -\usepackage{makecell} -\usepackage{adjustbox} - -% Color and boxes -\usepackage[most]{tcolorbox} -\usepackage{xcolor} - -% Text and formatting -\usepackage{xspace} -\usepackage{soul} -\usepackage{csquotes} -\usepackage{arydshln} - -% Bibliography and references -\usepackage{natbib} - -% Special packages -\usepackage{todonotes} -\usepackage[absolute]{textpos} -\usepackage{pifont} -\usepackage{bold-extra} -\usepackage{pgf-pie} -\usepackage{epigraph} - -% Algorithms -\usepackage{algorithm} -\usepackage{algpseudocode} - -% Hyperref (load last) -\usepackage{hyperref} -\definecolor{linkcolor}{RGB}{0, 0, 128} -\hypersetup{ - colorlinks = true, - citecolor = linkcolor, - linkcolor = linkcolor, - urlcolor = linkcolor, -} - -% Custom commands -\newcommand{\cmark}{\ding{51}}% -\newcommand{\xmark}{\ding{55}}% - -\setlist[itemize]{leftmargin=*,itemsep=0em,parsep=0.3em,topsep=0.3em} - -\DeclareUnicodeCharacter{2212}{\ensuremath{-}} - -\addtolength{\extrarowheight}{\belowrulesep} -\aboverulesep=0pt -\belowrulesep=0pt - -\definecolor{maroon}{HTML}{F26035} -\definecolor{yellow}{HTML}{FDBC42} -\definecolor{lavender}{HTML}{734f96} -\definecolor{darkergrey}{HTML}{444444} -\definecolor{midgrey}{HTML}{e6eded} - -\definecolor{neutralEight}{HTML}{343434} -\definecolor{neutralFive}{HTML}{838383} -\definecolor{neutralThree}{HTML}{bebebe} -\definecolor{neutralOne}{HTML}{dedede} -\definecolor{lightgrey}{HTML}{fafcfc} - -\usepackage{tikz} -\newcommand{\cblock}[3]{ - \hspace{-1.5mm} - \begin{tikzpicture} - [ - node/.style={square, minimum size=10mm, thick, line width=0pt}, - ] - \node[fill={rgb,255:red,#1;green,#2;blue,#3}] () [] {}; - \end{tikzpicture}% -} - -\newcommand{\norm}[1]{\left\lVert#1\right\rVert} - -\definecolor{maroon}{HTML}{F26035} -\definecolor{yellow}{HTML}{FDBC42} -\definecolor{darkred}{RGB}{156, 39, 33} -\definecolor{darkblue}{RGB}{31, 90, 153} -\definecolor{forestgreen}{rgb}{0.13, 0.55, 0.13} -\definecolor{olmoDarkBlue}{HTML}{012e59} 
-\definecolor{olmoBlue}{HTML}{265ed4} -\definecolor{olmoLightBlue}{HTML}{012e59} -\definecolor{olmoTeal}{HTML}{00d5ff} -\definecolor{olmoYellow}{HTML}{ffbb00} -\definecolor{olmoOrange}{HTML}{ff9100} - -\newcommand{\nol}[1]{{\color{purple} [nol]: #1}} - -% Code snippets definitions -\definecolor{codegreen}{rgb}{0,0.6,0} -\definecolor{codegray}{rgb}{0.5,0.5,0.5} -\definecolor{codepurple}{rgb}{0.58,0,0.82} -\definecolor{backcolour}{rgb}{0.95,0.95,0.92} - -\lstdefinestyle{mycodestyle}{ - backgroundcolor=\color{backcolour}, - commentstyle=\color{codegreen}, - keywordstyle=\color{magenta}, - numberstyle=\tiny\color{codegray}, - stringstyle=\color{codepurple}, - basicstyle=\ttfamily\footnotesize, - breakatwhitespace=false, - breaklines=true, - captionpos=b, - keepspaces=true, - numbers=left, - numbersep=5pt, - showspaces=false, - showstringspaces=false, - showtabs=false, - tabsize=2 -} - -\lstset{style=mycodestyle} - - -\usepackage{setspace} - -\usepackage{nicematrix} -\newcolumntype{L}[1]{>{\raggedright\let\newline\\\arraybackslash\hspace{0pt}}m{#1}} -\newcolumntype{C}[1]{>{\centering\let\newline\\\arraybackslash\hspace{0pt}}m{#1}} -\newcolumntype{R}[1]{>{\raggedleft\let\newline\\\arraybackslash\hspace{0pt}}m{#1}} -\newcolumntype{P}[1]{>{\centering\let\newline\\\arraybackslash\columncolor{ai2lightpink}}m{#1}} -\addtolength{\extrarowheight}{\belowrulesep} -\aboverulesep=0pt -\belowrulesep=0pt - -\newcommand{\orr}[1]{\textcolor{red}{[OZ:#1]}} - -\tcbuselibrary{minted} -\usemintedstyle{colorful} - -\renewcommand{\theFancyVerbLine}{\color{olmoBlue}\footnotesize\arabic{FancyVerbLine}} - -\setminted[python]{ - linenos, - breaklines, - fontsize=\footnotesize, - xleftmargin=2em -} -\crefname{tcb@cnt@pbox}{code}{code} -\Crefname{tcb@cnt@pbox}{Code}{Code} -\crefname{assumption}{assumption}{assumption} -\Crefname{assumption}{Assumption}{Assumptions} - - - -\newtcolorbox[auto counter]{pbox}[2][]{ - colback=white, - title=\textbf{Code~\thetcbcounter: #2}, - #1,fonttitle=\sffamily, - fontupper=\sffamily, - arc=10pt, - colframe=hf4, - coltitle=hf3, - colbacktitle=hf4, - toptitle=0.25cm, - bottomtitle=0.125cm -} - -\input{preamble} -\input{math_commands} -\input{handles} - -\title{ -Robot Learning: A Tutorial -} - -\newcommand{\huggingface}{\raisebox{-1.5pt}{\includegraphics[height=1.05em]{logos/hf.pdf}}\xspace} -\newcommand{\coreContrib}{\raisebox{.33em}{\hspace{.05em}\includegraphics[height=.5em]{logos/core.png}}\xspace} - -\newcommand{\hf}{\raisebox{.28em}{\hspace{.05em}\includegraphics[height=.65em]{logos/hf.pdf}}\xspace} -\newcommand{\ensps}{\raisebox{.3em}{\hspace{.05em}\includegraphics[height=.65em]{logos/ensps_logo.pdf}}\xspace} - -\authorOne[]{Francesco Capuano \ensps \hf} -\authorOne[]{...} -\authorOne[]{Adil Zouitine\hf} -\authorOne[]{Pepijn Kooijmans\hf} -\authorOne[]{Thomas Wolf\hf} -\authorOne[]{Michel Aractingi\hf} - -\contribution[]{\ensps École Normale Supérieure Paris-Saclay, \hf Hugging Face} - -\newcommand{\fix}{\marginpar{FIX}} -\newcommand{\new}{\marginpar{NEW}} - -\abstract{ -\input{sections/00_abstract} -} - -\begin{document} - - -\maketitle - -\tableofcontents -\input{sections/A_foreword.tex} - -\newpage -\input{sections/01_introduction} - -\input{sections/02_classic_robotics} - -\newpage -\input{sections/03_reinforcement_learning.tex} - -\newpage -\input{sections/04_imitation_learning.tex} - -\newpage -\input{sections/05_foundation_models.tex} - -\newpage -\input{sections/07_conclusions.tex} - -\bibliographystyle{hfstyle/plainnat} -\bibliography{main} - -\end{document} diff --git 
a/app/scripts/latex-to-mdx/input/manropebold.tfm b/app/scripts/latex-to-mdx/input/manropebold.tfm deleted file mode 100644 index caed637bbfa723a84c0422d48d73ad97f4a28ee4..0000000000000000000000000000000000000000 Binary files a/app/scripts/latex-to-mdx/input/manropebold.tfm and /dev/null differ diff --git a/app/scripts/latex-to-mdx/input/manroperegular.tfm b/app/scripts/latex-to-mdx/input/manroperegular.tfm deleted file mode 100644 index ca50dabab3acec7c0f1d4d921e3c9734249abfe4..0000000000000000000000000000000000000000 Binary files a/app/scripts/latex-to-mdx/input/manroperegular.tfm and /dev/null differ diff --git a/app/scripts/latex-to-mdx/input/math_commands.tex b/app/scripts/latex-to-mdx/input/math_commands.tex deleted file mode 100644 index 3b1ddb5228ebfe03414d5b9aba8f016d44c8dc05..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/math_commands.tex +++ /dev/null @@ -1,574 +0,0 @@ -\newcommand*\diff{\mathrm{d}} -\newcommand*\Image{\mathrm{Im}} -\newcommand*\NN{\smash{\hat{\mathcal{F}}_{\scriptsize\textrm{NN}}}} - -\newcommand*\X{\mathcal{X}} -\newcommand*\Z{\mathcal{Z}} -\newcommand*\G{\mathcal{G}} -\newcommand*\D{\mathcal{D}} -\newcommand*\F{\mathcal{F}} -\newcommand*\R{\mathcal{R}} -\newcommand*\TR{\hat{R}} -\newcommand*\Deltab{\bar{\Delta}} -\newcommand*\h{h} -\newcommand*\biasb{\mathrm{bias}} -\newcommand*\varb{\mathrm{var}} -\newcommand*\covb{\mathrm{cov}} -\newcommand*\M{\mathcal{M}} -\newcommand*\B{\mathcal{B}} -\newcommand*\W{\mathcal{W}} -\newcommand*\Loss{\mathcal{L}} -\newcommand*{\Ftr}{\smash{\mathcal{F}_{\scriptsize\textrm{tr}}}} -\newcommand*{\Fts}{\smash{\mathcal{F}_{\scriptsize\textrm{ad}}}} -\newcommand*{\Dtr}{\smash{\mathcal{D}_{\scriptsize\textrm{tr}}}} -\newcommand*{\Dts}{\smash{\mathcal{D}_{\scriptsize\textrm{ad}}}} -\newcommand*{\Etr}{\smash{\mathcal{E}_{\scriptsize\textrm{tr}}}} -\newcommand*{\Ead}{\smash{\mathcal{E}_{\scriptsize\textrm{ad}}}} - -\newcommand*{\eg}{e.g.,\@\xspace} -\newcommand*{\versus}{vs.\@\xspace} -\newcommand*{\sut}{s.t.\@\xspace} -\newcommand*{\ie}{i.e.,\@\xspace} -\newcommand*{\iid}{ID\@\xspace} -\newcommand*{\sota}{SoTA\@\xspace} -\newcommand*{\ood}{OOD\@\xspace} -\newcommand*{\metric}{metric} -\newcommand*{\wrt}{w.r.t.\@\xspace} -\newcommand*{\iif}{i.i.f.\@\xspace} -\newcommand*{\aka}{a.k.a.\@\xspace} -\newcommand*{\rhs}{r.h.s.\@\xspace} -\newcommand*{\etc}{etc.\@\xspace} -\newcommand*{\cf}{cf.\@\xspace} -\newcommand*{\resp}{resp.\@\xspace} - -\newcommand*\er{\mathrm{er}} -\newcommand*\ess{\operatorname{ess}} - -\let\originalleft\left -\let\originalright\right -\renewcommand{\left}{\mathopen{}\mathclose\bgroup\originalleft} -\renewcommand{\right}{\aftergroup\egroup\originalright} - -\let\up\textsuperscript -\let\vec\boldsymbol - - -\newcommand{\defeq}{\mathrel{:\mkern-0.25mu=}} -\newcommand{\eqdef}{\mathrel{=\mkern-0.25mu:}} - -\newcommand{\figleft}{{\em (Left)}} -\newcommand{\figcenter}{{\em (Center)}} -\newcommand{\figright}{{\em (Right)}} -\newcommand{\figtop}{{\em (Top)}} -\newcommand{\figbottom}{{\em (Bottom)}} -\newcommand{\captiona}{{\em (a)}} -\newcommand{\captionb}{{\em (b)}} -\newcommand{\captionc}{{\em (c)}} -\newcommand{\captiond}{{\em (d)}} - -\newcommand{\newterm}[1]{{\bf #1}} -\def\figref#1{figure~\ref{#1}} -\def\Figref#1{Figure~\ref{#1}} -\def\twofigref#1#2{figures \ref{#1} and \ref{#2}} -\def\trifigref#1#2#3#4{figures \ref{#1}, \ref{#2}, and \ref{#3}} -\def\quadfigref#1#2#3#4{figures \ref{#1}, \ref{#2}, \ref{#3} and \ref{#4}} -\def\secref#1{section~\ref{#1}} -\def\Secref#1{Section~\ref{#1}} 
-\def\Termref#1{Term~\ref{#1}} -\def\twosecref#1#2{sections \ref{#1} and \ref{#2}} -\def\trisecref#1#2#3{sections \ref{#1}, \ref{#2} and \ref{#3}} -\def\appref#1{appendix~\ref{#1}} -\def\Appref#1{Appendix~\ref{#1}} -\def\suppref#1{supp.~\ref{#1}} -\def\Suppref#1{Supp.~\ref{#1}} -\def\eqref#1{eq.~\ref{#1}} -\def\Eqref#1{Eq.~\ref{#1}} -\def\plaineqref#1{\ref{#1}} -\def\chapref#1{chapter~\ref{#1}} -\def\Chapref#1{Chapter~\ref{#1}} -\def\rangechapref#1#2{chapters\ref{#1}--\ref{#2}} -\def\algref#1{algorithm~\ref{#1}} -\def\Algref#1{Algorithm~\ref{#1}} -\def\twoalgref#1#2{algorithms \ref{#1} and \ref{#2}} -\def\Twoalgref#1#2{Algorithms \ref{#1} and \ref{#2}} -\def\partref#1{part~\ref{#1}} -\def\Partref#1{Part~\ref{#1}} -\def\twopartref#1#2{parts \ref{#1} and \ref{#2}} - -\def\Tabref#1{Table~\ref{#1}} -\def\tabref#1{table~\ref{#1}} -\def\twotabref#1#2{tables \ref{#1} and \ref{#2}} - -\def\ceil#1{\lceil #1 \rceil} -\def\floor#1{\lfloor #1 \rfloor} - -\newcommand{\Lp}{\mathcal{L}^\text{prior}} -\newcommand{\Ll}{\mathcal{L}^\text{likeli}} -\newcommand{\Lal}{\Ls^{\text{l}\widehat{\text{ikel}}\text{i}}} - -\def\eps{{\varepsilon}} - - -\def\xopt{{x^{*}}} -\def\Gopt{{G^{*}}} - -\def\p{{\textnormal{p}}} -\def\P{{\textnormal{p}}} -\def\Q{{\textnormal{q}}} -\def\q{{\textnormal{q}}} - -\def\gTh{{\hat \gT}} -\def\gDh{{\hat \gD}} -\def\gPh{{\hat \gP}} -\newcommand{\tin}[1]{\mbox{\tiny $#1$}} - - -\def\reta{{\textnormal{$\eta$}}} -\def\ra{{\textnormal{a}}} -\def\rb{{\textnormal{b}}} -\def\rc{{\textnormal{c}}} -\def\rd{{\textnormal{d}}} -\def\re{{\textnormal{e}}} -\def\rf{{\textnormal{f}}} -\def\rg{{\textnormal{g}}} -\def\rh{{\textnormal{h}}} -\def\ri{{\textnormal{i}}} -\def\rj{{\textnormal{j}}} -\def\rk{{\textnormal{k}}} -\def\rl{{\textnormal{l}}} -\def\rn{{\textnormal{n}}} -\def\ro{{\textnormal{o}}} -\def\rp{{\textnormal{p}}} -\def\rq{{\textnormal{q}}} -\def\rr{{\textnormal{r}}} -\def\rs{{\textnormal{s}}} -\def\rt{{\textnormal{t}}} -\def\ru{{\textnormal{u}}} -\def\rv{{\textnormal{v}}} -\def\rw{{\textnormal{w}}} -\def\reps{{\mathcal{E}}} -\def\rtheta{{\Theta}} -\def\rx{{X}} -\def\ry{{Y}} -\def\rz{{Z}} - - -\def\S{\mathcal{S}} -\def\T{\mathcal{T}} -\def\X{\mathcal{X}} -\def\Y{\mathcal{Y}} -\def\U{\mathcal{U}} - -\def\rvepsilon{{\mathbf{\epsilon}}} -\def\rva{{\mathbf{a}}} -\def\rvb{{\mathbf{b}}} -\def\rvc{{\mathbf{c}}} -\def\rvd{{\mathbf{d}}} -\def\rve{{\mathbf{e}}} -\def\rvf{{\mathbf{f}}} -\def\rvg{{\mathbf{g}}} -\def\rvh{{\mathbf{h}}} -\def\rvu{{\mathbf{i}}} -\def\rvj{{\mathbf{j}}} -\def\rvk{{\mathbf{k}}} -\def\rvl{{\mathbf{l}}} -\def\rvm{{\mathbf{m}}} -\def\rvn{{\mathbf{n}}} -\def\rvo{{\mathbf{o}}} -\def\rvp{{\mathbf{p}}} -\def\rvq{{\mathbf{q}}} -\def\rvr{{\mathbf{r}}} -\def\rvs{{\mathbf{s}}} -\def\rvt{{\mathbf{t}}} -\def\rvu{{\mathbf{u}}} -\def\rvv{{\mathbf{v}}} -\def\rvw{{\mathbf{w}}} -\def\rvx{{\mathbf{x}}} -\def\rvy{{\mathbf{y}}} -\def\rvz{{\mathbf{z}}} -\def\rvtheta{{\bm{\theta}}} - -\def\erva{{\textnormal{a}}} -\def\ervb{{\textnormal{b}}} -\def\ervc{{\textnormal{c}}} -\def\ervd{{\textnormal{d}}} -\def\erve{{\textnormal{e}}} -\def\ervf{{\textnormal{f}}} -\def\ervg{{\textnormal{g}}} -\def\ervh{{\textnormal{h}}} -\def\ervi{{\textnormal{i}}} -\def\ervj{{\textnormal{j}}} -\def\ervk{{\textnormal{k}}} -\def\ervl{{\textnormal{l}}} -\def\ervm{{\textnormal{m}}} -\def\ervn{{\textnormal{n}}} -\def\ervo{{\textnormal{o}}} -\def\ervp{{\textnormal{p}}} -\def\ervq{{\textnormal{q}}} -\def\ervr{{\textnormal{r}}} -\def\ervs{{\textnormal{s}}} -\def\ervt{{\textnormal{t}}} -\def\ervu{{\textnormal{u}}} 
-\def\ervv{{\textnormal{v}}} -\def\ervw{{\textnormal{w}}} -\def\ervx{{\textnormal{x}}} -\def\ervy{{\textnormal{y}}} -\def\ervz{{\textnormal{z}}} - -\def\rmA{{\mathbf{A}}} -\def\rmB{{\mathbf{B}}} -\def\rmC{{\mathbf{C}}} -\def\rmD{{\mathbf{D}}} -\def\rmE{{\mathbf{E}}} -\def\rmF{{\mathbf{F}}} -\def\rmG{{\mathbf{G}}} -\def\rmH{{\mathbf{H}}} -\def\rmI{{\mathbf{I}}} -\def\rmJ{{\mathbf{J}}} -\def\rmK{{\mathbf{K}}} -\def\rmL{{\mathbf{L}}} -\def\rmM{{\mathbf{M}}} -\def\rmN{{\mathbf{N}}} -\def\rmO{{\mathbf{O}}} -\def\rmP{{\mathbf{P}}} -\def\rmQ{{\mathbf{Q}}} -\def\rmR{{\mathbf{R}}} -\def\rmS{{\mathbf{S}}} -\def\rmT{{\mathbf{T}}} -\def\rmU{{\mathbf{U}}} -\def\rmV{{\mathbf{V}}} -\def\rmW{{\mathbf{W}}} -\def\rmx{{\mathbf{x}}} -\def\rmy{{\mathbf{y}}} -\def\rmz{{\mathbf{Z}}} - -\def\ermA{{\textnormal{A}}} -\def\ermB{{\textnormal{B}}} -\def\ermC{{\textnormal{C}}} -\def\ermD{{\textnormal{D}}} -\def\ermE{{\textnormal{E}}} -\def\ermF{{\textnormal{F}}} -\def\ermG{{\textnormal{G}}} -\def\ermH{{\textnormal{H}}} -\def\ermI{{\textnormal{I}}} -\def\ermJ{{\textnormal{J}}} -\def\ermK{{\textnormal{K}}} -\def\ermL{{\textnormal{L}}} -\def\ermM{{\textnormal{M}}} -\def\ermN{{\textnormal{N}}} -\def\ermO{{\textnormal{O}}} -\def\ermP{{\textnormal{P}}} -\def\ermQ{{\textnormal{Q}}} -\def\ermR{{\textnormal{R}}} -\def\ermS{{\textnormal{S}}} -\def\ermT{{\textnormal{T}}} -\def\ermU{{\textnormal{U}}} -\def\ermV{{\textnormal{V}}} -\def\ermW{{\textnormal{W}}} -\def\ermX{{\textnormal{X}}} -\def\ermY{{\textnormal{Y}}} -\def\ermZ{{\textnormal{Z}}} - -\def\vzero{{\bm{0}}} -\def\vone{{\bm{1}}} -\def\va{{\bm{a}}} -\def\vb{{\bm{b}}} -\def\vc{{\bm{c}}} -\def\vd{{\bm{d}}} -\def\ve{{\bm{e}}} -\def\vf{{\bm{f}}} -\def\vg{{\bm{g}}} -\def\vh{{\bm{h}}} -\def\vi{{\bm{i}}} -\def\vj{{\bm{j}}} -\def\vk{{\bm{k}}} -\def\vl{{\bm{l}}} -\def\vm{{\bm{m}}} -\def\vn{{\bm{n}}} -\def\vo{{\bm{o}}} -\def\vp{{\bm{p}}} -\def\vq{{\bm{q}}} -\def\vr{{\bm{r}}} -\def\vs{{\bm{s}}} -\def\vt{{\bm{t}}} -\def\vu{{\bm{u}}} -\def\vv{{\bm{v}}} -\def\vw{{\bm{w}}} -\def\vx{{\bm{x}}} -\def\vy{{\bm{y}}} -\def\vz{{\bm{z}}} -\def\valpha{{\bm{\alpha}}} -\def\vtheta{{\bm{\theta}}} -\def\vdelta{{\bm{\delta}}} -\def\vDelta{{\bm{\Delta}}} -\def\vmu{{\bm{\mu}}} -\def\vphi{{\bm{\phi}}} -\def\vSigma{{\bm{\Sigma}}} -\def\evalpha{{\alpha}} -\def\evbeta{{\beta}} -\def\evepsilon{{\epsilon}} -\def\evlambda{{\lambda}} -\def\evomega{{\omega}} -\def\evmu{{\mu}} -\def\evpsi{{\psi}} -\def\evsigma{{\sigma}} -\def\evtheta{{\theta}} -\def\eva{{a}} -\def\evb{{b}} -\def\evc{{c}} -\def\evd{{d}} -\def\eve{{e}} -\def\evf{{f}} -\def\evg{{g}} -\def\evh{{h}} -\def\evi{{i}} -\def\evj{{j}} -\def\evk{{k}} -\def\evl{{l}} -\def\evm{{m}} -\def\evn{{n}} -\def\evo{{o}} -\def\evp{{p}} -\def\evq{{q}} -\def\evr{{r}} -\def\evs{{s}} -\def\evt{{t}} -\def\evu{{u}} -\def\evv{{v}} -\def\evw{{w}} -\def\evx{{x}} -\def\evy{{y}} -\def\evz{{z}} - -\def\mA{{\bm{A}}} -\def\mB{{\bm{B}}} -\def\mC{{\bm{C}}} -\def\mD{{\bm{D}}} -\def\mE{{\bm{E}}} -\def\mF{{\bm{F}}} -\def\mG{{\bm{G}}} -\def\mH{{\bm{H}}} -\def\mI{{\bm{I}}} -\def\mJ{{\bm{J}}} -\def\mK{{\bm{K}}} -\def\mL{{\bm{L}}} -\def\mM{{\bm{M}}} -\def\mN{{\bm{N}}} -\def\mO{{\bm{O}}} -\def\mP{{\bm{P}}} -\def\mQ{{\bm{Q}}} -\def\mR{{\bm{R}}} -\def\mS{{\bm{S}}} -\def\mT{{\bm{T}}} -\def\mU{{\bm{U}}} -\def\mV{{\bm{V}}} -\def\mW{{\bm{W}}} -\def\mX{{\bm{X}}} -\def\mY{{\bm{Y}}} -\def\mZ{{\bm{Z}}} -\def\E{{\mathcal{E}}} -\def\mBeta{{\bm{\beta}}} -\def\mTheta{{\bm{\theta}}} -\def\mPhi{{\bm{\Phi}}} -\def\mLambda{{\bm{\Lambda}}} -\def\mSigma{{\bm{\Sigma}}} - 
-\DeclareMathAlphabet{\mathsfit}{\encodingdefault}{\sfdefault}{m}{sl} -\SetMathAlphabet{\mathsfit}{bold}{\encodingdefault}{\sfdefault}{bx}{n} -\newcommand{\tens}[1]{\bm{\mathsfit{#1}}} -\def\tA{{\tens{A}}} -\def\tB{{\tens{B}}} -\def\tC{{\tens{C}}} -\def\tD{{\tens{D}}} -\def\tE{{\tens{E}}} -\def\tF{{\tens{F}}} -\def\tG{{\tens{G}}} -\def\tH{{\tens{H}}} -\def\tI{{\tens{I}}} -\def\tJ{{\tens{J}}} -\def\tK{{\tens{K}}} -\def\tL{{\tens{L}}} -\def\tM{{\tens{M}}} -\def\tN{{\tens{N}}} -\def\tO{{\tens{O}}} -\def\tP{{\tens{P}}} -\def\tQ{{\tens{Q}}} -\def\tR{{\tens{R}}} -\def\tS{{\tens{S}}} -\def\tT{{\tens{T}}} -\def\tU{{\tens{U}}} -\def\tV{{\tens{V}}} -\def\tW{{\tens{W}}} -\def\tX{{\tens{X}}} -\def\tY{{\tens{Y}}} -\def\tZ{{\tens{Z}}} - -\def\tx{{\tens{x}}} - - -\def\gA{{\mathcal{A}}} -\def\gB{{\mathcal{B}}} -\def\gC{{\mathcal{C}}} -\def\gD{{\mathcal{D}}} -\def\gE{{\mathcal{E}}} -\def\gF{{\mathcal{F}}} -\def\gG{{\mathcal{G}}} -\def\gGh{{\hat\gG}} -\def\gFh{{\hat\gF}} - - -\def\gH{{\mathcal{H}}} -\def\gI{{\mathcal{I}}} -\def\gJ{{\mathcal{J}}} -\def\gK{{\mathcal{K}}} -\def\gL{{\mathcal{L}}} -\def\gM{{\mathcal{M}}} -\def\gN{{\mathcal{N}}} -\def\gO{{\mathcal{O}}} -\def\gP{{\mathcal{P}}} -\def\gQ{{\mathcal{Q}}} -\def\gR{{\mathcal{R}}} -\def\gS{{\mathcal{S}}} -\def\gT{{\mathcal{T}}} -\def\gU{{\mathcal{U}}} -\def\gV{{\mathcal{V}}} -\def\gW{{\mathcal{W}}} -\def\gX{{\mathcal{X}}} -\def\gY{{\mathcal{Y}}} -\def\gZ{{\mathcal{Z}}} - -\def\sA{{\mathbb{A}}} -\def\sB{{\mathbb{B}}} -\def\sC{{\mathbb{C}}} -\def\sD{{\mathbb{D}}} -\def\sF{{\mathbb{F}}} -\def\sG{{\mathbb{G}}} -\def\sH{{\mathbb{H}}} -\def\sI{{\mathbb{I}}} -\def\sJ{{\mathbb{J}}} -\def\sK{{\mathbb{K}}} -\def\sL{{\mathbb{L}}} -\def\sM{{\mathbb{M}}} -\def\sN{{\mathbb{N}}} -\def\sO{{\mathbb{O}}} -\def\sP{{\mathbb{P}}} -\def\sQ{{\mathbb{Q}}} -\def\sR{{\mathbb{R}}} -\def\sS{{\mathbb{S}}} -\def\sU{{\mathbb{U}}} -\def\sV{{\mathbb{V}}} -\def\sW{{\mathbb{W}}} -\def\sX{{\mathcal{X}}} -\def\sY{{\mathcal{Y}}} -\def\sZ{{\mathcal{Z}}} -\def\sTheta{{\bm{\Theta}}} - -\def\emLambda{{\Lambda}} -\def\emA{{A}} -\def\emB{{B}} -\def\emC{{C}} -\def\emD{{D}} -\def\emE{{E}} -\def\emF{{F}} -\def\emG{{G}} -\def\emH{{H}} -\def\emI{{I}} -\def\emJ{{J}} -\def\emK{{K}} -\def\emL{{L}} -\def\emM{{M}} -\def\emN{{N}} -\def\emO{{O}} -\def\emP{{P}} -\def\emQ{{Q}} -\def\emR{{R}} -\def\emS{{S}} -\def\emT{{T}} -\def\emU{{U}} -\def\emV{{V}} -\def\emW{{W}} -\def\emX{{X}} -\def\emY{{Y}} -\def\emZ{{Z}} -\def\emSigma{{\Sigma}} - -\newcommand{\etens}[1]{\mathsfit{#1}} -\def\etLambda{{\etens{\Lambda}}} -\def\etA{{\etens{A}}} -\def\etB{{\etens{B}}} -\def\etC{{\etens{C}}} -\def\etD{{\etens{D}}} -\def\etE{{\etens{E}}} -\def\etF{{\etens{F}}} -\def\etG{{\etens{G}}} -\def\etH{{\etens{H}}} -\def\etI{{\etens{I}}} -\def\etJ{{\etens{J}}} -\def\etK{{\etens{K}}} -\def\etL{{\etens{L}}} -\def\etM{{\etens{M}}} -\def\etN{{\etens{N}}} -\def\etO{{\etens{O}}} -\def\etP{{\etens{P}}} -\def\etQ{{\etens{Q}}} -\def\etR{{\etens{R}}} -\def\etS{{\etens{S}}} -\def\etT{{\etens{T}}} -\def\etU{{\etens{U}}} -\def\etV{{\etens{V}}} -\def\etW{{\etens{W}}} -\def\etX{{\etens{X}}} -\def\etY{{\etens{Y}}} -\def\etZ{{\etens{Z}}} - -\newcommand{\pdata}{p_{\rm{data}}} -\newcommand{\ptrain}{\hat{p}_{\rm{data}}} -\newcommand{\Ptrain}{\hat{P}_{\rm{data}}} -\newcommand{\pmodel}{p_{\rm{model}}} -\newcommand{\Pmodel}{P_{\rm{model}}} -\newcommand{\ptildemodel}{\tilde{p}_{\rm{model}}} -\newcommand{\pencode}{p_{\rm{encoder}}} -\newcommand{\pdecode}{p_{\rm{decoder}}} -\newcommand{\precons}{p_{\rm{reconstruct}}} -\newcommand{\dd}{\mathrm{d}} - 
-\newcommand{\laplace}{\mathrm{Laplace}} % - -\newcommand{\KL}{$\mathrm{KL}$\@\xspace} -\newcommand{\Kl}{\mathrm{KL}} -\newcommand{\Esp}{\mathbb{E}} -\newcommand{\Ls}{\mathcal{L}} -\newcommand{\emp}{\tilde{p}} -\newcommand{\lr}{\alpha} -\newcommand{\reg}{\lambda} -\newcommand{\rect}{\mathrm{rectifier}} -\newcommand{\softmax}{\mathrm{softmax}} -\newcommand{\slerp}{\mathrm{slerp}} -\newcommand{\sigmoid}{\sigma} -\newcommand{\softplus}{\zeta} -\newcommand{\Var}{\mathrm{Var}} -\newcommand{\standarderror}{\mathrm{SE}} -\newcommand{\Cov}{\mathrm{Cov}} -\newcommand{\Span}{\mathrm{Span}} -\newcommand{\card}{\mathrm{card}} - - -\newcommand{\KLD}[2]{D_{\mathrm{KL}} \left( \left. \left. #1 \right|\right| #2 \right) } -\newcommand{\normlzero}{L^0} -\newcommand{\normlone}{L^1} -\newcommand{\normltwo}{L^2} -\newcommand{\normlp}{L^p} -\newcommand{\normmax}{L^\infty} - -\newcommand{\pihalf}{\frac{\pi}{2}} - - -\newcommand{\parents}{Pa} % - -\DeclareMathOperator*{\argmax}{argmax} -\DeclareMathOperator*{\argmin}{argmin} -\newcommand{\acc}{\mathrm{Acc}} -\newcommand{\1}{\mathds{1}} -\DeclareMathOperator{\sign}{sign} -\DeclareMathOperator{\Tr}{Tr} -\let\ab\allowbreak diff --git a/app/scripts/latex-to-mdx/input/natbib.sty b/app/scripts/latex-to-mdx/input/natbib.sty deleted file mode 100644 index ff0d0b91b6ef41468c593a0ca40a81f9a183b055..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/natbib.sty +++ /dev/null @@ -1,1246 +0,0 @@ -%% -%% This is file `natbib.sty', -%% generated with the docstrip utility. -%% -%% The original source files were: -%% -%% natbib.dtx (with options: `package,all') -%% ============================================= -%% IMPORTANT NOTICE: -%% -%% This program can be redistributed and/or modified under the terms -%% of the LaTeX Project Public License Distributed from CTAN -%% archives in directory macros/latex/base/lppl.txt; either -%% version 1 of the License, or any later version. -%% -%% This is a generated file. -%% It may not be distributed without the original source file natbib.dtx. -%% -%% Full documentation can be obtained by LaTeXing that original file. -%% Only a few abbreviated comments remain here to describe the usage. -%% ============================================= -%% Copyright 1993-2009 Patrick W Daly -%% Max-Planck-Institut f\"ur Sonnensystemforschung -%% Max-Planck-Str. 2 -%% D-37191 Katlenburg-Lindau -%% Germany -%% E-mail: daly@mps.mpg.de -\NeedsTeXFormat{LaTeX2e}[1995/06/01] -\ProvidesPackage{natbib} - [2009/07/16 8.31 (PWD, AO)] - - % This package reimplements the LaTeX \cite command to be used for various - % citation styles, both author-year and numerical. It accepts BibTeX - % output intended for many other packages, and therefore acts as a - % general, all-purpose citation-style interface. - % - % With standard numerical .bst files, only numerical citations are - % possible. With an author-year .bst file, both numerical and - % author-year citations are possible. - % - % If author-year citations are selected, \bibitem must have one of the - % following forms: - % \bibitem[Jones et al.(1990)]{key}... - % \bibitem[Jones et al.(1990)Jones, Baker, and Williams]{key}... - % \bibitem[Jones et al., 1990]{key}... - % \bibitem[\protect\citeauthoryear{Jones, Baker, and Williams}{Jones - % et al.}{1990}]{key}... - % \bibitem[\protect\citeauthoryear{Jones et al.}{1990}]{key}... - % \bibitem[\protect\astroncite{Jones et al.}{1990}]{key}... - % \bibitem[\protect\citename{Jones et al., }1990]{key}... 
- % \harvarditem[Jones et al.]{Jones, Baker, and Williams}{1990}{key}... - % - % This is either to be made up manually, or to be generated by an - % appropriate .bst file with BibTeX. - % Author-year mode || Numerical mode - % Then, \citet{key} ==>> Jones et al. (1990) || Jones et al. [21] - % \citep{key} ==>> (Jones et al., 1990) || [21] - % Multiple citations as normal: - % \citep{key1,key2} ==>> (Jones et al., 1990; Smith, 1989) || [21,24] - % or (Jones et al., 1990, 1991) || [21,24] - % or (Jones et al., 1990a,b) || [21,24] - % \cite{key} is the equivalent of \citet{key} in author-year mode - % and of \citep{key} in numerical mode - % Full author lists may be forced with \citet* or \citep*, e.g. - % \citep*{key} ==>> (Jones, Baker, and Williams, 1990) - % Optional notes as: - % \citep[chap. 2]{key} ==>> (Jones et al., 1990, chap. 2) - % \citep[e.g.,][]{key} ==>> (e.g., Jones et al., 1990) - % \citep[see][pg. 34]{key}==>> (see Jones et al., 1990, pg. 34) - % (Note: in standard LaTeX, only one note is allowed, after the ref. - % Here, one note is like the standard, two make pre- and post-notes.) - % \citealt{key} ==>> Jones et al. 1990 - % \citealt*{key} ==>> Jones, Baker, and Williams 1990 - % \citealp{key} ==>> Jones et al., 1990 - % \citealp*{key} ==>> Jones, Baker, and Williams, 1990 - % Additional citation possibilities (both author-year and numerical modes) - % \citeauthor{key} ==>> Jones et al. - % \citeauthor*{key} ==>> Jones, Baker, and Williams - % \citeyear{key} ==>> 1990 - % \citeyearpar{key} ==>> (1990) - % \citetext{priv. comm.} ==>> (priv. comm.) - % \citenum{key} ==>> 11 [non-superscripted] - % Note: full author lists depends on whether the bib style supports them; - % if not, the abbreviated list is printed even when full requested. - % - % For names like della Robbia at the start of a sentence, use - % \Citet{dRob98} ==>> Della Robbia (1998) - % \Citep{dRob98} ==>> (Della Robbia, 1998) - % \Citeauthor{dRob98} ==>> Della Robbia - % - % - % Citation aliasing is achieved with - % \defcitealias{key}{text} - % \citetalias{key} ==>> text - % \citepalias{key} ==>> (text) - % - % Defining the citation mode and punctual (citation style) - % \setcitestyle{} - % Example: \setcitestyle{square,semicolon} - % Alternatively: - % Use \bibpunct with 6 mandatory arguments: - % 1. opening bracket for citation - % 2. closing bracket - % 3. citation separator (for multiple citations in one \cite) - % 4. the letter n for numerical styles, s for superscripts - % else anything for author-year - % 5. punctuation between authors and date - % 6. punctuation between years (or numbers) when common authors missing - % One optional argument is the character coming before post-notes. It - % appears in square braces before all other arguments. May be left off. - % Example (and default) \bibpunct[, ]{(}{)}{;}{a}{,}{,} - % - % To make this automatic for a given bib style, named newbib, say, make - % a local configuration file, natbib.cfg, with the definition - % \newcommand{\bibstyle@newbib}{\bibpunct...} - % Then the \bibliographystyle{newbib} will cause \bibstyle@newbib to - % be called on THE NEXT LATEX RUN (via the aux file). - % - % Such preprogrammed definitions may be invoked anywhere in the text - % by calling \citestyle{newbib}. This is only useful if the style specified - % differs from that in \bibliographystyle. - % - % With \citeindextrue and \citeindexfalse, one can control whether the - % \cite commands make an automatic entry of the citation in the .idx - % indexing file. 
For this, \makeindex must also be given in the preamble. - % - % Package Options: (for selecting punctuation) - % round - round parentheses are used (default) - % square - square brackets are used [option] - % curly - curly braces are used {option} - % angle - angle brackets are used