diff --git a/CHANGELOG.md b/CHANGELOG.md new file mode 100644 index 0000000000000000000000000000000000000000..5837b2b57b8d319f7a12c1b0ff413044b7792f33 --- /dev/null +++ b/CHANGELOG.md @@ -0,0 +1,118 @@ +# Changelog + +All notable changes to the Research Article Template will be documented in this file. + +The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), +and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). + +## [Unreleased] + +### Added +- Initial open source release +- Comprehensive documentation +- Contributing guidelines +- License file + +## [1.0.0] - 2024-12-19 + +### Added +- **Core Features**: + - Markdown/MDX-based writing system + - KaTeX mathematical notation support + - Syntax highlighting for code blocks + - Academic citations with BibTeX integration + - Footnotes and sidenotes system + - Auto-generated table of contents + - Interactive Mermaid diagrams + - Plotly.js and D3.js integration + - HTML embed support + - Gradio app embedding + - Dataviz color palettes + - Image optimization + - SEO-friendly structure + - Automatic PDF export + - Dark/light theme toggle + - Mobile-responsive design + - LaTeX import functionality + - Template synchronization system + +- **Components**: + - Figure component with captions + - MultiFigure for image galleries + - Note component with variants + - Quote component + - Accordion for collapsible content + - Sidenote component + - Table of Contents + - Theme Toggle + - HTML Embed + - Raw HTML support + - SEO component + - Hero section + - Footer + - Full-width and wide layouts + +- **Build System**: + - Astro 4.10.0 integration + - PostCSS with custom media queries + - Automatic compression + - Docker support + - Nginx configuration + - Git LFS support + +- **Scripts**: + - PDF export functionality + - LaTeX to MDX conversion + - Template synchronization + - Font SVG generation + - TrackIO data generation + +- **Documentation**: + - Getting started guide + - Writing best practices + - Component reference + - LaTeX conversion guide + - Interactive examples + +### Technical Details +- **Framework**: Astro 4.10.0 +- **Styling**: PostCSS with custom properties +- **Math**: KaTeX 0.16.22 +- **Charts**: Plotly.js 3.1.0, D3.js 7.9.0 +- **Diagrams**: Mermaid 11.10.1 +- **Node.js**: >=20.0.0 +- **License**: CC-BY-4.0 + +### Browser Support +- Chrome (latest) +- Firefox (latest) +- Safari (latest) +- Edge (latest) + +--- + +## Version History + +- **1.0.0**: Initial stable release with full feature set +- **0.0.1**: Development version (pre-release) + +## Migration Guide + +### From 0.0.1 to 1.0.0 + +This is the first stable release. No breaking changes from the development version. 
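+
+If you want to check which template release your project started from before syncing, one quick option is to read the `version` field of `app/package.json` (a minimal sketch; it assumes that field tracks the template release rather than your own article's versioning):
+
+```bash
+cd app
+npm pkg get version   # assumption: this mirrors the template release, e.g. "1.0.0"
+```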
+
+### Updating Your Project
+
+Use the template synchronization system to update:
+
+```bash
+npm run sync:template -- --dry-run  # Preview changes
+npm run sync:template               # Apply updates
+```
+
+## Support
+
+- **Documentation**: [Hugging Face Space](https://huggingface.co/spaces/tfrere/research-article-template)
+- **Issues**: [Community Discussions](https://huggingface.co/spaces/tfrere/research-article-template/discussions)
+- **Contact**: [@tfrere](https://huggingface.co/tfrere)
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
new file mode 100644
index 0000000000000000000000000000000000000000..a4573b5d9abcd9e9ba35095677d0443b157298ec
--- /dev/null
+++ b/CONTRIBUTING.md
@@ -0,0 +1,196 @@
+# Contributing to Research Article Template
+
+Thank you for your interest in contributing to the Research Article Template! This document provides guidelines and information for contributors.
+
+## 🤝 How to Contribute
+
+### Reporting Issues
+
+Before creating an issue, please:
+1. **Search existing issues** to avoid duplicates
+2. **Use the issue template** when available
+3. **Provide detailed information**:
+   - Clear description of the problem
+   - Steps to reproduce
+   - Expected vs actual behavior
+   - Environment details (OS, Node.js version, browser)
+   - Screenshots if applicable
+
+### Suggesting Features
+
+We welcome feature suggestions! Please:
+1. **Check existing discussions** first
+2. **Describe the use case** clearly
+3. **Explain the benefits** for the community
+4. **Consider implementation complexity**
+
+### Code Contributions
+
+#### Getting Started
+
+1. **Fork the repository** on Hugging Face
+2. **Clone your fork**:
+   ```bash
+   git clone git@hf.co:spaces/<your-username>/research-article-template
+   cd research-article-template
+   ```
+3. **Install dependencies**:
+   ```bash
+   cd app
+   npm install
+   ```
+4. **Create a feature branch**:
+   ```bash
+   git checkout -b feature/your-feature-name
+   ```
+
+#### Development Workflow
+
+1. **Make your changes** following our coding standards
+2. **Test thoroughly**:
+   ```bash
+   npm run dev    # Test locally
+   npm run build  # Ensure build works
+   ```
+3. **Update documentation** if needed
+4. **Commit with clear messages**:
+   ```bash
+   git commit -m "feat: add new component for interactive charts"
+   ```
+
+#### Pull Request Process
+
+1. **Push your branch**:
+   ```bash
+   git push origin feature/your-feature-name
+   ```
+2. **Create a Pull Request** with:
+   - Clear title and description
+   - Reference related issues
+   - Screenshots for UI changes
+   - Testing instructions
+
+## 📋 Coding Standards
+
+### Code Style
+
+- **Use Prettier** for consistent formatting
+- **Follow existing patterns** in the codebase
+- **Write clear, self-documenting code**
+- **Add comments** for complex logic
+- **Use meaningful variable names**
+
+### File Organization
+
+- **Components**: Place in `src/components/`
+- **Styles**: Use CSS modules or component-scoped styles
+- **Assets**: Organize in `src/content/assets/`
+- **Documentation**: Update relevant `.mdx` files
+
+### Commit Message Format
+
+We follow [Conventional Commits](https://www.conventionalcommits.org/):
+
+```
+type(scope): description
+
+feat: add new interactive chart component
+fix: resolve mobile layout issues
+docs: update installation instructions
+style: improve button hover states
+refactor: simplify component structure
+test: add unit tests for utility functions
+```
+
+**Types**: `feat`, `fix`, `docs`, `style`, `refactor`, `test`, `chore`
+
+## 🧪 Testing
+
+### Manual Testing
+
+Before submitting:
+- [ ] Test on different screen sizes
+- [ ] Verify dark/light theme compatibility
+- [ ] Check browser compatibility (Chrome, Firefox, Safari)
+- [ ] Test with different content types
+- [ ] Ensure accessibility standards are met
+
+### Automated Testing
+
+```bash
+# Run build to catch errors
+npm run build
+
+# Test PDF export
+npm run export:pdf
+
+# Test LaTeX conversion
+npm run latex:convert
+```
+
+## 📚 Documentation
+
+### Writing Guidelines
+
+- **Use clear, concise language**
+- **Provide examples** for complex features
+- **Include screenshots** for UI changes
+- **Update both English content and code comments**
+
+### Documentation Structure
+
+- **README.md**: Project overview and quick start
+- **CONTRIBUTING.md**: This file
+- **Content files**: In `src/content/chapters/demo/`
+- **Component docs**: Inline comments and examples
+
+## 🎯 Areas for Contribution
+
+### High Priority
+
+- **Bug fixes** and stability improvements
+- **Accessibility enhancements**
+- **Mobile responsiveness**
+- **Performance optimizations**
+- **Documentation improvements**
+
+### Feature Ideas
+
+- **New interactive components**
+- **Additional export formats**
+- **Enhanced LaTeX import**
+- **Theme customization**
+- **Plugin system**
+
+### Community
+
+- **Answer questions** in discussions
+- **Share examples** of your work
+- **Write tutorials** and guides
+- **Help with translations**
+
+## 🚫 What Not to Contribute
+
+- **Breaking changes** without discussion
+- **Major architectural changes** without approval
+- **Dependencies** that significantly increase bundle size
+- **Features** that don't align with the project's goals
+
+## 📞 Getting Help
+
+- **Discussions**: [Community tab](https://huggingface.co/spaces/tfrere/research-article-template/discussions)
+- **Issues**: [Report bugs](https://huggingface.co/spaces/tfrere/research-article-template/discussions?status=open&type=issue)
+- **Contact**: [@tfrere](https://huggingface.co/tfrere) on Hugging Face
+
+## 📄 License
+
+By contributing, you agree that your contributions will be licensed under the same [CC-BY-4.0 license](LICENSE) that covers the project.
+
+## 🙏 Recognition
+
+Contributors will be:
+- **Listed in acknowledgments** (if desired)
+- **Mentioned in release notes** for significant contributions
+- **Credited** in relevant documentation
+
+Thank you for helping make scientific writing more accessible and interactive! 🎉
diff --git a/LICENSE b/LICENSE
new file mode 100644
index 0000000000000000000000000000000000000000..b267a53137822114e4c0bcef2e6383aaf52a70f1
--- /dev/null
+++ b/LICENSE
@@ -0,0 +1,33 @@
+Creative Commons Attribution 4.0 International License
+
+Copyright (c) 2024 Thibaud Frere
+
+This work is licensed under the Creative Commons Attribution 4.0 International License.
+To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/
+or send a letter to Creative Commons, PO Box 1866, Mountain View, CA 94042, USA.
+
+You are free to:
+
+  Share — copy and redistribute the material in any medium or format
+  Adapt — remix, transform, and build upon the material for any purpose, even commercially.
+
+The licensor cannot revoke these freedoms as long as you follow the license terms.
+
+Under the following terms:
+
+  Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
+
+  No additional restrictions — You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.
+
+Notices:
+
+  You do not have to comply with the license for elements of the material in the public domain or where your use is permitted by an applicable exception or limitation.
+
+  No warranties are given. The license may not give you all of the permissions necessary for your intended use. For example, other rights such as publicity, privacy, or moral rights may limit how you use the material.
+
+---
+
+For the source code and technical implementation:
+- The source code is available at: https://huggingface.co/spaces/tfrere/research-article-template
+- Third-party figures and assets are excluded from this license and marked in their captions
+- Dependencies and third-party libraries maintain their respective licenses
diff --git a/README.md b/README.md
index 3301c23cf8488bf55409e058c7d7e9de797cedab..114b903c9da3bc87749b1260eaa2eb272914fe92 100644
--- a/README.md
+++ b/README.md
@@ -8,4 +8,132 @@ pinned: false
 header: mini
 app_port: 8080
 thumbnail: https://huggingface.co/spaces/tfrere/research-paper-template/thumb.jpg
----
\ No newline at end of file
+---
+
+# 📝 Research Article Template
+
+[![License: CC BY 4.0](https://img.shields.io/badge/License-CC%20BY%204.0-lightgrey.svg)](https://creativecommons.org/licenses/by/4.0/)
+[![Node.js Version](https://img.shields.io/badge/node-%3E%3D20.0.0-brightgreen.svg)](https://nodejs.org/)
+[![Astro](https://img.shields.io/badge/Astro-4.10.0-orange.svg)](https://astro.build/)
+[![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/tfrere/research-article-template)
+
+> **A modern, interactive template for scientific writing** that brings papers to life with web-native features, minimal setup, and maximum impact.
+
+## ✨ Features
+
+- 🎯 **Markdown-based** - Write in familiar Markdown/MDX
+- 🧮 **KaTeX math** - Beautiful mathematical notation
+- 🎨 **Syntax highlighting** - Code blocks with proper highlighting
+- 📚 **Academic citations** - BibTeX integration
+- 📝 **Footnotes & sidenotes** - Rich annotation system
+- 📋 **Table of contents** - Auto-generated navigation
+- 📊 **Interactive diagrams** - Mermaid, Plotly, D3.js ready
+- 🎭 **HTML embeds** - Include any web content
+- 🤖 **Gradio app embeds** - Interactive ML demos
+- 🎨 **Dataviz color palettes** - Consistent visual design
+- 🖼️ **Optimized images** - Automatic optimization
+- ⚡ **Lightweight bundle** - Fast loading
+- 🔍 **SEO friendly** - Search engine optimized
+- 🏗️ **Automatic build** - CI/CD ready
+- 📄 **PDF export** - Generate publication-ready PDFs
+- 🌙 **Dark theme** - Modern UI with theme toggle
+- 📱 **Mobile friendly** - Responsive design
+- 📥 **LaTeX import** - Convert existing papers
+- 🔄 **Template updates** - Stay current with improvements
+
+## 🚀 Quick Start
+
+### Option 1: Duplicate on Hugging Face (Recommended)
+
+1. Visit **[🤗 Research Article Template](https://huggingface.co/spaces/tfrere/research-article-template)**
+2. Click **"Duplicate this Space"**
+3. Clone your new repository:
+   ```bash
+   git clone git@hf.co:spaces/<your-username>/<your-space-name>
+   cd <your-space-name>
+   ```
+
+### Option 2: Clone Directly
+
+```bash
+git clone https://github.com/tfrere/research-article-template.git
+cd research-article-template
+```
+
+### Installation
+
+```bash
+# Install Node.js 20+ (use nvm for version management)
+nvm install 20
+nvm use 20
+
+# Install Git LFS and pull assets
+git lfs install
+git lfs pull
+
+# Install dependencies
+cd app
+npm install
+
+# Start development server
+npm run dev
+```
+
+Visit `http://localhost:4321` to see your site!
+
+## 📖 Documentation
+
+- **[Getting Started Guide](https://huggingface.co/spaces/tfrere/research-article-template)** - Complete setup instructions
+- **[Writing Best Practices](https://huggingface.co/spaces/tfrere/research-article-template)** - Tips for effective scientific writing
+- **[Component Reference](https://huggingface.co/spaces/tfrere/research-article-template)** - Available blocks and features
+- **[LaTeX Conversion](https://huggingface.co/spaces/tfrere/research-article-template)** - Import existing papers
+
+## 🎯 Who This Is For
+
+- **Scientists** writing modern, web-native research papers
+- **Educators** creating interactive, explorable lessons
+- **Researchers** who want to focus on ideas, not infrastructure
+- **Anyone** who values clear, engaging technical communication
+
+## 🌟 Inspired by Distill
+
+This template carries forward the spirit of [Distill](https://distill.pub/) (2016–2021), pushing interactive scientific writing even further with:
+- Accessible, high-quality explanations
+- Reproducible, production-ready demos
+- Modern web technologies and best practices
+
+## 🤝 Contributing
+
+We welcome contributions! Please see our [Contributing Guidelines](CONTRIBUTING.md) for details.
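+
+For quick reference, the workflow described in [CONTRIBUTING.md](CONTRIBUTING.md) condenses to the sketch below (replace `<your-username>` and the branch name with your own; the commands simply mirror the contributing guide):
+
+```bash
+# Fork the Space on Hugging Face first, then:
+git clone git@hf.co:spaces/<your-username>/research-article-template
+cd research-article-template/app && npm install
+git checkout -b feature/your-feature-name
+# ...make your changes, then check them with `npm run dev` and `npm run build`...
+git add -A && git commit -m "feat: describe your change"
+git push origin feature/your-feature-name
+```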
+
+### Ways to Contribute
+
+- 🐛 **Report bugs** - Open an issue with detailed information
+- 💡 **Suggest features** - Share ideas for improvements
+- 📝 **Improve documentation** - Help others get started
+- 🔧 **Submit code** - Fix bugs or add features
+- 💬 **Join discussions** - Share feedback and ideas
+
+## 📄 License
+
+This project is licensed under the [Creative Commons Attribution 4.0 International License](https://creativecommons.org/licenses/by/4.0/).
+
+- **Diagrams and text**: CC-BY 4.0
+- **Source code**: Available on [Hugging Face](https://huggingface.co/spaces/tfrere/research-article-template)
+- **Third-party figures**: Excluded and marked in captions
+
+## 🙏 Acknowledgments
+
+- Inspired by [Distill](https://distill.pub/) and the interactive scientific writing movement
+- Built with [Astro](https://astro.build/), [MDX](https://mdxjs.com/), and modern web technologies
+- Community feedback and contributions from researchers worldwide
+
+## 📞 Support
+
+- 💬 **[Community Discussions](https://huggingface.co/spaces/tfrere/research-article-template/discussions)** - Ask questions and share ideas
+- 🐛 **[Report Issues](https://huggingface.co/spaces/tfrere/research-article-template/discussions?status=open&type=issue)** - Bug reports and feature requests
+- 📧 **Contact**: [@tfrere](https://huggingface.co/tfrere) on Hugging Face
+
+---
+
+**Made with ❤️ for the scientific community**
\ No newline at end of file
diff --git a/app/.astro/astro/content.d.ts b/app/.astro/astro/content.d.ts
index eb236b062e47ff762326764dbd53546131697d54..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 100644
--- a/app/.astro/astro/content.d.ts
+++ b/app/.astro/astro/content.d.ts
@@ -1,284 +0,0 @@
-declare module 'astro:content' {
-  interface Render {
-    '.mdx': Promise<{
-      Content: import('astro').MarkdownInstance<{}>['Content'];
-      headings: import('astro').MarkdownHeading[];
-      remarkPluginFrontmatter: Record;
-      components: import('astro').MDXInstance<{}>['components'];
-    }>;
-  }
-}
-
-declare module 'astro:content' {
-  interface RenderResult {
-    Content: import('astro/runtime/server/index.js').AstroComponentFactory;
-    headings: import('astro').MarkdownHeading[];
-    remarkPluginFrontmatter: Record;
-  }
-  interface Render {
-    '.md': Promise;
-  }
-
-  export interface RenderedContent {
-    html: string;
-    metadata?: {
-      imagePaths: Array;
-      [key: string]: unknown;
-    };
-  }
-}
-
-declare module 'astro:content' {
-  type Flatten = T extends { [K: string]: infer U } ? U : never;
-
-  export type CollectionKey = keyof AnyEntryMap;
-  export type CollectionEntry = Flatten;
-
-  export type ContentCollectionKey = keyof ContentEntryMap;
-  export type DataCollectionKey = keyof DataEntryMap;
-
-  type AllValuesOf = T extends any ? T[keyof T] : never;
-  type ValidContentEntrySlug = AllValuesOf<
-    ContentEntryMap[C]
-  >['slug'];
-
-  /** @deprecated Use `getEntry` instead. */
-  export function getEntryBySlug<
-    C extends keyof ContentEntryMap,
-    E extends ValidContentEntrySlug | (string & {}),
-  >(
-    collection: C,
-    // Note that this has to accept a regular string too, for SSR
-    entrySlug: E,
-  ): E extends ValidContentEntrySlug
-    ? Promise>
-    : Promise | undefined>;
-
-  /** @deprecated Use `getEntry` instead.
*/ - export function getDataEntryById( - collection: C, - entryId: E, - ): Promise>; - - export function getCollection>( - collection: C, - filter?: (entry: CollectionEntry) => entry is E, - ): Promise; - export function getCollection( - collection: C, - filter?: (entry: CollectionEntry) => unknown, - ): Promise[]>; - - export function getEntry< - C extends keyof ContentEntryMap, - E extends ValidContentEntrySlug | (string & {}), - >(entry: { - collection: C; - slug: E; - }): E extends ValidContentEntrySlug - ? Promise> - : Promise | undefined>; - export function getEntry< - C extends keyof DataEntryMap, - E extends keyof DataEntryMap[C] | (string & {}), - >(entry: { - collection: C; - id: E; - }): E extends keyof DataEntryMap[C] - ? Promise - : Promise | undefined>; - export function getEntry< - C extends keyof ContentEntryMap, - E extends ValidContentEntrySlug | (string & {}), - >( - collection: C, - slug: E, - ): E extends ValidContentEntrySlug - ? Promise> - : Promise | undefined>; - export function getEntry< - C extends keyof DataEntryMap, - E extends keyof DataEntryMap[C] | (string & {}), - >( - collection: C, - id: E, - ): E extends keyof DataEntryMap[C] - ? Promise - : Promise | undefined>; - - /** Resolve an array of entry references from the same collection */ - export function getEntries( - entries: { - collection: C; - slug: ValidContentEntrySlug; - }[], - ): Promise[]>; - export function getEntries( - entries: { - collection: C; - id: keyof DataEntryMap[C]; - }[], - ): Promise[]>; - - export function render( - entry: AnyEntryMap[C][string], - ): Promise; - - export function reference( - collection: C, - ): import('astro/zod').ZodEffects< - import('astro/zod').ZodString, - C extends keyof ContentEntryMap - ? { - collection: C; - slug: ValidContentEntrySlug; - } - : { - collection: C; - id: keyof DataEntryMap[C]; - } - >; - // Allow generic `string` to avoid excessive type errors in the config - // if `dev` is not running to update as you edit. - // Invalid collection names will be caught at build time. - export function reference( - collection: C, - ): import('astro/zod').ZodEffects; - - type ReturnTypeOrOriginal = T extends (...args: any[]) => infer R ? 
R : T; - type InferEntrySchema = import('astro/zod').infer< - ReturnTypeOrOriginal['schema']> - >; - - type ContentEntryMap = { - "chapters": { -"demo/best-pratices.mdx": { - id: "demo/best-pratices.mdx"; - slug: "demo/best-pratices"; - body: string; - collection: "chapters"; - data: any -} & { render(): Render[".mdx"] }; -"demo/components.mdx": { - id: "demo/components.mdx"; - slug: "demo/components"; - body: string; - collection: "chapters"; - data: any -} & { render(): Render[".mdx"] }; -"demo/debug-components.mdx": { - id: "demo/debug-components.mdx"; - slug: "demo/debug-components"; - body: string; - collection: "chapters"; - data: any -} & { render(): Render[".mdx"] }; -"demo/getting-started.mdx": { - id: "demo/getting-started.mdx"; - slug: "demo/getting-started"; - body: string; - collection: "chapters"; - data: any -} & { render(): Render[".mdx"] }; -"demo/greetings.mdx": { - id: "demo/greetings.mdx"; - slug: "demo/greetings"; - body: string; - collection: "chapters"; - data: any -} & { render(): Render[".mdx"] }; -"demo/introduction.mdx": { - id: "demo/introduction.mdx"; - slug: "demo/introduction"; - body: string; - collection: "chapters"; - data: any -} & { render(): Render[".mdx"] }; -"demo/latex-convertion.mdx": { - id: "demo/latex-convertion.mdx"; - slug: "demo/latex-convertion"; - body: string; - collection: "chapters"; - data: any -} & { render(): Render[".mdx"] }; -"demo/markdown.mdx": { - id: "demo/markdown.mdx"; - slug: "demo/markdown"; - body: string; - collection: "chapters"; - data: any -} & { render(): Render[".mdx"] }; -"demo/vibe-coding-charts.mdx": { - id: "demo/vibe-coding-charts.mdx"; - slug: "demo/vibe-coding-charts"; - body: string; - collection: "chapters"; - data: any -} & { render(): Render[".mdx"] }; -"demo/writing-your-content.mdx": { - id: "demo/writing-your-content.mdx"; - slug: "demo/writing-your-content"; - body: string; - collection: "chapters"; - data: any -} & { render(): Render[".mdx"] }; -"your-first-chapter.mdx": { - id: "your-first-chapter.mdx"; - slug: "your-first-chapter"; - body: string; - collection: "chapters"; - data: any -} & { render(): Render[".mdx"] }; -}; -"embeds": { -"vibe-code-d3-embeds-directives.md": { - id: "vibe-code-d3-embeds-directives.md"; - slug: "vibe-code-d3-embeds-directives"; - body: string; - collection: "embeds"; - data: any -} & { render(): Render[".md"] }; -}; - - }; - - type DataEntryMap = { - "assets": { -"data/data": { - id: "data/data"; - collection: "assets"; - data: any -}; -"data/font-sprite-mapping": { - id: "data/font-sprite-mapping"; - collection: "assets"; - data: any -}; -"data/font_manifest": { - id: "data/font_manifest"; - collection: "assets"; - data: any -}; -"data/llm_benchmarks": { - id: "data/llm_benchmarks"; - collection: "assets"; - data: any -}; -"data/mnist-variant-model": { - id: "data/mnist-variant-model"; - collection: "assets"; - data: any -}; -"data/typography_data": { - id: "data/typography_data"; - collection: "assets"; - data: any -}; -}; - - }; - - type AnyEntryMap = ContentEntryMap & DataEntryMap; - - export type ContentConfig = never; -} diff --git a/app/package.json b/app/package.json index 660e1a654be5ca1a45138240dd8f8851f726986b..df93f49e7bdf92671c00235239e31e14c6b9fb70 100644 Binary files a/app/package.json and b/app/package.json differ diff --git a/app/scripts/latex-to-mdx/input/.gitignore b/app/scripts/latex-to-mdx/input/.gitignore deleted file mode 100644 index 3985e18491a1c8bd8442b52c71648788e52af71e..0000000000000000000000000000000000000000 --- 
a/app/scripts/latex-to-mdx/input/.gitignore +++ /dev/null @@ -1,13 +0,0 @@ -.DS_store - -*.aux -*.nav -*.log -*.snm -*.toc -*.out -*.vrb -*.blg -*latexmk* -*fls -*synctex* \ No newline at end of file diff --git a/app/scripts/latex-to-mdx/input/README.md b/app/scripts/latex-to-mdx/input/README.md deleted file mode 100644 index 060c311b294f3eeeed47d7b564e02cb38ad781d5..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/README.md +++ /dev/null @@ -1,64 +0,0 @@ -# Robot Learning: A Tutorial - -Google "robot learning tutorial", and you will spend just as much time skimming through sources as actually learning about robot learning. -This tutorial solves this: a unified entry point to the field of robot learning, presenting the conceptual underpinnings of popular approaches in the field, as well as presenting practical examples of how to use SOTA algorithms in `lerobot`, an open-source library for full-stack robotics. - -# TODO - -```markdown -## 1. Introduction -- [x] 1.1 Motivation -- [x] 1.2 Structure of the Report - -## 2. Classical Robotics -- [x] 2.1 Different kinds of motion -- [x] 2.2 Example: (Planar) Manipulation - - [x] 2.3.1 Adding Feedback Loops -- [x] 2.4 Limitations of Dynamics-based Robotics - -## 3. Robot Learning -- [ ] 3.1 Reinforcement Learning (RL) for Robotics - - [ ] 3.1.1 A (Concise) Introduction to RL -- [ ] 3.2 Model-Free RL for Real-world Robotics - - [ ] 3.2.1 RL in lerobot: sample efficient, data-driven, and real-world - - [ ] 3.2.2 Code Example: HIL-SERL in lerobot -- [ ] 3.3 Limitations of RL in Real-World Robotics: Simulators and Reward Design -- [ ] 3.4 Behavioral Cloning (BC) for Robotics - - [ ] 4.1.1 Leveraging Real-World Demonstrations - - [ ] 4.1.2 Reward-Free Training and Betting on Data - -## 4. Single-Task Policy Architectures -- [ ] 4.2 Action Chunking with Transformers (ACT) - - [ ] 4.2.1 Model Architecture and Training Objectives - - [ ] 4.2.2 Code Example: Use ACT in lerobot -- [ ] 4.3 Diffusion-Based Policy Models - - [ ] 4.3.1 Generative Modeling for Action Sequences - - [ ] 4.3.2 Code Example: Use Diffusion Policy in lerobot - -## 5. Multi-task Policies: Vision-Language-Action (VLA) Models in Robotics -- [ ] 5.1 Multi-task Policies: Vision-Language-Action (VLA) Models in Robotics - - [ ] 5.1.1 Overview of Major Architectures: Pi0, SmolVLA - - [ ] 5.1.2 Practical Implementation: Using VLA in lerobot - -## 6. Some Emerging Directions in Robot Learning -- [ ] 6.1 VLAs Post-Training - - [ ] 6.1.1 From Imitation to Refinement - - [ ] 6.1.2 EXPO - -## 7. 
Conclusions -``` - -If time permits (vs current TOC): - -- [ ] 3.3 Model-based RL for Robotics - - [ ] 3.3.1 TD-MPC - - [ ] 3.3.2 Code Example: Use TD-MPC in lerobot -- [ ] 3.5 Popular benchmarks in Robot Learning - -- 4.3 Vector-Quantized Behavior Transformer (VQ-BeT) - - [ ] 4.3.1 Model Architecture and Training Objectives - - [ ] 4.3.2 Code Example: Use VQ-BeT in lerobot - -- [ ] 6.1 Using World Models for Robotics - - [ ] 6.1.1 In the architecture: V-JEPA and V-JEPA2 - - [ ] 6.1.2 In the simulation: GENIE diff --git a/app/scripts/latex-to-mdx/input/_minted/62B8750C0ACEBDA39A95140434E540A8.highlight.minted b/app/scripts/latex-to-mdx/input/_minted/62B8750C0ACEBDA39A95140434E540A8.highlight.minted deleted file mode 100644 index 3a28be3ec2ed0ab0e1783d7462c479ab9c7f9950..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/_minted/62B8750C0ACEBDA39A95140434E540A8.highlight.minted +++ /dev/null @@ -1,52 +0,0 @@ -\begin{MintedVerbatim}[commandchars=\\\{\}] -\PYG{k+kn}{import}\PYG{+w}{ }\PYG{n+nn}{torch} -\PYG{k+kn}{from}\PYG{+w}{ }\PYG{n+nn}{lerobot}\PYG{n+nn}{.}\PYG{n+nn}{datasets}\PYG{n+nn}{.}\PYG{n+nn}{lerobot\PYGZus{}dataset}\PYG{+w}{ }\PYG{k+kn}{import} \PYG{n}{LeRobotDataset} -\PYG{k+kn}{from}\PYG{+w}{ }\PYG{n+nn}{lerobot}\PYG{n+nn}{.}\PYG{n+nn}{datasets}\PYG{n+nn}{.}\PYG{n+nn}{streaming\PYGZus{}dataset}\PYG{+w}{ }\PYG{k+kn}{import} \PYG{n}{StreamingLeRobotDataset} - -\PYG{n}{delta\PYGZus{}timestamps} \PYG{o}{=} \PYG{p}{\PYGZob{}} - \PYG{l+s+s2}{\PYGZdq{}}\PYG{l+s+s2}{observation.images.wrist\PYGZus{}camera}\PYG{l+s+s2}{\PYGZdq{}}\PYG{p}{:} \PYG{p}{[}\PYG{o}{\PYGZhy{}}\PYG{l+m+mf}{0.2}\PYG{p}{,} \PYG{o}{\PYGZhy{}}\PYG{l+m+mf}{0.1}\PYG{p}{,} \PYG{l+m+mf}{0.0}\PYG{p}{]} \PYG{c+c1}{\PYGZsh{} 0.2, and 0.1 seconds *before* each frame} -\PYG{p}{\PYGZcb{}} - -\PYG{c+c1}{\PYGZsh{} Optionally, use StreamingLeRobotDataset to avoid downloading the dataset} -\PYG{n}{dataset} \PYG{o}{=} \PYG{n}{LeRobotDataset}\PYG{p}{(} - \PYG{l+s+s2}{\PYGZdq{}}\PYG{l+s+s2}{lerobot/svla\PYGZus{}so101\PYGZus{}pickplace}\PYG{l+s+s2}{\PYGZdq{}}\PYG{p}{,} - \PYG{n}{delta\PYGZus{}timestamps}\PYG{o}{=}\PYG{n}{delta\PYGZus{}timestamps} -\PYG{p}{)} - -\PYG{c+c1}{\PYGZsh{} Streams frames from the Hugging Face Hub without loading into memory} -\PYG{n}{streaming\PYGZus{}dataset} \PYG{o}{=} \PYG{n}{StreamingLeRobotDataset}\PYG{p}{(} - \PYG{l+s+s2}{\PYGZdq{}}\PYG{l+s+s2}{lerobot/svla\PYGZus{}so101\PYGZus{}pickplace}\PYG{l+s+s2}{\PYGZdq{}}\PYG{p}{,} - \PYG{n}{delta\PYGZus{}timestamps}\PYG{o}{=}\PYG{n}{delta\PYGZus{}timestamps} -\PYG{p}{)} - -\PYG{c+c1}{\PYGZsh{} Get the 100th frame in the dataset by } -\PYG{n}{sample} \PYG{o}{=} \PYG{n}{dataset}\PYG{p}{[}\PYG{l+m+mi}{100}\PYG{p}{]} -\PYG{n+nb}{print}\PYG{p}{(}\PYG{n}{sample}\PYG{p}{)} -\PYG{c+c1}{\PYGZsh{} \PYGZob{}} -\PYG{c+c1}{\PYGZsh{} \PYGZsq{}observation.state\PYGZsq{}: tensor([...]), } -\PYG{c+c1}{\PYGZsh{} \PYGZsq{}action\PYGZsq{}: tensor([...]), } -\PYG{c+c1}{\PYGZsh{} \PYGZsq{}observation.images.wrist\PYGZus{}camera\PYGZsq{}: tensor([3, C, H, W]), for delta timesteps} -\PYG{c+c1}{\PYGZsh{} ...} -\PYG{c+c1}{\PYGZsh{} \PYGZcb{}} - -\PYG{n}{batch\PYGZus{}size}\PYG{o}{=}\PYG{l+m+mi}{16} -\PYG{c+c1}{\PYGZsh{} wrap the dataset in a DataLoader to use process it batches for training purposes} -\PYG{n}{data\PYGZus{}loader} \PYG{o}{=} \PYG{n}{torch}\PYG{o}{.}\PYG{n}{utils}\PYG{o}{.}\PYG{n}{data}\PYG{o}{.}\PYG{n}{DataLoader}\PYG{p}{(} - \PYG{n}{dataset}\PYG{p}{,} - \PYG{n}{batch\PYGZus{}size}\PYG{o}{=}\PYG{n}{batch\PYGZus{}size} -\PYG{p}{)} - 
-\PYG{c+c1}{\PYGZsh{} Iterate over the DataLoader in a training loop} -\PYG{n}{num\PYGZus{}epochs} \PYG{o}{=} \PYG{l+m+mi}{1} -\PYG{n}{device} \PYG{o}{=} \PYG{l+s+s2}{\PYGZdq{}}\PYG{l+s+s2}{cuda}\PYG{l+s+s2}{\PYGZdq{}} \PYG{k}{if} \PYG{n}{torch}\PYG{o}{.}\PYG{n}{cuda}\PYG{o}{.}\PYG{n}{is\PYGZus{}available}\PYG{p}{(}\PYG{p}{)} \PYG{k}{else} \PYG{l+s+s2}{\PYGZdq{}}\PYG{l+s+s2}{cpu}\PYG{l+s+s2}{\PYGZdq{}} - -\PYG{k}{for} \PYG{n}{epoch} \PYG{o+ow}{in} \PYG{n+nb}{range}\PYG{p}{(}\PYG{n}{num\PYGZus{}epochs}\PYG{p}{)}\PYG{p}{:} - \PYG{k}{for} \PYG{n}{batch} \PYG{o+ow}{in} \PYG{n}{data\PYGZus{}loader}\PYG{p}{:} - \PYG{c+c1}{\PYGZsh{} Move data to the appropriate device (e.g., GPU)} - \PYG{n}{observations} \PYG{o}{=} \PYG{n}{batch}\PYG{p}{[}\PYG{l+s+s2}{\PYGZdq{}}\PYG{l+s+s2}{observation.state}\PYG{l+s+s2}{\PYGZdq{}}\PYG{p}{]}\PYG{o}{.}\PYG{n}{to}\PYG{p}{(}\PYG{n}{device}\PYG{p}{)} - \PYG{n}{actions} \PYG{o}{=} \PYG{n}{batch}\PYG{p}{[}\PYG{l+s+s2}{\PYGZdq{}}\PYG{l+s+s2}{action}\PYG{l+s+s2}{\PYGZdq{}}\PYG{p}{]}\PYG{o}{.}\PYG{n}{to}\PYG{p}{(}\PYG{n}{device}\PYG{p}{)} - \PYG{n}{images} \PYG{o}{=} \PYG{n}{batch}\PYG{p}{[}\PYG{l+s+s2}{\PYGZdq{}}\PYG{l+s+s2}{observation.images.wrist\PYGZus{}camera}\PYG{l+s+s2}{\PYGZdq{}}\PYG{p}{]}\PYG{o}{.}\PYG{n}{to}\PYG{p}{(}\PYG{n}{device}\PYG{p}{)} - - \PYG{c+c1}{\PYGZsh{} Next, you can do amazing\PYGZus{}model.forward(batch)} - \PYG{o}{.}\PYG{o}{.}\PYG{o}{.} -\end{MintedVerbatim} diff --git a/app/scripts/latex-to-mdx/input/_minted/_FAD58DE7366495DB4650CFEFAC2FCD61.index.minted b/app/scripts/latex-to-mdx/input/_minted/_FAD58DE7366495DB4650CFEFAC2FCD61.index.minted deleted file mode 100644 index e253d0e92db1eaec96e192d396d3140316074ce2..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/_minted/_FAD58DE7366495DB4650CFEFAC2FCD61.index.minted +++ /dev/null @@ -1,10 +0,0 @@ -{ - "jobname": "main", - "md5": "FAD58DE7366495DB4650CFEFAC2FCD61", - "timestamp": "20250911180655", - "cachefiles": [ - "62B8750C0ACEBDA39A95140434E540A8.highlight.minted", - "_FAD58DE7366495DB4650CFEFAC2FCD61.index.minted", - "colorful.style.minted" - ] -} \ No newline at end of file diff --git a/app/scripts/latex-to-mdx/input/_minted/colorful.style.minted b/app/scripts/latex-to-mdx/input/_minted/colorful.style.minted deleted file mode 100644 index 4afa6efb439608d812561686f7ec40f8010c0a39..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/_minted/colorful.style.minted +++ /dev/null @@ -1,100 +0,0 @@ -\makeatletter -\def\PYG@reset{\let\PYG@it=\relax \let\PYG@bf=\relax% - \let\PYG@ul=\relax \let\PYG@tc=\relax% - \let\PYG@bc=\relax \let\PYG@ff=\relax} -\def\PYG@tok#1{\csname PYG@tok@#1\endcsname} -\def\PYG@toks#1+{\ifx\relax#1\empty\else% - \PYG@tok{#1}\expandafter\PYG@toks\fi} -\def\PYG@do#1{\PYG@bc{\PYG@tc{\PYG@ul{% - \PYG@it{\PYG@bf{\PYG@ff{#1}}}}}}} -\def\PYG#1#2{\PYG@reset\PYG@toks#1+\relax+\PYG@do{#2}} - -\@namedef{PYG@tok@w}{\def\PYG@tc##1{\textcolor[rgb]{0.73,0.73,0.73}{##1}}} -\@namedef{PYG@tok@c}{\def\PYG@tc##1{\textcolor[rgb]{0.53,0.53,0.53}{##1}}} -\@namedef{PYG@tok@cp}{\def\PYG@tc##1{\textcolor[rgb]{0.33,0.47,0.60}{##1}}} -\@namedef{PYG@tok@cs}{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.80,0.00,0.00}{##1}}} -\@namedef{PYG@tok@k}{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.00,0.53,0.00}{##1}}} -\@namedef{PYG@tok@kp}{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.00,0.20,0.53}{##1}}} -\@namedef{PYG@tok@kt}{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.20,0.20,0.60}{##1}}} 
-\@namedef{PYG@tok@o}{\def\PYG@tc##1{\textcolor[rgb]{0.20,0.20,0.20}{##1}}} -\@namedef{PYG@tok@ow}{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.00,0.00,0.00}{##1}}} -\@namedef{PYG@tok@nb}{\def\PYG@tc##1{\textcolor[rgb]{0.00,0.44,0.13}{##1}}} -\@namedef{PYG@tok@nf}{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.00,0.40,0.73}{##1}}} -\@namedef{PYG@tok@nc}{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.73,0.00,0.40}{##1}}} -\@namedef{PYG@tok@nn}{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.05,0.52,0.71}{##1}}} -\@namedef{PYG@tok@ne}{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{1.00,0.00,0.00}{##1}}} -\@namedef{PYG@tok@nv}{\def\PYG@tc##1{\textcolor[rgb]{0.60,0.40,0.20}{##1}}} -\@namedef{PYG@tok@vi}{\def\PYG@tc##1{\textcolor[rgb]{0.20,0.20,0.73}{##1}}} -\@namedef{PYG@tok@vc}{\def\PYG@tc##1{\textcolor[rgb]{0.20,0.40,0.60}{##1}}} -\@namedef{PYG@tok@vg}{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.87,0.47,0.00}{##1}}} -\@namedef{PYG@tok@no}{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.00,0.20,0.40}{##1}}} -\@namedef{PYG@tok@nl}{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.60,0.47,0.00}{##1}}} -\@namedef{PYG@tok@ni}{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.53,0.00,0.00}{##1}}} -\@namedef{PYG@tok@na}{\def\PYG@tc##1{\textcolor[rgb]{0.00,0.00,0.80}{##1}}} -\@namedef{PYG@tok@nt}{\def\PYG@tc##1{\textcolor[rgb]{0.00,0.47,0.00}{##1}}} -\@namedef{PYG@tok@nd}{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.33,0.33,0.33}{##1}}} -\@namedef{PYG@tok@s}{\def\PYG@bc##1{{\setlength{\fboxsep}{0pt}\colorbox[rgb]{1.00,0.94,0.94}{\strut ##1}}}} -\@namedef{PYG@tok@sc}{\def\PYG@tc##1{\textcolor[rgb]{0.00,0.27,0.87}{##1}}} -\@namedef{PYG@tok@sd}{\def\PYG@tc##1{\textcolor[rgb]{0.87,0.27,0.13}{##1}}} -\@namedef{PYG@tok@si}{\def\PYG@bc##1{{\setlength{\fboxsep}{0pt}\colorbox[rgb]{0.93,0.93,0.93}{\strut ##1}}}} -\@namedef{PYG@tok@se}{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.40,0.40,0.40}{##1}}\def\PYG@bc##1{{\setlength{\fboxsep}{0pt}\colorbox[rgb]{1.00,0.94,0.94}{\strut ##1}}}} -\@namedef{PYG@tok@sr}{\def\PYG@tc##1{\textcolor[rgb]{0.00,0.00,0.00}{##1}}\def\PYG@bc##1{{\setlength{\fboxsep}{0pt}\colorbox[rgb]{1.00,0.94,1.00}{\strut ##1}}}} -\@namedef{PYG@tok@ss}{\def\PYG@tc##1{\textcolor[rgb]{0.67,0.40,0.00}{##1}}} -\@namedef{PYG@tok@sx}{\def\PYG@tc##1{\textcolor[rgb]{0.87,0.13,0.00}{##1}}\def\PYG@bc##1{{\setlength{\fboxsep}{0pt}\colorbox[rgb]{1.00,0.94,0.94}{\strut ##1}}}} -\@namedef{PYG@tok@m}{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.40,0.00,0.93}{##1}}} -\@namedef{PYG@tok@mi}{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.00,0.00,0.87}{##1}}} -\@namedef{PYG@tok@mf}{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.40,0.00,0.93}{##1}}} -\@namedef{PYG@tok@mh}{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.00,0.33,0.53}{##1}}} -\@namedef{PYG@tok@mo}{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.27,0.00,0.93}{##1}}} -\@namedef{PYG@tok@gh}{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.00,0.00,0.50}{##1}}} -\@namedef{PYG@tok@gu}{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.50,0.00,0.50}{##1}}} -\@namedef{PYG@tok@gd}{\def\PYG@tc##1{\textcolor[rgb]{0.63,0.00,0.00}{##1}}} -\@namedef{PYG@tok@gi}{\def\PYG@tc##1{\textcolor[rgb]{0.00,0.63,0.00}{##1}}} -\@namedef{PYG@tok@gr}{\def\PYG@tc##1{\textcolor[rgb]{1.00,0.00,0.00}{##1}}} -\@namedef{PYG@tok@ge}{\let\PYG@it=\textit} -\@namedef{PYG@tok@gs}{\let\PYG@bf=\textbf} -\@namedef{PYG@tok@ges}{\let\PYG@bf=\textbf\let\PYG@it=\textit} 
-\@namedef{PYG@tok@gp}{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.78,0.36,0.04}{##1}}} -\@namedef{PYG@tok@go}{\def\PYG@tc##1{\textcolor[rgb]{0.53,0.53,0.53}{##1}}} -\@namedef{PYG@tok@gt}{\def\PYG@tc##1{\textcolor[rgb]{0.00,0.27,0.87}{##1}}} -\@namedef{PYG@tok@err}{\def\PYG@tc##1{\textcolor[rgb]{1.00,0.00,0.00}{##1}}\def\PYG@bc##1{{\setlength{\fboxsep}{0pt}\colorbox[rgb]{1.00,0.67,0.67}{\strut ##1}}}} -\@namedef{PYG@tok@kc}{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.00,0.53,0.00}{##1}}} -\@namedef{PYG@tok@kd}{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.00,0.53,0.00}{##1}}} -\@namedef{PYG@tok@kn}{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.00,0.53,0.00}{##1}}} -\@namedef{PYG@tok@kr}{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.00,0.53,0.00}{##1}}} -\@namedef{PYG@tok@bp}{\def\PYG@tc##1{\textcolor[rgb]{0.00,0.44,0.13}{##1}}} -\@namedef{PYG@tok@fm}{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.00,0.40,0.73}{##1}}} -\@namedef{PYG@tok@vm}{\def\PYG@tc##1{\textcolor[rgb]{0.60,0.40,0.20}{##1}}} -\@namedef{PYG@tok@sa}{\def\PYG@bc##1{{\setlength{\fboxsep}{0pt}\colorbox[rgb]{1.00,0.94,0.94}{\strut ##1}}}} -\@namedef{PYG@tok@sb}{\def\PYG@bc##1{{\setlength{\fboxsep}{0pt}\colorbox[rgb]{1.00,0.94,0.94}{\strut ##1}}}} -\@namedef{PYG@tok@dl}{\def\PYG@bc##1{{\setlength{\fboxsep}{0pt}\colorbox[rgb]{1.00,0.94,0.94}{\strut ##1}}}} -\@namedef{PYG@tok@s2}{\def\PYG@bc##1{{\setlength{\fboxsep}{0pt}\colorbox[rgb]{1.00,0.94,0.94}{\strut ##1}}}} -\@namedef{PYG@tok@sh}{\def\PYG@bc##1{{\setlength{\fboxsep}{0pt}\colorbox[rgb]{1.00,0.94,0.94}{\strut ##1}}}} -\@namedef{PYG@tok@s1}{\def\PYG@bc##1{{\setlength{\fboxsep}{0pt}\colorbox[rgb]{1.00,0.94,0.94}{\strut ##1}}}} -\@namedef{PYG@tok@mb}{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.40,0.00,0.93}{##1}}} -\@namedef{PYG@tok@il}{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.00,0.00,0.87}{##1}}} -\@namedef{PYG@tok@ch}{\def\PYG@tc##1{\textcolor[rgb]{0.53,0.53,0.53}{##1}}} -\@namedef{PYG@tok@cm}{\def\PYG@tc##1{\textcolor[rgb]{0.53,0.53,0.53}{##1}}} -\@namedef{PYG@tok@cpf}{\def\PYG@tc##1{\textcolor[rgb]{0.53,0.53,0.53}{##1}}} -\@namedef{PYG@tok@c1}{\def\PYG@tc##1{\textcolor[rgb]{0.53,0.53,0.53}{##1}}} - -\def\PYGZbs{\char`\\} -\def\PYGZus{\char`\_} -\def\PYGZob{\char`\{} -\def\PYGZcb{\char`\}} -\def\PYGZca{\char`\^} -\def\PYGZam{\char`\&} -\def\PYGZlt{\char`\<} -\def\PYGZgt{\char`\>} -\def\PYGZsh{\char`\#} -\def\PYGZpc{\char`\%} -\def\PYGZdl{\char`\$} -\def\PYGZhy{\char`\-} -\def\PYGZsq{\char`\'} -\def\PYGZdq{\char`\"} -\def\PYGZti{\char`\~} -% for compatibility with earlier versions -\def\PYGZat{@} -\def\PYGZlb{[} -\def\PYGZrb{]} -\makeatother diff --git a/app/scripts/latex-to-mdx/input/fancyhdr.sty b/app/scripts/latex-to-mdx/input/fancyhdr.sty deleted file mode 100644 index 77ed4e3012d822c7cca5c17efcae308b32b8cc2b..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/fancyhdr.sty +++ /dev/null @@ -1,485 +0,0 @@ -% fancyhdr.sty version 3.2 -% Fancy headers and footers for LaTeX. -% Piet van Oostrum, -% Dept of Computer and Information Sciences, University of Utrecht, -% Padualaan 14, P.O. Box 80.089, 3508 TB Utrecht, The Netherlands -% Telephone: +31 30 2532180. Email: piet@cs.uu.nl -% ======================================================================== -% LICENCE: -% This file may be distributed under the terms of the LaTeX Project Public -% License, as described in lppl.txt in the base LaTeX distribution. -% Either version 1 or, at your option, any later version. 
-% ======================================================================== -% MODIFICATION HISTORY: -% Sep 16, 1994 -% version 1.4: Correction for use with \reversemargin -% Sep 29, 1994: -% version 1.5: Added the \iftopfloat, \ifbotfloat and \iffloatpage commands -% Oct 4, 1994: -% version 1.6: Reset single spacing in headers/footers for use with -% setspace.sty or doublespace.sty -% Oct 4, 1994: -% version 1.7: changed \let\@mkboth\markboth to -% \def\@mkboth{\protect\markboth} to make it more robust -% Dec 5, 1994: -% version 1.8: corrections for amsbook/amsart: define \@chapapp and (more -% importantly) use the \chapter/sectionmark definitions from ps@headings if -% they exist (which should be true for all standard classes). -% May 31, 1995: -% version 1.9: The proposed \renewcommand{\headrulewidth}{\iffloatpage... -% construction in the doc did not work properly with the fancyplain style. -% June 1, 1995: -% version 1.91: The definition of \@mkboth wasn't restored on subsequent -% \pagestyle{fancy}'s. -% June 1, 1995: -% version 1.92: The sequence \pagestyle{fancyplain} \pagestyle{plain} -% \pagestyle{fancy} would erroneously select the plain version. -% June 1, 1995: -% version 1.93: \fancypagestyle command added. -% Dec 11, 1995: -% version 1.94: suggested by Conrad Hughes -% CJCH, Dec 11, 1995: added \footruleskip to allow control over footrule -% position (old hardcoded value of .3\normalbaselineskip is far too high -% when used with very small footer fonts). -% Jan 31, 1996: -% version 1.95: call \@normalsize in the reset code if that is defined, -% otherwise \normalsize. -% this is to solve a problem with ucthesis.cls, as this doesn't -% define \@currsize. Unfortunately for latex209 calling \normalsize doesn't -% work as this is optimized to do very little, so there \@normalsize should -% be called. Hopefully this code works for all versions of LaTeX known to -% mankind. -% April 25, 1996: -% version 1.96: initialize \headwidth to a magic (negative) value to catch -% most common cases that people change it before calling \pagestyle{fancy}. -% Note it can't be initialized when reading in this file, because -% \textwidth could be changed afterwards. This is quite probable. -% We also switch to \MakeUppercase rather than \uppercase and introduce a -% \nouppercase command for use in headers. and footers. -% May 3, 1996: -% version 1.97: Two changes: -% 1. Undo the change in version 1.8 (using the pagestyle{headings} defaults -% for the chapter and section marks. The current version of amsbook and -% amsart classes don't seem to need them anymore. Moreover the standard -% latex classes don't use \markboth if twoside isn't selected, and this is -% confusing as \leftmark doesn't work as expected. -% 2. include a call to \ps@empty in ps@@fancy. This is to solve a problem -% in the amsbook and amsart classes, that make global changes to \topskip, -% which are reset in \ps@empty. Hopefully this doesn't break other things. -% May 7, 1996: -% version 1.98: -% Added % after the line \def\nouppercase -% May 7, 1996: -% version 1.99: This is the alpha version of fancyhdr 2.0 -% Introduced the new commands \fancyhead, \fancyfoot, and \fancyhf. -% Changed \headrulewidth, \footrulewidth, \footruleskip to -% macros rather than length parameters, In this way they can be -% conditionalized and they don't consume length registers. There is no need -% to have them as length registers unless you want to do calculations with -% them, which is unlikely. 
Note that this may make some uses of them -% incompatible (i.e. if you have a file that uses \setlength or \xxxx=) -% May 10, 1996: -% version 1.99a: -% Added a few more % signs -% May 10, 1996: -% version 1.99b: -% Changed the syntax of \f@nfor to be resistent to catcode changes of := -% Removed the [1] from the defs of \lhead etc. because the parameter is -% consumed by the \@[xy]lhead etc. macros. -% June 24, 1997: -% version 1.99c: -% corrected \nouppercase to also include the protected form of \MakeUppercase -% \global added to manipulation of \headwidth. -% \iffootnote command added. -% Some comments added about \@fancyhead and \@fancyfoot. -% Aug 24, 1998 -% version 1.99d -% Changed the default \ps@empty to \ps@@empty in order to allow -% \fancypagestyle{empty} redefinition. -% Oct 11, 2000 -% version 2.0 -% Added LPPL license clause. -% -% A check for \headheight is added. An errormessage is given (once) if the -% header is too large. Empty headers don't generate the error even if -% \headheight is very small or even 0pt. -% Warning added for the use of 'E' option when twoside option is not used. -% In this case the 'E' fields will never be used. -% -% Mar 10, 2002 -% version 2.1beta -% New command: \fancyhfoffset[place]{length} -% defines offsets to be applied to the header/footer to let it stick into -% the margins (if length > 0). -% place is like in fancyhead, except that only E,O,L,R can be used. -% This replaces the old calculation based on \headwidth and the marginpar -% area. -% \headwidth will be dynamically calculated in the headers/footers when -% this is used. -% -% Mar 26, 2002 -% version 2.1beta2 -% \fancyhfoffset now also takes h,f as possible letters in the argument to -% allow the header and footer widths to be different. -% New commands \fancyheadoffset and \fancyfootoffset added comparable to -% \fancyhead and \fancyfoot. -% Errormessages and warnings have been made more informative. -% -% Dec 9, 2002 -% version 2.1 -% The defaults for \footrulewidth, \plainheadrulewidth and -% \plainfootrulewidth are changed from \z@skip to 0pt. In this way when -% someone inadvertantly uses \setlength to change any of these, the value -% of \z@skip will not be changed, rather an errormessage will be given. - -% March 3, 2004 -% Release of version 3.0 - -% Oct 7, 2004 -% version 3.1 -% Added '\endlinechar=13' to \fancy@reset to prevent problems with -% includegraphics in header when verbatiminput is active. - -% March 22, 2005 -% version 3.2 -% reset \everypar (the real one) in \fancy@reset because spanish.ldf does -% strange things with \everypar between << and >>. - -\def\ifancy@mpty#1{\def\temp@a{#1}\ifx\temp@a\@empty} - -\def\fancy@def#1#2{\ifancy@mpty{#2}\fancy@gbl\def#1{\leavevmode}\else - \fancy@gbl\def#1{#2\strut}\fi} - -\let\fancy@gbl\global - -\def\@fancyerrmsg#1{% - \ifx\PackageError\undefined - \errmessage{#1}\else - \PackageError{Fancyhdr}{#1}{}\fi} -\def\@fancywarning#1{% - \ifx\PackageWarning\undefined - \errmessage{#1}\else - \PackageWarning{Fancyhdr}{#1}{}\fi} - -% Usage: \@forc \var{charstring}{command to be executed for each char} -% This is similar to LaTeX's \@tfor, but expands the charstring. 
- -\def\@forc#1#2#3{\expandafter\f@rc\expandafter#1\expandafter{#2}{#3}} -\def\f@rc#1#2#3{\def\temp@ty{#2}\ifx\@empty\temp@ty\else - \f@@rc#1#2\f@@rc{#3}\fi} -\def\f@@rc#1#2#3\f@@rc#4{\def#1{#2}#4\f@rc#1{#3}{#4}} - -% Usage: \f@nfor\name:=list\do{body} -% Like LaTeX's \@for but an empty list is treated as a list with an empty -% element - -\newcommand{\f@nfor}[3]{\edef\@fortmp{#2}% - \expandafter\@forloop#2,\@nil,\@nil\@@#1{#3}} - -% Usage: \def@ult \cs{defaults}{argument} -% sets \cs to the characters from defaults appearing in argument -% or defaults if it would be empty. All characters are lowercased. - -\newcommand\def@ult[3]{% - \edef\temp@a{\lowercase{\edef\noexpand\temp@a{#3}}}\temp@a - \def#1{}% - \@forc\tmpf@ra{#2}% - {\expandafter\if@in\tmpf@ra\temp@a{\edef#1{#1\tmpf@ra}}{}}% - \ifx\@empty#1\def#1{#2}\fi} -% -% \if@in -% -\newcommand{\if@in}[4]{% - \edef\temp@a{#2}\def\temp@b##1#1##2\temp@b{\def\temp@b{##1}}% - \expandafter\temp@b#2#1\temp@b\ifx\temp@a\temp@b #4\else #3\fi} - -\newcommand{\fancyhead}{\@ifnextchar[{\f@ncyhf\fancyhead h}% - {\f@ncyhf\fancyhead h[]}} -\newcommand{\fancyfoot}{\@ifnextchar[{\f@ncyhf\fancyfoot f}% - {\f@ncyhf\fancyfoot f[]}} -\newcommand{\fancyhf}{\@ifnextchar[{\f@ncyhf\fancyhf{}}% - {\f@ncyhf\fancyhf{}[]}} - -% New commands for offsets added - -\newcommand{\fancyheadoffset}{\@ifnextchar[{\f@ncyhfoffs\fancyheadoffset h}% - {\f@ncyhfoffs\fancyheadoffset h[]}} -\newcommand{\fancyfootoffset}{\@ifnextchar[{\f@ncyhfoffs\fancyfootoffset f}% - {\f@ncyhfoffs\fancyfootoffset f[]}} -\newcommand{\fancyhfoffset}{\@ifnextchar[{\f@ncyhfoffs\fancyhfoffset{}}% - {\f@ncyhfoffs\fancyhfoffset{}[]}} - -% The header and footer fields are stored in command sequences with -% names of the form: \f@ncy with for [eo], from [lcr] -% and from [hf]. - -\def\f@ncyhf#1#2[#3]#4{% - \def\temp@c{}% - \@forc\tmpf@ra{#3}% - {\expandafter\if@in\tmpf@ra{eolcrhf,EOLCRHF}% - {}{\edef\temp@c{\temp@c\tmpf@ra}}}% - \ifx\@empty\temp@c\else - \@fancyerrmsg{Illegal char `\temp@c' in \string#1 argument: - [#3]}% - \fi - \f@nfor\temp@c{#3}% - {\def@ult\f@@@eo{eo}\temp@c - \if@twoside\else - \if\f@@@eo e\@fancywarning - {\string#1's `E' option without twoside option is useless}\fi\fi - \def@ult\f@@@lcr{lcr}\temp@c - \def@ult\f@@@hf{hf}{#2\temp@c}% - \@forc\f@@eo\f@@@eo - {\@forc\f@@lcr\f@@@lcr - {\@forc\f@@hf\f@@@hf - {\expandafter\fancy@def\csname - f@ncy\f@@eo\f@@lcr\f@@hf\endcsname - {#4}}}}}} - -\def\f@ncyhfoffs#1#2[#3]#4{% - \def\temp@c{}% - \@forc\tmpf@ra{#3}% - {\expandafter\if@in\tmpf@ra{eolrhf,EOLRHF}% - {}{\edef\temp@c{\temp@c\tmpf@ra}}}% - \ifx\@empty\temp@c\else - \@fancyerrmsg{Illegal char `\temp@c' in \string#1 argument: - [#3]}% - \fi - \f@nfor\temp@c{#3}% - {\def@ult\f@@@eo{eo}\temp@c - \if@twoside\else - \if\f@@@eo e\@fancywarning - {\string#1's `E' option without twoside option is useless}\fi\fi - \def@ult\f@@@lcr{lr}\temp@c - \def@ult\f@@@hf{hf}{#2\temp@c}% - \@forc\f@@eo\f@@@eo - {\@forc\f@@lcr\f@@@lcr - {\@forc\f@@hf\f@@@hf - {\expandafter\setlength\csname - f@ncyO@\f@@eo\f@@lcr\f@@hf\endcsname - {#4}}}}}% - \fancy@setoffs} - -% Fancyheadings version 1 commands. These are more or less deprecated, -% but they continue to work. 
- -\newcommand{\lhead}{\@ifnextchar[{\@xlhead}{\@ylhead}} -\def\@xlhead[#1]#2{\fancy@def\f@ncyelh{#1}\fancy@def\f@ncyolh{#2}} -\def\@ylhead#1{\fancy@def\f@ncyelh{#1}\fancy@def\f@ncyolh{#1}} - -\newcommand{\chead}{\@ifnextchar[{\@xchead}{\@ychead}} -\def\@xchead[#1]#2{\fancy@def\f@ncyech{#1}\fancy@def\f@ncyoch{#2}} -\def\@ychead#1{\fancy@def\f@ncyech{#1}\fancy@def\f@ncyoch{#1}} - -\newcommand{\rhead}{\@ifnextchar[{\@xrhead}{\@yrhead}} -\def\@xrhead[#1]#2{\fancy@def\f@ncyerh{#1}\fancy@def\f@ncyorh{#2}} -\def\@yrhead#1{\fancy@def\f@ncyerh{#1}\fancy@def\f@ncyorh{#1}} - -\newcommand{\lfoot}{\@ifnextchar[{\@xlfoot}{\@ylfoot}} -\def\@xlfoot[#1]#2{\fancy@def\f@ncyelf{#1}\fancy@def\f@ncyolf{#2}} -\def\@ylfoot#1{\fancy@def\f@ncyelf{#1}\fancy@def\f@ncyolf{#1}} - -\newcommand{\cfoot}{\@ifnextchar[{\@xcfoot}{\@ycfoot}} -\def\@xcfoot[#1]#2{\fancy@def\f@ncyecf{#1}\fancy@def\f@ncyocf{#2}} -\def\@ycfoot#1{\fancy@def\f@ncyecf{#1}\fancy@def\f@ncyocf{#1}} - -\newcommand{\rfoot}{\@ifnextchar[{\@xrfoot}{\@yrfoot}} -\def\@xrfoot[#1]#2{\fancy@def\f@ncyerf{#1}\fancy@def\f@ncyorf{#2}} -\def\@yrfoot#1{\fancy@def\f@ncyerf{#1}\fancy@def\f@ncyorf{#1}} - -\newlength{\fancy@headwidth} -\let\headwidth\fancy@headwidth -\newlength{\f@ncyO@elh} -\newlength{\f@ncyO@erh} -\newlength{\f@ncyO@olh} -\newlength{\f@ncyO@orh} -\newlength{\f@ncyO@elf} -\newlength{\f@ncyO@erf} -\newlength{\f@ncyO@olf} -\newlength{\f@ncyO@orf} -\newcommand{\headrulewidth}{0.4pt} -\newcommand{\footrulewidth}{0pt} -\newcommand{\footruleskip}{.3\normalbaselineskip} - -% Fancyplain stuff shouldn't be used anymore (rather -% \fancypagestyle{plain} should be used), but it must be present for -% compatibility reasons. - -\newcommand{\plainheadrulewidth}{0pt} -\newcommand{\plainfootrulewidth}{0pt} -\newif\if@fancyplain \@fancyplainfalse -\def\fancyplain#1#2{\if@fancyplain#1\else#2\fi} - -\headwidth=-123456789sp %magic constant - -% Command to reset various things in the headers: -% a.o. single spacing (taken from setspace.sty) -% and the catcode of ^^M (so that epsf files in the header work if a -% verbatim crosses a page boundary) -% It also defines a \nouppercase command that disables \uppercase and -% \Makeuppercase. It can only be used in the headers and footers. -\let\fnch@everypar\everypar% save real \everypar because of spanish.ldf -\def\fancy@reset{\fnch@everypar{}\restorecr\endlinechar=13 - \def\baselinestretch{1}% - \def\nouppercase##1{{\let\uppercase\relax\let\MakeUppercase\relax - \expandafter\let\csname MakeUppercase \endcsname\relax##1}}% - \ifx\undefined\@newbaseline% NFSS not present; 2.09 or 2e - \ifx\@normalsize\undefined \normalsize % for ucthesis.cls - \else \@normalsize \fi - \else% NFSS (2.09) present - \@newbaseline% - \fi} - -% Initialization of the head and foot text. - -% The default values still contain \fancyplain for compatibility. -\fancyhf{} % clear all -% lefthead empty on ``plain'' pages, \rightmark on even, \leftmark on odd pages -% evenhead empty on ``plain'' pages, \leftmark on even, \rightmark on odd pages -\if@twoside - \fancyhead[el,or]{\fancyplain{}{\sl\rightmark}} - \fancyhead[er,ol]{\fancyplain{}{\sl\leftmark}} -\else - \fancyhead[l]{\fancyplain{}{\sl\rightmark}} - \fancyhead[r]{\fancyplain{}{\sl\leftmark}} -\fi -\fancyfoot[c]{\rm\thepage} % page number - -% Use box 0 as a temp box and dimen 0 as temp dimen. -% This can be done, because this code will always -% be used inside another box, and therefore the changes are local. 
- -\def\@fancyvbox#1#2{\setbox0\vbox{#2}\ifdim\ht0>#1\@fancywarning - {\string#1 is too small (\the#1): ^^J Make it at least \the\ht0.^^J - We now make it that large for the rest of the document.^^J - This may cause the page layout to be inconsistent, however\@gobble}% - \dimen0=#1\global\setlength{#1}{\ht0}\ht0=\dimen0\fi - \box0} - -% Put together a header or footer given the left, center and -% right text, fillers at left and right and a rule. -% The \lap commands put the text into an hbox of zero size, -% so overlapping text does not generate an errormessage. -% These macros have 5 parameters: -% 1. LEFTSIDE BEARING % This determines at which side the header will stick -% out. When \fancyhfoffset is used this calculates \headwidth, otherwise -% it is \hss or \relax (after expansion). -% 2. \f@ncyolh, \f@ncyelh, \f@ncyolf or \f@ncyelf. This is the left component. -% 3. \f@ncyoch, \f@ncyech, \f@ncyocf or \f@ncyecf. This is the middle comp. -% 4. \f@ncyorh, \f@ncyerh, \f@ncyorf or \f@ncyerf. This is the right component. -% 5. RIGHTSIDE BEARING. This is always \relax or \hss (after expansion). - -\def\@fancyhead#1#2#3#4#5{#1\hbox to\headwidth{\fancy@reset - \@fancyvbox\headheight{\hbox - {\rlap{\parbox[b]{\headwidth}{\raggedright#2}}\hfill - \parbox[b]{\headwidth}{\centering#3}\hfill - \llap{\parbox[b]{\headwidth}{\raggedleft#4}}}\headrule}}#5} - -\def\@fancyfoot#1#2#3#4#5{#1\hbox to\headwidth{\fancy@reset - \@fancyvbox\footskip{\footrule - \hbox{\rlap{\parbox[t]{\headwidth}{\raggedright#2}}\hfill - \parbox[t]{\headwidth}{\centering#3}\hfill - \llap{\parbox[t]{\headwidth}{\raggedleft#4}}}}}#5} - -\def\headrule{{\if@fancyplain\let\headrulewidth\plainheadrulewidth\fi - \hrule\@height\headrulewidth\@width\headwidth \vskip-\headrulewidth}} - -\def\footrule{{\if@fancyplain\let\footrulewidth\plainfootrulewidth\fi - \vskip-\footruleskip\vskip-\footrulewidth - \hrule\@width\headwidth\@height\footrulewidth\vskip\footruleskip}} - -\def\ps@fancy{% -\@ifundefined{@chapapp}{\let\@chapapp\chaptername}{}%for amsbook -% -% Define \MakeUppercase for old LaTeXen. -% Note: we used \def rather than \let, so that \let\uppercase\relax (from -% the version 1 documentation) will still work. -% -\@ifundefined{MakeUppercase}{\def\MakeUppercase{\uppercase}}{}% -\@ifundefined{chapter}{\def\sectionmark##1{\markboth -{\MakeUppercase{\ifnum \c@secnumdepth>\z@ - \thesection\hskip 1em\relax \fi ##1}}{}}% -\def\subsectionmark##1{\markright {\ifnum \c@secnumdepth >\@ne - \thesubsection\hskip 1em\relax \fi ##1}}}% -{\def\chaptermark##1{\markboth {\MakeUppercase{\ifnum \c@secnumdepth>\m@ne - \@chapapp\ \thechapter. \ \fi ##1}}{}}% -\def\sectionmark##1{\markright{\MakeUppercase{\ifnum \c@secnumdepth >\z@ - \thesection. \ \fi ##1}}}}% -%\csname ps@headings\endcsname % use \ps@headings defaults if they exist -\ps@@fancy -\gdef\ps@fancy{\@fancyplainfalse\ps@@fancy}% -% Initialize \headwidth if the user didn't -% -\ifdim\headwidth<0sp -% -% This catches the case that \headwidth hasn't been initialized and the -% case that the user added something to \headwidth in the expectation that -% it was initialized to \textwidth. We compensate this now. This loses if -% the user intended to multiply it by a factor. But that case is more -% likely done by saying something like \headwidth=1.2\textwidth. -% The doc says you have to change \headwidth after the first call to -% \pagestyle{fancy}. This code is just to catch the most common cases were -% that requirement is violated. 
-% - \global\advance\headwidth123456789sp\global\advance\headwidth\textwidth -\fi} -\def\ps@fancyplain{\ps@fancy \let\ps@plain\ps@plain@fancy} -\def\ps@plain@fancy{\@fancyplaintrue\ps@@fancy} -\let\ps@@empty\ps@empty -\def\ps@@fancy{% -\ps@@empty % This is for amsbook/amsart, which do strange things with \topskip -\def\@mkboth{\protect\markboth}% -\def\@oddhead{\@fancyhead\fancy@Oolh\f@ncyolh\f@ncyoch\f@ncyorh\fancy@Oorh}% -\def\@oddfoot{\@fancyfoot\fancy@Oolf\f@ncyolf\f@ncyocf\f@ncyorf\fancy@Oorf}% -\def\@evenhead{\@fancyhead\fancy@Oelh\f@ncyelh\f@ncyech\f@ncyerh\fancy@Oerh}% -\def\@evenfoot{\@fancyfoot\fancy@Oelf\f@ncyelf\f@ncyecf\f@ncyerf\fancy@Oerf}% -} -% Default definitions for compatibility mode: -% These cause the header/footer to take the defined \headwidth as width -% And to shift in the direction of the marginpar area - -\def\fancy@Oolh{\if@reversemargin\hss\else\relax\fi} -\def\fancy@Oorh{\if@reversemargin\relax\else\hss\fi} -\let\fancy@Oelh\fancy@Oorh -\let\fancy@Oerh\fancy@Oolh - -\let\fancy@Oolf\fancy@Oolh -\let\fancy@Oorf\fancy@Oorh -\let\fancy@Oelf\fancy@Oelh -\let\fancy@Oerf\fancy@Oerh - -% New definitions for the use of \fancyhfoffset -% These calculate the \headwidth from \textwidth and the specified offsets. - -\def\fancy@offsolh{\headwidth=\textwidth\advance\headwidth\f@ncyO@olh - \advance\headwidth\f@ncyO@orh\hskip-\f@ncyO@olh} -\def\fancy@offselh{\headwidth=\textwidth\advance\headwidth\f@ncyO@elh - \advance\headwidth\f@ncyO@erh\hskip-\f@ncyO@elh} - -\def\fancy@offsolf{\headwidth=\textwidth\advance\headwidth\f@ncyO@olf - \advance\headwidth\f@ncyO@orf\hskip-\f@ncyO@olf} -\def\fancy@offself{\headwidth=\textwidth\advance\headwidth\f@ncyO@elf - \advance\headwidth\f@ncyO@erf\hskip-\f@ncyO@elf} - -\def\fancy@setoffs{% -% Just in case \let\headwidth\textwidth was used - \fancy@gbl\let\headwidth\fancy@headwidth - \fancy@gbl\let\fancy@Oolh\fancy@offsolh - \fancy@gbl\let\fancy@Oelh\fancy@offselh - \fancy@gbl\let\fancy@Oorh\hss - \fancy@gbl\let\fancy@Oerh\hss - \fancy@gbl\let\fancy@Oolf\fancy@offsolf - \fancy@gbl\let\fancy@Oelf\fancy@offself - \fancy@gbl\let\fancy@Oorf\hss - \fancy@gbl\let\fancy@Oerf\hss} - -\newif\iffootnote -\let\latex@makecol\@makecol -\def\@makecol{\ifvoid\footins\footnotetrue\else\footnotefalse\fi -\let\topfloat\@toplist\let\botfloat\@botlist\latex@makecol} -\def\iftopfloat#1#2{\ifx\topfloat\empty #2\else #1\fi} -\def\ifbotfloat#1#2{\ifx\botfloat\empty #2\else #1\fi} -\def\iffloatpage#1#2{\if@fcolmade #1\else #2\fi} - -\newcommand{\fancypagestyle}[2]{% - \@namedef{ps@#1}{\let\fancy@gbl\relax#2\relax\ps@fancy}} diff --git a/app/scripts/latex-to-mdx/input/figures/ch1/ch1-lerobot-figure1.png b/app/scripts/latex-to-mdx/input/figures/ch1/ch1-lerobot-figure1.png deleted file mode 100644 index 9a43981b7d60df842224ee6bff9be820809b36b6..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/figures/ch1/ch1-lerobot-figure1.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:a850d2b9170736a42366d65dd858408dcffafa3420a0c6cfd678bbdd29a196fa -size 2861318 diff --git a/app/scripts/latex-to-mdx/input/figures/ch2/ch2-approaches.png b/app/scripts/latex-to-mdx/input/figures/ch2/ch2-approaches.png deleted file mode 100644 index 161aac09e5cae1c51d7a24deb2038ad80358e8cb..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/figures/ch2/ch2-approaches.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid 
sha256:d07f3166fd9efe5b0823ecca63166c019b6fb9dcc912f7b1ae0fd209a25ba274 -size 93262 diff --git a/app/scripts/latex-to-mdx/input/figures/ch2/ch2-classical-limitations.png b/app/scripts/latex-to-mdx/input/figures/ch2/ch2-classical-limitations.png deleted file mode 100644 index 969684eb34a3f473e0a0df8ec491c27144d69613..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/figures/ch2/ch2-classical-limitations.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:85742a774d8d1ad3e36fc50d89c5a69409bce98ebe6bdba734896156ba668aa8 -size 4739243 diff --git a/app/scripts/latex-to-mdx/input/figures/ch2/ch2-cost-accessibility.png b/app/scripts/latex-to-mdx/input/figures/ch2/ch2-cost-accessibility.png deleted file mode 100644 index 17aa82045475dc0e0537649285e4abd0a9aefd2b..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/figures/ch2/ch2-cost-accessibility.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:606cbb89fda90a2ddb22dc721ea978ffa9fe34a7f9f0bf1614b6ae53b4117411 -size 1962263 diff --git a/app/scripts/latex-to-mdx/input/figures/ch2/ch2-planar-manipulator-floor-box.png b/app/scripts/latex-to-mdx/input/figures/ch2/ch2-planar-manipulator-floor-box.png deleted file mode 100644 index 608b518385558b273d591d7f76d1d2804ece01b8..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/figures/ch2/ch2-planar-manipulator-floor-box.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:3c856918ffb061c235d05e74df6310412f5b41ea907f0f12f55fed5c8b45590b -size 93114 diff --git a/app/scripts/latex-to-mdx/input/figures/ch2/ch2-planar-manipulator-floor-shelf.png b/app/scripts/latex-to-mdx/input/figures/ch2/ch2-planar-manipulator-floor-shelf.png deleted file mode 100644 index 47c539881d7b58df4b4493093ab6b780c349a476..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/figures/ch2/ch2-planar-manipulator-floor-shelf.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:e4abb239c45a576a02fc2cbd0d87f877b2c5f61dcac74e1b8c79a70ebacaca3e -size 83589 diff --git a/app/scripts/latex-to-mdx/input/figures/ch2/ch2-planar-manipulator-floor.png b/app/scripts/latex-to-mdx/input/figures/ch2/ch2-planar-manipulator-floor.png deleted file mode 100644 index 1f19ca65db5de85acc43ca8240987b99fd298231..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/figures/ch2/ch2-planar-manipulator-floor.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:4a2c70f2d7c903d9f16433a9ca44c10892fd0e10ca90e2d9b8438c3d25fa623a -size 58946 diff --git a/app/scripts/latex-to-mdx/input/figures/ch2/ch2-planar-manipulator-free.png b/app/scripts/latex-to-mdx/input/figures/ch2/ch2-planar-manipulator-free.png deleted file mode 100644 index 42d6dc9662903b2563663a9b409a8dc83f69906f..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/figures/ch2/ch2-planar-manipulator-free.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:9d860153a76720749a50a6d06c7bcb9886f5605a867f130f66810597ca3f5299 -size 44656 diff --git a/app/scripts/latex-to-mdx/input/figures/ch2/ch2-platforms.png b/app/scripts/latex-to-mdx/input/figures/ch2/ch2-platforms.png deleted file mode 100644 index 4ccc153ed092d5493052d1ddede64094ae6b4068..0000000000000000000000000000000000000000 --- 
a/app/scripts/latex-to-mdx/input/figures/ch2/ch2-platforms.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:baf76deb1a68b859d1e702bc7d0b4173a6b34b56d4bdf75c4748e80eb1934aad -size 3616534 diff --git a/app/scripts/latex-to-mdx/input/figures/ch2/ch2-so100-to-planar-manipulator.png b/app/scripts/latex-to-mdx/input/figures/ch2/ch2-so100-to-planar-manipulator.png deleted file mode 100644 index d4bc70f800df876a10b6fdb4ac51c2544b2977fb..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/figures/ch2/ch2-so100-to-planar-manipulator.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:731806e912421ee3f3fcd10c24b5f5e9f4dd448f859e8213f8f11c0821fcbf59 -size 1555756 diff --git a/app/scripts/latex-to-mdx/input/figures/ch3/ch3-agent-env.png b/app/scripts/latex-to-mdx/input/figures/ch3/ch3-agent-env.png deleted file mode 100644 index 9d3ac5a9b05c8c48faf8660a5cac80737392110f..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/figures/ch3/ch3-agent-env.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:43c8641128f72b994a7269561fd6beaf2fbe0d73bb19f58ade559e271de1de31 -size 42614 diff --git a/app/scripts/latex-to-mdx/input/figures/ch3/ch3-duck-sim-vs-real.png b/app/scripts/latex-to-mdx/input/figures/ch3/ch3-duck-sim-vs-real.png deleted file mode 100644 index 142a5ea15f01aee271c1775e26a6a2c7bc4aedcc..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/figures/ch3/ch3-duck-sim-vs-real.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:c682cfebec3bf21f579a687d4f6a34d6f7cff225397e081188c39ca3b3def1e7 -size 1762155 diff --git a/app/scripts/latex-to-mdx/input/figures/ch3/ch3-hil-serl-examples.png b/app/scripts/latex-to-mdx/input/figures/ch3/ch3-hil-serl-examples.png deleted file mode 100644 index d665f43d5ed8972fc76399ed8caedd9fee4b373e..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/figures/ch3/ch3-hil-serl-examples.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:ae41b09a8a8412b28994425565438a897f827b3a2048d6832c2be7884b40a2af -size 7216604 diff --git a/app/scripts/latex-to-mdx/input/figures/ch3/ch3-learning-atlas.png b/app/scripts/latex-to-mdx/input/figures/ch3/ch3-learning-atlas.png deleted file mode 100644 index 6aceb0b7ccaefebf0bb854ab012eca0cc3ac5da2..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/figures/ch3/ch3-learning-atlas.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:124d586210aa9b3a110c712c4eff3629d0064a507c9c77bf937dd00cc959428c -size 178001 diff --git a/app/scripts/latex-to-mdx/input/figures/ch3/ch3-learning-benefits.png b/app/scripts/latex-to-mdx/input/figures/ch3/ch3-learning-benefits.png deleted file mode 100644 index 89684d039e24b897517612c222ef6e979f42a7c2..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/figures/ch3/ch3-learning-benefits.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:c23f98c050afb75098f34a2bca49fa30ebb4a2b373447c36ba62612854253ff3 -size 6936585 diff --git a/app/scripts/latex-to-mdx/input/figures/ch3/ch3-many-ducks.png b/app/scripts/latex-to-mdx/input/figures/ch3/ch3-many-ducks.png deleted file mode 100644 index 7605bcb2ba0f2abcd7213a4ca092e792db08c504..0000000000000000000000000000000000000000 --- 
a/app/scripts/latex-to-mdx/input/figures/ch3/ch3-many-ducks.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:418bdeff168978207fcc623db74d25b86d11f27d1100a28238bc1591901b93de -size 4872198 diff --git a/app/scripts/latex-to-mdx/input/figures/ch3/ch3-rl-algorithms-atlas.png b/app/scripts/latex-to-mdx/input/figures/ch3/ch3-rl-algorithms-atlas.png deleted file mode 100644 index 95e818db1704eb52f601c8d5a32f215b7cf7620c..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/figures/ch3/ch3-rl-algorithms-atlas.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:2aa853e6067e7bd06cfa0d12250d4277fbe2020b8a2b817c005b084c49c905d5 -size 194522 diff --git a/app/scripts/latex-to-mdx/input/figures/ch3/ch3-rl-examples.png b/app/scripts/latex-to-mdx/input/figures/ch3/ch3-rl-examples.png deleted file mode 100644 index 06de5007b9f0c10c23f79a2af13865a701916662..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/figures/ch3/ch3-rl-examples.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:edb1fa24ee3d279302980016809eab038fc43037156b8d7cadae7fa5b9dddbba -size 9051359 diff --git a/app/scripts/latex-to-mdx/input/figures/ch4/ch4-act-decoder.png b/app/scripts/latex-to-mdx/input/figures/ch4/ch4-act-decoder.png deleted file mode 100644 index 9a09fcb99bb717287ca74d165a3ca5d6983febba..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/figures/ch4/ch4-act-decoder.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:578074c47e65992422e9cb991949b1d63598aded2098dfde3925a33dfd55e481 -size 3180391 diff --git a/app/scripts/latex-to-mdx/input/figures/ch4/ch4-act-encoder.png b/app/scripts/latex-to-mdx/input/figures/ch4/ch4-act-encoder.png deleted file mode 100644 index f587680a13512bae2fe83b3b472ea54a273293e5..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/figures/ch4/ch4-act-encoder.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:7ceeeccb9dd7e791f215f71ee422d9adfb8c2ff1d2417a851e31ba6a6715aaf7 -size 874336 diff --git a/app/scripts/latex-to-mdx/input/figures/ch4/ch4-act.png b/app/scripts/latex-to-mdx/input/figures/ch4/ch4-act.png deleted file mode 100644 index 1f884e4a57994ca4a50e979ce8a7595bd02afc6f..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/figures/ch4/ch4-act.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:318b6f77277c5e8fcf51e2aba63154ee99052e2bcff2af0387fb3cfd1d07cff7 -size 1517348 diff --git a/app/scripts/latex-to-mdx/input/figures/ch4/ch4-action-vs-observation-distribution.png b/app/scripts/latex-to-mdx/input/figures/ch4/ch4-action-vs-observation-distribution.png deleted file mode 100644 index fc82dc6c86ce40126b00697f13a43cc563fe4b4d..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/figures/ch4/ch4-action-vs-observation-distribution.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:7db4ecc0d54d9cab6b8a16017c81bfd9b7fd5d7997bcdd645ccf57167f7efcf2 -size 274240 diff --git a/app/scripts/latex-to-mdx/input/figures/ch4/ch4-async-inference.png b/app/scripts/latex-to-mdx/input/figures/ch4/ch4-async-inference.png deleted file mode 100644 index 73aae17126c70f3fca8651ef62b7d519c81e6f58..0000000000000000000000000000000000000000 --- 
a/app/scripts/latex-to-mdx/input/figures/ch4/ch4-async-inference.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:850ebb6e6ad809edc48597a89cf8e25b2664b9137ca4602ae14f164524f8d232 -size 282300 diff --git a/app/scripts/latex-to-mdx/input/figures/ch4/ch4-bc-trajectories.png b/app/scripts/latex-to-mdx/input/figures/ch4/ch4-bc-trajectories.png deleted file mode 100644 index d577a6966244c54eb3738bd61af13232a603145a..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/figures/ch4/ch4-bc-trajectories.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:0ede85dbb8f12b3cced4dc0e12f97e3713d8432953183840f99e8534998d7f3b -size 2253030 diff --git a/app/scripts/latex-to-mdx/input/figures/ch4/ch4-diffusion-policy.png b/app/scripts/latex-to-mdx/input/figures/ch4/ch4-diffusion-policy.png deleted file mode 100644 index 56da7917d95a1592faafde62702170fac438f903..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/figures/ch4/ch4-diffusion-policy.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:c3cb644c79fd016e77c78bd7fcf185908b18fb127f656003eb577349cfb6da40 -size 2805702 diff --git a/app/scripts/latex-to-mdx/input/figures/ch4/ch4-diffusion-robot-actions.png b/app/scripts/latex-to-mdx/input/figures/ch4/ch4-diffusion-robot-actions.png deleted file mode 100644 index 43d8ce2193bdaeecb172de160290392aaf4000c0..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/figures/ch4/ch4-diffusion-robot-actions.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:a59b816b60a53784127e3dcf0aad612ba14474bde57e1c2b73b670665d1b70ec -size 8927638 diff --git a/app/scripts/latex-to-mdx/input/figures/ch4/ch4-diffusion-vs-flowmatching.png b/app/scripts/latex-to-mdx/input/figures/ch4/ch4-diffusion-vs-flowmatching.png deleted file mode 100644 index 2f4898e0c4db3a001354cc9a78d40e7537b34359..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/figures/ch4/ch4-diffusion-vs-flowmatching.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:aef138f5120025b0bad73788bc8b3af91f27331af3b49bafb09b15037944fa12 -size 189022 diff --git a/app/scripts/latex-to-mdx/input/figures/ch4/ch4-issues-with-bc.png b/app/scripts/latex-to-mdx/input/figures/ch4/ch4-issues-with-bc.png deleted file mode 100644 index 789283d5085bae36ebaf062bd157007988e2dd23..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/figures/ch4/ch4-issues-with-bc.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:7b726d8aa64534e8cbec4a0084fd86e4dfcc0b17685559970006a573dd326459 -size 1560808 diff --git a/app/scripts/latex-to-mdx/input/figures/ch4/ch4-latent-variable-model.png b/app/scripts/latex-to-mdx/input/figures/ch4/ch4-latent-variable-model.png deleted file mode 100644 index 62a7ade0557696ee25c61d10ef323ca1ec9bb077..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/figures/ch4/ch4-latent-variable-model.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:e5b1f48d4dc011d5a20b1d5bccc5cde750f4ffab4b8c48bb5b04529a18aa0390 -size 983775 diff --git a/app/scripts/latex-to-mdx/input/figures/ch4/ch4-many-latents.png b/app/scripts/latex-to-mdx/input/figures/ch4/ch4-many-latents.png deleted file mode 100644 index 
d972eb9694fe47d81d7a5bff66f78edd80c83e57..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/figures/ch4/ch4-many-latents.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:1f5421aae5c9e9735de598fca1a5c68ef7fd28c8b31112c4675356f6deda9b29 -size 222323 diff --git a/app/scripts/latex-to-mdx/input/figures/ch4/ch4-normalizing-flows.png b/app/scripts/latex-to-mdx/input/figures/ch4/ch4-normalizing-flows.png deleted file mode 100644 index cf51b8de51af38c0ea807889d8056d41c524c2d5..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/figures/ch4/ch4-normalizing-flows.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:51f73d09b35b8ccd5685c6b26f7615f8d6ab3df7d045b2502e9232bfe33beace -size 278482 diff --git a/app/scripts/latex-to-mdx/input/figures/ch4/ch4-observation-action-mapping.png b/app/scripts/latex-to-mdx/input/figures/ch4/ch4-observation-action-mapping.png deleted file mode 100644 index 6206870edf17a28bafe36ca0c5631a62b14f5a6a..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/figures/ch4/ch4-observation-action-mapping.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:f1a4a70971ea4c7cf73c089a70e4bc9dd1b5aba43021016fea8b323ad2642c53 -size 2081981 diff --git a/app/scripts/latex-to-mdx/input/figures/ch4/ch4-queues.png b/app/scripts/latex-to-mdx/input/figures/ch4/ch4-queues.png deleted file mode 100644 index c1e912ba8a2d5b254ea9d990ba8dbab491cb22ed..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/figures/ch4/ch4-queues.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:8d3072c26d0419ee4b19f4ebd10c66e117e113514326eb3e7864057644c305d7 -size 1971787 diff --git a/app/scripts/latex-to-mdx/input/figures/ch4/ch4-task-effect-on-pairs.png b/app/scripts/latex-to-mdx/input/figures/ch4/ch4-task-effect-on-pairs.png deleted file mode 100644 index 6fa47c83e5ba456655b025bd651aea0fc6feeeaa..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/figures/ch4/ch4-task-effect-on-pairs.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:0423b4760f661afa6b81a896a473a4bfc50737b0ecef76fa75051eb6ccf69896 -size 1186204 diff --git a/app/scripts/latex-to-mdx/input/figures/ch5/ch5-generalist-policies-timeline.png b/app/scripts/latex-to-mdx/input/figures/ch5/ch5-generalist-policies-timeline.png deleted file mode 100644 index d85a308d7665bd9c6fab4b0f59f622b0e1599745..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/figures/ch5/ch5-generalist-policies-timeline.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:98f0efdb30302f2fd582bbec379007ef3d2188171f0d700014539560b5d29a9f -size 121521 diff --git a/app/scripts/latex-to-mdx/input/figures/ch5/ch5-ml-vs-robotics-foundation.png b/app/scripts/latex-to-mdx/input/figures/ch5/ch5-ml-vs-robotics-foundation.png deleted file mode 100644 index 0327c71faf9a48c757b6a6f3027f7e54cac6f0e7..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/figures/ch5/ch5-ml-vs-robotics-foundation.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:e858e0c5c2d7246e097c8e048d7c378c0ce20c922e66ceac8db8dbb2c5598e79 -size 3389240 diff --git a/app/scripts/latex-to-mdx/input/figures/ch5/ch5-pi0-sampling-timesteps.png 
b/app/scripts/latex-to-mdx/input/figures/ch5/ch5-pi0-sampling-timesteps.png deleted file mode 100644 index 84401c9e5468cef66fcd2cdf2014f0c103003c93..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/figures/ch5/ch5-pi0-sampling-timesteps.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:2c27d0d34e08154b42692d1a3ea142ef7742ab50547211e9b22f16d79d14fbb3 -size 186917 diff --git a/app/scripts/latex-to-mdx/input/figures/ch5/ch5-pi0.png b/app/scripts/latex-to-mdx/input/figures/ch5/ch5-pi0.png deleted file mode 100644 index 4ea364ceb9691e4ea9928caac2ee6a32860a52d3..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/figures/ch5/ch5-pi0.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:689a7d0a94d116edce122d8c9010aa456ae7d1d816f5684513711d36c94ebb89 -size 1242717 diff --git a/app/scripts/latex-to-mdx/input/figures/ch5/ch5-smolvla.png b/app/scripts/latex-to-mdx/input/figures/ch5/ch5-smolvla.png deleted file mode 100644 index 488341b99047ecfad012127baa3a759354577853..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/figures/ch5/ch5-smolvla.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:49575d51c64eb320c588673fb9b33d1d0a3de7f6af7165a18c35ffb40af93e7a -size 1333430 diff --git a/app/scripts/latex-to-mdx/input/figures/ch5/ch5-trends.png b/app/scripts/latex-to-mdx/input/figures/ch5/ch5-trends.png deleted file mode 100644 index b399968a1d56a98ce0f4af3d1458cf903a1e1471..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/figures/ch5/ch5-trends.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:357708ec69852658d69c5f3ec3d9c5805939fdaa0d13150f6777731579db09fe -size 636731 diff --git a/app/scripts/latex-to-mdx/input/figures/misc/lerobot-team.jpeg b/app/scripts/latex-to-mdx/input/figures/misc/lerobot-team.jpeg deleted file mode 100644 index 330c9a79b9751bf86ffe5ce84a9aaac88ac5d7e6..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/figures/misc/lerobot-team.jpeg +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:7b79149533fb8602ee423c91c068100657745045bfd1507a6a61e30d58c65877 -size 170202 diff --git a/app/scripts/latex-to-mdx/input/handles.tex b/app/scripts/latex-to-mdx/input/handles.tex deleted file mode 100644 index 47b267c18598edaa9c272e08fa5dba7b3df72138..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/handles.tex +++ /dev/null @@ -1,52 +0,0 @@ -\definecolor{hf1}{HTML}{FFD220} -\definecolor{hf2}{HTML}{FF8360} -\definecolor{hf3}{HTML}{2D728F} -\definecolor{hf4}{HTML}{B5DFCA} -\definecolor{hf5}{HTML}{BABFD1} - -\newcommand{\highlight}[1]{\textcolor{hf2}{#1}} - -\newcommand{\lerobot}{\texttt{lerobot}} -\newcommand{\FK}{\text{FK}} -\newcommand{\targetvel}{\dot {p}^*} -\newcommand{\targetpos}{p^*} - -\newcommand{\statespace}{\mathcal S} -\newcommand{\actionspace}{\mathcal A} -\newcommand{\obsspace}{\mathcal O} -\newcommand{\dynamics}{\mathcal D} -\newcommand{\stateplusone}{s_{t+1}} -\newcommand{\state}{s_t} -\newcommand{\action}{a_t} -\newcommand{\transition}{(\state, \action, \stateplusone)} -\newcommand{\sars}{(\state, \action, r_t, \stateplusone)} -\newcommand{\transitiongiven}{(\stateplusone \vert \state, \action)} -\newcommand{\transitionprob}{\mathbb P \transitiongiven} -\newcommand{\trajectory}{(s_0, a_0, r_0, s_1, a_1, r_1, \dots, s_{T-1}, a_{T-1}, 
r_{T-1}, s_T)} -\newcommand{\Jpi}{J (\pi_\theta) } -\newcommand{\qfunction}{\(Q\)-function} -\newcommand{\qopt}{\( Q^* \)} - -\newcommand{\supp}[1]{\text{supp}({#1})} -\newcommand{\DKL}{\text{D}_{\text{KL}}} - -\newcommand{\actionchunk}{\mathbf{A}} -\newcommand{\actionexpert}{\mathbf{v}_\theta} -\newcommand{\pizero}{\( \pi_0 \)} - -% TL;DR boxes at the beginning of each chapter -\newtcolorbox{callout}[2][]{ - enhanced, breakable, - colback=hfbackground, opacityback=0.85, - colframe=ai2accent, boxrule=0.6pt, arc=2mm, - left=8pt, right=8pt, top=8pt, bottom=8pt, - before skip=10pt, after skip=10pt, - fonttitle=\sffamily\bfseries, - title={\sffamily\bfseries #2}, - #1 -} - -% Convenience environment for TL;DR -\newenvironment{tldr}[1][TL;DR]{\begin{callout}{#1}}{\end{callout}} - -\newcommand{\lerobotdataset}{\texttt{LeRobotDataset}} \ No newline at end of file diff --git a/app/scripts/latex-to-mdx/input/hfstyle/defns.tex b/app/scripts/latex-to-mdx/input/hfstyle/defns.tex deleted file mode 100644 index 747abce77e91abdb7683af5b9e9974aef3a1462a..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/hfstyle/defns.tex +++ /dev/null @@ -1,502 +0,0 @@ -% -% A useful set of commands -\usepackage{mathtools} -\usepackage{dsfont} -\usepackage[dvipsnames]{xcolor} -\usepackage[colorinlistoftodos]{todonotes} -\usepackage{booktabs} -\usepackage{xfrac} -\usepackage{bbm} - -\usepackage{algpseudocode} -\usepackage{algorithm} -\usepackage{algorithmicx} - -\usepackage[most]{tcolorbox} -\usepackage{xparse} -\usepackage{lipsum} -\usepackage{changepage} -\usepackage{enumitem} - -\newcommand{\qedwhite}{\hfill \ensuremath{\Box}} -%http://www-db.stanford.edu/~manku/latex.html -%The itemize environment can be replaced by: -\newcommand{\squishlist}{ - \begin{list}{$\bullet$} - { \setlength{\itemsep}{0pt} \setlength{\parsep}{3pt} - \setlength{\topsep}{3pt} \setlength{\partopsep}{0pt} - \setlength{\leftmargin}{1.5em} \setlength{\labelwidth}{1em} - \setlength{\labelsep}{0.5em} } } - -\newcommand{\squishlisttwo}{ - \begin{list}{$\bullet$} - { \setlength{\itemsep}{0pt} \setlength{\parsep}{0pt} - \setlength{\topsep}{0pt} \setlength{\partopsep}{0pt} - \setlength{\leftmargin}{2em} \setlength{\labelwidth}{1.5em} - \setlength{\labelsep}{0.5em} } } - -\newcommand{\squishend}{ - \end{list} } - -%Example usage: \squishlist %% \begin{itemize} -%\item First item -%\item Second item -%\squishend %% \end{itemize} - -\newcommand{\denselist}{\itemsep 0pt\topsep-6pt\partopsep-6pt} - - -\newtheorem{thm}{Theorem}[section] -\newtheorem{cor}{Corollary}[section] -\newtheorem{defn}{Definition}[section] -\newenvironment{mythm}{{\bf Theorem}}{} - - - -\newcommand{\tm}{\tilde{m}} -\newcommand{\tv}{\tilde{v}} - -% For bold vector symbols -\newcommand{\myvec}[1]{\boldsymbol{#1}} -\newcommand{\myvecsym}[1]{\boldsymbol{#1}} -\newcommand{\ind}[1]{\mathbb{I}(#1)} - -\newcommand{\vzero}{\myvecsym{0}} -\newcommand{\vone}{\myvecsym{1}} -\newcommand{\valpha}{\myvecsym{\alpha}} -\newcommand{\vbeta}{\myvecsym{\beta}} -\newcommand{\vchi}{\myvecsym{\chi}} -\newcommand{\vdelta}{\myvecsym{\delta}} -\newcommand{\vDelta}{\myvecsym{\Delta}} -\newcommand{\vepsilon}{\myvecsym{\epsilon}} -\newcommand{\vell}{\myvecsym{\ell}} -\newcommand{\veta}{\myvecsym{\eta}} -\newcommand{\vgamma}{\myvecsym{\gamma}} -\newcommand{\vGamma}{\myvecsym{\Gamma}} -\newcommand{\vmu}{\myvecsym{\mu}} -\newcommand{\vnu}{\myvecsym{\nu}} -\newcommand{\vkappa}{\myvecsym{\kappa}} -\newcommand{\vlambda}{\myvecsym{\lambda}} -\newcommand{\vLambda}{\myvecsym{\Lambda}} 
-\newcommand{\vLambdaBar}{\overline{\vLambda}} -\newcommand{\vomega}{\myvecsym{\omega}} -\newcommand{\vOmega}{\myvecsym{\Omega}} -\newcommand{\vphi}{\myvecsym{\phi}} -\newcommand{\vPhi}{\myvecsym{\Phi}} -\newcommand{\vpi}{\myvecsym{\pi}} -\newcommand{\vpsi}{\myvecsym{\psi}} -\newcommand{\vPsi}{\myvecsym{\Psi}} -\newcommand{\vtheta}{\myvecsym{\theta}} -\newcommand{\vTheta}{\myvecsym{\Theta}} -\newcommand{\vsigma}{\myvecsym{\sigma}} -\newcommand{\vSigma}{\myvecsym{\Sigma}} -\newcommand{\vtau}{\myvecsym{\tau}} -\newcommand{\vupsilon}{\myvecsym{\upsilon}} -\newcommand{\vxi}{\myvecsym{\xi}} - -\newcommand{\vxn}{\vx^{(n)}} - -\newcommand{\vmuY}{\vb} -\newcommand{\vmuMu}{\vmu_{x}} -\newcommand{\vmuMuGivenY}{\vmu_{x|y}} -\newcommand{\vSigmaMu}{\vSigma_{x}} -\newcommand{\vSigmaMuInv}{\vSigma_{x}^{-1}} -\newcommand{\vSigmaMuGivenY}{\vSigma_{x|y}} -\newcommand{\vSigmaMuGivenYinv}{\vSigma_{x|y}^{-1}} -\newcommand{\vSigmaY}{\vSigma_{y}} -\newcommand{\vSigmaYinv}{\vSigma_{y}^{-1}} - -%\newcommand{\vmuY}{\vmu_{y}} -%\newcommand{\vmuMu}{\vmu_{\mu}} -%\newcommand{\vmuMuGivenY}{\vmu_{\mu|y}} -%\newcommand{\vSigmaMu}{\vSigma_{\mu}} -%\newcommand{\vSigmaMuInv}{\vSigma_{\mu}^{-1}} -%\newcommand{\vSigmaMuGivenY}{\vSigma_{\mu|y}} -%\newcommand{\vSigmaMuGivenYinv}{\vSigma_{\mu|y}^{-1}} -%\newcommand{\vSigmaY}{\vSigma_{y}} -%\newcommand{\vSigmaYinv}{\vSigma_{y}^{-1}} - -\newcommand{\muY}{\mu_{y}} -\newcommand{\muMu}{\mu_{\mu}} -\newcommand{\muMuGivenY}{\mu_{\mu|y}} -\newcommand{\SigmaMu}{\Sigma_{\mu}} -\newcommand{\SigmaMuInv}{\Sigma_{\mu}^{-1}} -\newcommand{\SigmaMuGivenY}{\Sigma_{\mu|y}} -\newcommand{\SigmaMuGivenYinv}{\Sigma_{\mu|y}^{-1}} -\newcommand{\SigmaY}{\Sigma_{y}} -\newcommand{\SigmaYinv}{\Sigma_{y}^{-1}} - -\newcommand{\hatf}{\hat{f}} -\newcommand{\haty}{\hat{y}} -\newcommand{\const}{\mbox{const}} -\newcommand{\sigmoid}{\mbox{sigm}} - -\newcommand{\one}{(1)} -\newcommand{\two}{(2)} - -\newcommand{\va}{\myvec{a}} -\newcommand{\vb}{\myvec{b}} -\newcommand{\vc}{\myvec{c}} -\newcommand{\vd}{\myvec{d}} -\newcommand{\ve}{\myvec{e}} -\newcommand{\vf}{\myvec{f}} -\newcommand{\vg}{\myvec{g}} -\newcommand{\vh}{\myvec{h}} -\newcommand{\vj}{\myvec{j}} -\newcommand{\vk}{\myvec{k}} -\newcommand{\vl}{\myvec{l}} -\newcommand{\vm}{\myvec{m}} -\newcommand{\vn}{\myvec{n}} -\newcommand{\vo}{\myvec{o}} -\newcommand{\vp}{\myvec{p}} -\newcommand{\vq}{\myvec{q}} -\newcommand{\vr}{\myvec{r}} -\newcommand{\vs}{\myvec{s}} -\newcommand{\vt}{\myvec{t}} -\newcommand{\vu}{\myvec{u}} -\newcommand{\vv}{\myvec{v}} -\newcommand{\vw}{\myvec{w}} -\newcommand{\vws}{\vw_s} -\newcommand{\vwh}{\hat{\vw}} -\newcommand{\vx}{\myvec{x}} -%\newcommand{\vx}{\myvec{x}} -\newcommand{\vxt}{\myvec{\tilde{x}}} -\newcommand{\vy}{\myvec{y}} -\newcommand{\vyt}{\myvec{\tilde{y}}} -\newcommand{\vz}{\myvec{z}} - -\newcommand{\vA}{\myvec{A}} -\newcommand{\vB}{\myvec{B}} -\newcommand{\vC}{\myvec{C}} -\newcommand{\vD}{\myvec{D}} -\newcommand{\vE}{\myvec{E}} -\newcommand{\vF}{\myvec{F}} -\newcommand{\vG}{\myvec{G}} -\newcommand{\vH}{\myvec{H}} -\newcommand{\vI}{\myvec{I}} -\newcommand{\vJ}{\myvec{J}} -\newcommand{\vK}{\myvec{K}} -\newcommand{\vL}{\myvec{L}} -\newcommand{\vM}{\myvec{M}} -\newcommand{\vN}{\myvec{N}} -\newcommand{\vO}{\myvec{O}} -\newcommand{\vP}{\myvec{P}} -\newcommand{\vQ}{\myvec{Q}} -\newcommand{\vR}{\myvec{R}} -\newcommand{\vS}{\myvec{S}} -\newcommand{\vT}{\myvec{T}} -\newcommand{\vU}{\myvec{U}} -\newcommand{\vV}{\myvec{V}} -\newcommand{\vW}{\myvec{W}} -\newcommand{\vX}{\myvec{X}} -%\newcommand{\vXs}{\vX_{\vs}} -\newcommand{\vXs}{\vX_{s}} 
-\newcommand{\vXt}{\myvec{\tilde{X}}} -\newcommand{\vY}{\myvec{Y}} -\newcommand{\vZ}{\myvec{Z}} - - -\newcommand{\vxtest}{\myvec{x}_*} -\newcommand{\vytest}{\myvec{y}_*} - - -\newcommand{\ftrue}{f_{true}} - -\newcommand{\myprec}{\mbox{prec}} -\newcommand{\precw}{\lambda_{w}} % precision of weights (alpha) -\newcommand{\precy}{\lambda_{y}} % precision of y (beta) -\newcommand{\fbar}{\overline{f}} -\newcommand{\xmybar}{\overline{x}} -\newcommand{\ybar}{\overline{y}} -\newcommand{\zbar}{\overline{z}} -\newcommand{\vxbar}{\overline{\vx}} -\newcommand{\vXbar}{\overline{\vX}} -\newcommand{\vybar}{\overline{\vy}} -\newcommand{\vYbar}{\overline{\vY}} -\newcommand{\vzbar}{\overline{\vz}} -\newcommand{\vZbar}{\overline{\vZ}} -\newcommand{\xbar}{\overline{x}} -\newcommand{\Xbar}{\overline{X}} -\newcommand{\Ybar}{\overline{Y}} -\newcommand{\Gbar}{\overline{G}} -\newcommand{\Jbar}{\overline{J}} -\newcommand{\Lbar}{\overline{L}} -\newcommand{\Nbar}{\overline{N}} -%\newcommand{\Qbar}{\overline{Q}} -\newcommand{\Qbar}{\overline{Q}} -\newcommand{\Tbar}{\overline{T}} -\newcommand{\Sbar}{\overline{S}} -\newcommand{\vSbar}{\overline{\vS}} -\newcommand{\Rbar}{\overline{R}} - -\newcommand{\vtaubar}{\overline{\vtau}} -\newcommand{\vtbar}{\overline{\vt}} -\newcommand{\vsbar}{\overline{\vs}} - -\newcommand{\htilde}{\tilde{h}} -\newcommand{\vhtilde}{\tilde{\vh}} -\newcommand{\Dtilde}{\tilde{D}} -\newcommand{\Ftilde}{\tilde{F}} -\newcommand{\wtilde}{\tilde{w}} -\newcommand{\ptilde}{\tilde{p}} -\newcommand{\pemp}{p_{emp}} -\newcommand{\pstar}{p^*} -\newcommand{\xtilde}{\tilde{x}} -\newcommand{\Xtilde}{\tilde{X}} -\newcommand{\ytilde}{\tilde{y}} -\newcommand{\Ytilde}{\tilde{Y}} -\newcommand{\vxtilde}{\tilde{\vx}} -\newcommand{\vytilde}{\tilde{\vy}} -\newcommand{\ztilde}{\tilde{\z}} -\newcommand{\vztilde}{\tilde{\vz}} -\newcommand{\vthetaMAP}{\hat{\vtheta}_{MAP}} -\newcommand{\vthetaS}{\vtheta^{(s)}} -\newcommand{\vthetahat}{\hat{\vtheta}} -\newcommand{\thetahat}{\hat{\theta}} -\newcommand{\thetabar}{\overline{\theta}} -\newcommand{\vthetabar}{\overline{\vtheta}} -\newcommand{\pibar}{\overline{\pi}} -\newcommand{\vpibar}{\overline{\vpi}} - - -%\newcommand{\subsubsubsection}[1]{\paragraph{#1}} -\newcommand{\choice}[2]{\left(\!\!\! \begin{array}{c} #1 \\ #2\end{array} \!\!\!\right)} -\newcommand{\half}{\frac{1}{2}} -\newcommand{\defeq}{\stackrel{\rm def}{=}} -\newcommand{\real}{\mathbb{R}} - -\newcommand{\given}{\|} -\newcommand{\indep}[2]{{#1} \perp {#2}} -\newcommand{\condindep}[3]{{#1} \perp {#2} | {#3}} -\newcommand{\condindepG}[3]{{#1} \perp_G {#2} | {#3}} -\newcommand{\condindepP}[3]{{#1} \perp_p {#2} | {#3}} -\newcommand{\depend}[2]{{#1} \not \perp {#2}} -\newcommand{\conddepend}[3]{{#1} \not \perp {#2} | {#3}} - -\newcommand{\trans}[1]{{#1}^{\mathtt{T}}} -\newcommand{\inv}[1]{{#1}^{-1}} - -\newcommand{\ra}{\rightarrow} -\newcommand{\lra}{\leftrightarrow} -\newcommand{\Ra}{\Rightarrow} -%\newcommand{\rv}{r.v.} -\newcommand{\la}{\leftarrow} -\newcommand{\tr}{\mbox{tr}} -\newcommand{\st}{\mbox{ s.t. 
}} - -\newcommand{\dom}{\mbox{dom}} -\newcommand{\bel}{\mbox{bel}} -\newcommand{\dsep}{\mbox{dsep}} -\newcommand{\sep}{\mbox{sep}} -\newcommand{\entails}{\models} -\newcommand{\range}{\mbox{range}} -\newcommand{\myspan}{\mbox{span}} -\newcommand{\nullspace}{\mbox{nullspace}} -\newcommand{\adj}{\mbox{adj}} - -\newcommand{\nbd}{\mbox{nbd}} -\newcommand{\nbr}{\mbox{nbr}} -\newcommand{\anc}{\mbox{anc}} -\newcommand{\desc}{\mbox{desc}} -\newcommand{\pred}{\mbox{pred}} -\newcommand{\nondesc}{\mbox{nondesc}} -\newcommand{\pa}{\pi} -\newcommand{\ch}{\mbox{ch}} -\newcommand{\mb}{\mbox{mb}} -\newcommand{\connects}{\sim} - - -\newcommand{\betadist}{\mbox{Beta}} -\newcommand{\Betadist}{\mbox{Beta}} -\newcommand{\bernoulli}{\mbox{Ber}} -\newcommand{\Ber}{\mbox{Ber}} -\newcommand{\Binom}{\mbox{Bin}} -\newcommand{\NegBinom}{\mbox{NegBinom}} -\newcommand{\binomdist}{\mbox{Bin}} -\newcommand{\cauchy}{\mbox{Cauchy}} -\newcommand{\DE}{\mbox{DE}} -\newcommand{\Dir}{\mbox{Dir}} -\newcommand{\discrete}{\calM} -\newcommand{\Discrete}{\calM} -\newcommand{\expdist}{\mbox{Exp}} -\newcommand{\expon}{\mbox{Expon}} -\newcommand{\gammadist}{\mbox{Ga}} -\newcommand{\Ga}{\mbox{Ga}} -\newcommand{\gauss}{{\cal N}} -\newcommand{\IG}{\mbox{IG}} -\newcommand{\IGauss}{\mbox{InvGauss}} -\newcommand{\IW}{\mbox{IW}} -\newcommand{\Laplace}{\mbox{Lap}} -\newcommand{\Mu}{\mbox{Mu}} -\newcommand{\Multi}{\mbox{Mu}} -\newcommand{\NIX}{NI\chi^2} -\newcommand{\GIX}{NI\chi^2} -\newcommand{\NIG}{\mbox{NIG}} -\newcommand{\GIG}{\mbox{NIG}} -\newcommand{\NIW}{\mbox{NIW}} -\newcommand{\GIW}{\mbox{NIW}} -\newcommand{\MVNIW}{\mbox{NIW}} -\newcommand{\NW}{\mbox{NWi}} -\newcommand{\MVNIG}{\mbox{NIG}} -\newcommand{\NGdist}{\mbox{NG}} -\newcommand{\prob}{p} -\newcommand{\Poi}{\mbox{Poi}} -\newcommand{\Student}{{\cal T}} -\newcommand{\student}{{\cal T}} -\newcommand{\Wishart}{\mbox{Wi}} -\newcommand{\Wi}{\mbox{Wi}} -\newcommand{\unif}{\mbox{U}} -\newcommand{\etr}{\mbox{etr}} - -\newcommand{\softmax}{\calS} -\newcommand{\soft}{\mbox{soft}} -\newcommand{\cond}{\mbox{cond}} -\newcommand{\sign}{\mbox{sign}} -\newcommand{\sgn}{\mbox{sgn}} -\newcommand{\iid}{\mbox{iid}} -\newcommand{\mle}{\mbox{mle}} -\newcommand{\myiff}{\mbox{iff}} -\newcommand{\pd}{\mbox{pd}} -\newcommand{\pdf}{\mbox{pdf }} -\newcommand{\cdf}{\mbox{cdf}} -\newcommand{\pmf}{\mbox{pmf}} -\newcommand{\wrt}{\mbox{wrt}} -\newcommand{\matlab}{{\sc MATLAB}} -\newcommand{\NETLAB}{{\sc NETLAB}} -\newcommand{\MLABA}{\mbox{PMTK}} -\newcommand{\BLT}{\mbox{PMTK}} -\newcommand{\PMTK}{\mbox{PMTK}} -\newcommand{\mywp}{\mbox{wp}} - -\newcommand{\KLpq}[2]{\mathrm{KL}\left[{#1}\|{#2}\right]} -\newcommand{\KL}{\mbox{KL}} -\newcommand{\MI}{\mathbb{I}} -\newcommand{\MIxy}[2]{\mathbb{I}\left({#1};{#2}\right)} -\newcommand{\MIxyz}[3]{\mathbb{I}\left({#1};{#2}|{#3}\right)} -\newcommand{\entropy}[1]{\mathbb{H}\left({#1}\right)} -\newcommand{\entropypq}[2]{\mathbb{H}\left({#1}, {#2}\right)} - - -\newcommand{\vvec}{\mbox{vec}} -\newcommand{\kron}{\otimes} -\newcommand{\dof}{\mbox{dof}} -%\newcommand{\E}{E} -\newcommand{\E}{\mathbb{E}} -\newcommand{\energy}{E} -\newcommand{\expectAngle}[1]{\langle #1 \rangle} -%\newcommand{\expect}[1]{\mathbb{E}\left[ {#1} \right]} -\newcommand{\expect}[2]{\mathds{E}_{{#1}} \left[ {#2} \right]} -\newcommand{\expectGiven}[3]{\mathds{E}_{{#1}} \left[ {#2} \mid {#3} \right]} -\newcommand{\Var}{\mbox{Var}} -\newcommand{\VarGiven}[3]{\mbox{Var}_{{#1}}\left[ {#2} \mid {#3}\right]} -%\newcommand{\Var}{\mathbb{V}} -\newcommand{\var}[1]{\mbox{var}\left[{#1}\right]} 
-\newcommand{\std}[1]{\mbox{std}\left[{#1}\right]} -\newcommand{\varQ}[2]{\mbox{var}_{{#2}}\left[{#1}\right]} -\newcommand{\cov}[1]{\mbox{cov}\left[{#1}\right]} -%\newcommand{\mode}[1]{\mbox{mode}\left[{#1}\right]} -\newcommand{\median}[1]{\mbox{median}\left[{#1}\right]} - - - -\newcommand{\diag}{\mbox{diag}} -\newcommand{\blkdiag}{\mbox{blkdiag}} -\newcommand{\bias}{\mbox{bias}} -\newcommand{\union}{\cup} -\newcommand{\intersect}{\cap} - -\newcommand{\size}{\mbox{size}} -\newcommand{\trace}{\mbox{trace}} - - -\newcommand{\myc}{c} -\newcommand{\myi}{i} -\newcommand{\myj}{j} -\newcommand{\myk}{k} -\newcommand{\myn}{n} -\newcommand{\myq}{q} -\newcommand{\mys}{s} -\newcommand{\myt}{t} - - -\newcommand{\supp}{\mbox{supp}} - - -\newcommand{\calA}{{\cal A}} -\newcommand{\calB}{{\cal B}} -\newcommand{\calC}{{\cal C}} -\newcommand{\calD}{{\cal D}} -\newcommand{\calDx}{{\cal D}_x} -\newcommand{\calE}{{\cal E}} -\newcommand{\cale}{{\cal e}} -\newcommand{\calF}{{\cal F}} -\newcommand{\calG}{{\cal G}} -\newcommand{\calH}{{\cal H}} -\newcommand{\calHX}{{\cal H}_X} -\newcommand{\calHy}{{\cal H}_y} -\newcommand{\calI}{{\cal I}} -\newcommand{\calK}{{\cal K}} -\newcommand{\calM}{{\cal M}} -\newcommand{\calN}{{\cal N}} -\newcommand{\caln}{{\cal n}} -\newcommand{\calNP}{{\cal NP}} -\newcommand{\calMp}{\calM^+} -\newcommand{\calMm}{\calM^-} -\newcommand{\calMo}{\calM^o} -\newcommand{\Ctest}{C_*} -\newcommand{\calL}{{\cal L}} -\newcommand{\calP}{{\cal P}} -\newcommand{\calq}{{\cal q}} -\newcommand{\calQ}{{\cal Q}} -\newcommand{\calR}{{\cal R}} -\newcommand{\calS}{{\cal S}} -\newcommand{\calSstar}{\calS_*} -\newcommand{\calT}{{\cal T}} -\newcommand{\calV}{{\cal V}} -\newcommand{\calv}{{\cal v}} -\newcommand{\calX}{{\cal X}} -\newcommand{\calY}{{\cal Y}} - -\newcommand{\Lone}{$\ell_1$} -\newcommand{\Ltwo}{$\ell_2$} - -\newcommand{\mya}{\mbox{a}} -\newcommand{\myat}{\alpha_{t|t-1}} -\newcommand{\score}{\mbox{score}} -\newcommand{\AIC}{\mbox{AIC}} -\newcommand{\BIC}{\mbox{BIC}} -\newcommand{\BICcost}{\mbox{BIC-cost}} -\newcommand{\scoreBIC}{\mbox{score-BIC}} -\newcommand{\scoreBICL}{\mbox{score-BIC-L1}} -\newcommand{\scoreL}{\mbox{score-L1}} - -\newcommand{\ecoli}{\mbox{{\it E. 
coli}}} -\newcommand{\doPearl}{\mbox{do}} -\newcommand{\data}{\calD} -\newcommand{\model}{\calM} -\newcommand{\dataTrain}{\calD_{\mbox{train}}} -\newcommand{\dataTest}{\calD_{\mbox{test}}} -\newcommand{\dataValid}{\calD_{\mbox{valid}}} -\newcommand{\futuredata}{\tilde{\calD}} -\newcommand{\algo}{\calA} -\newcommand{\fitAlgo}{\calF} -\newcommand{\predictAlgo}{\calP} -\newcommand{\err}{\mbox{err}} -\newcommand{\logit}{\mbox{logit}} -\newcommand{\parent}{\mbox{pa}} - - -%%%%%%%%%%% Hoyt - -\newcommand{\conv}[1]{\,\,\,\displaystyle{\operatorname*{\longrightarrow}^{\,_{#1}\,}}\,\,\,} -\newcommand{\dconv}{\conv{D}} -\newcommand{\pconv}{\conv{P}} -\newcommand{\asconv}{\conv{AS}} -\newcommand{\lpconv}[1]{\conv{L^{#1}}} - -\DeclareMathAlphabet{\mathpzc}{OT1}{pzc}{m}{n} - - -\newcommand{\condSet}{\mathcal{S}} -\newcommand{\condSetC}{\mathcal{\lnot S}} - diff --git a/app/scripts/latex-to-mdx/input/hfstyle/hf.cls b/app/scripts/latex-to-mdx/input/hfstyle/hf.cls deleted file mode 100644 index 4c654160d8711db05b1010300542e28a6396008e..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/hfstyle/hf.cls +++ /dev/null @@ -1,360 +0,0 @@ -% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -% A style for AI2 pre-prints -% Author: jacobm@allenai.org -% Version: 1.1 -% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% - -% Class declarations %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% - -\NeedsTeXFormat{LaTeX2e} -\ProvidesClass{hfstyle/hf} -\LoadClassWithOptions{article} - -% Layout %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% - -\RequirePackage[top=2.25cm, bottom=2.5cm, left=2.5cm, right=2.5cm, columnsep=0.65cm, margin=1.9cm]{geometry} -\RequirePackage{microtype} -\RequirePackage{placeins} -\RequirePackage{hyphenat} -\RequirePackage{setspace} -\RequirePackage{parskip} -\RequirePackage[latin, english]{babel} -\RequirePackage{lipsum} -\RequirePackage{etoolbox} -\RequirePackage{fancyhdr} % custom headers/footers - -% \DisableLigatures[f]{family=sf*} - -% Graphics %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% - -\RequirePackage{graphicx} -\RequirePackage{subcaption} - -% Tables %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% - -\RequirePackage{booktabs} -\RequirePackage{nicematrix} -\RequirePackage{multirow} -\RequirePackage{bm} -\newcommand{\nm}[1]{#1} - -% Colorful stuff %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% - -\RequirePackageWithOptions{xcolor} -\RequirePackage[most]{tcolorbox} -\definecolor{ai2accent}{HTML}{407579} -% \definecolor{ai2accent}{HTML}{ff0000} -\definecolor{hfforeground}{HTML}{1C2B33} -\definecolor{hfbackground}{HTML}{ffffb7} -\definecolor{hfforegroundDark}{HTML}{0A2B35} -\definecolor{ai2pink}{HTML}{F0529C} -\definecolor{hfyellow}{HTML}{000000} - - -% References %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% - -\RequirePackage{hyperref} -\hypersetup{ - colorlinks=true, - linkcolor=ai2accent, - citecolor=ai2accent, - urlcolor=ai2accent, - anchorcolor=ai2accent, - menucolor=ai2accent, - filecolor=ai2accent, - % linktocpage=true, - allcolors=ai2accent -} - - -\RequirePackage[noabbrev,nameinlink]{cleveref} -% Reapply hyperref settings after cleveref to ensure they stick -\AtBeginDocument{ - \hypersetup{ - allcolors=ai2accent, - linkcolor=ai2accent, - citecolor=ai2accent, - urlcolor=ai2accent - } -} - - -% change base color of text -\AtBeginDocument{ - \color{hfforegroundDark} - 
\pagecolor{white} -} - - -\RequirePackage[round,authoryear]{natbib} -\def\bibfont{\small} - - -% Create a custom size that's exactly 1pt larger than normalsize -\newcommand{\slightlylarger}{% - \fontsize{\dimexpr\f@size pt+1}{\dimexpr\f@size pt-0.2\baselineskip}\selectfont% -} - -% Create a custom size that's exactly 1pt smaller than normalsize -\newcommand{\slightlysmaller}{% - \fontsize{\dimexpr\f@size pt-1}{\dimexpr\f@size pt+0.2\baselineskip}\selectfont% -} - -% Section and caption format %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% - -\RequirePackage{titlesec} -% \titleformat*{\paragraph}{\bfseries} -\titleformat*{\section}{\Large\sffamily\bfseries} -\titleformat*{\subsection}{\large\sffamily\bfseries} -\titleformat*{\subsubsection}{\slightlylarger\sffamily\bfseries} -\titleformat*{\paragraph}{\slightlysmaller\sffamily\bfseries} - -% make bolded text smaller to match with serif. -% \DeclareTextFontCommand{\textbf}{\bfseries\sffamily} -\DeclareTextFontCommand{\textbf}{\fontsize{9}{11}\selectfont\bfseries\sffamily} - - -\RequirePackage{caption} -\DeclareCaptionLabelSeparator{custom}{} -\DeclareCaptionFormat{custom}{{\sffamily\textbf{#1 #2}} #3} -\DeclareCaptionLabelSeparator{pipe}{ $\vert$ }% or $\vert$ -\captionsetup{singlelinecheck=false,format=custom,labelsep=pipe,font=small} -\captionsetup[sub]{singlelinecheck=true,format=custom,labelsep=pipe,font=small} - -% %%======== Header and Footer Content ======== - - - -% % HF custom fonts %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% - -% % http://c.caignaert.free.fr/Install-ttf-Font.pdf - -% Set the main font to Times New Roman -% \setmainfont{Manrope} -% Set sans-serif font to Manrope -\RequirePackage{ifxetex} -\usepackage{ifxetex} -\ifxetex - \usepackage{fontspec} - \setsansfont{Manrope} -\else - \RequirePackage[T1]{fontenc} - \usepackage[T1]{fontenc} - \usepackage{hfstyle/manrope} - \renewcommand{\sfdefault}{manrope} -\fi - -% % \pdfmapline{+optimistic < assets/Optimistic.ttf s * [0.88] ai2style/fonts/Manrope}{} -% \pdfmapline{+manrope < ai2style/fonts/Manrope.ttf manroperegular } {} -\DeclareFontShape{T1}{manrope}{b} {n} {<-> manropebold } {} - -\DeclareFontShape{T1}{manrope}{m} {it}{<-> ssub * manrope/m/n} {} -\DeclareFontShape{T1}{manrope}{b} {it}{<-> ssub * manrope/b/n} {} - -\DeclareFontShape{T1}{manrope}{m} {sc}{<-> ssub * manrope/m/n} {} -\DeclareFontShape{T1}{manrope}{b} {sc}{<-> ssub * manrope/b/n} {} - -\DeclareFontShape{T1}{manrope}{m} {sl}{<-> ssub * manrope/m/n} {} -\DeclareFontShape{T1}{manrope}{b} {sl}{<-> ssub * manrope/b/n} {} -\end{filecontents} - -% Write the map lines -\pdfmapline{+manroperegular < hfstyle/manrope/Manrope-Regular.ttf } - { s nameptr "{ff~}{vv~}{ll}{, jj}" format.name$ 't := - nameptr #1 > - { namesleft #1 > - { ", " * t * } - { numnames #2 > - { "," * } - 'skip$ - if$ - t "others" = - { " et~al." 
* } - { " and " * t * } - if$ - } - if$ - } - 't - if$ - nameptr #1 + 'nameptr := - namesleft #1 - 'namesleft := - } - while$ -} - -FUNCTION {format.key} -{ empty$ - { key field.or.null } - { "" } - if$ -} - -FUNCTION {format.authors} -{ author empty$ - { "" } - { author format.names } - if$ -} - -FUNCTION {format.editors} -{ editor empty$ - { "" } - { editor format.names - editor num.names$ #1 > - { ", editors" * } - { ", editor" * } - if$ - } - if$ -} - -FUNCTION {format.isbn} -{ isbn empty$ - { "" } - { new.block "ISBN " isbn * } - if$ -} - -FUNCTION {format.issn} -{ issn empty$ - { "" } - { new.block "ISSN " issn * } - if$ -} - -FUNCTION {format.url} -{ url empty$ - { "" } - { new.block "\url{" url * "}" * } - if$ -} - -FUNCTION {format.doi} -{ doi empty$ - { "" } - { new.block "\doi{" doi * "}" * } - if$ -} - -FUNCTION {format.title} -{ title empty$ - { "" } - { title "t" change.case$ } - if$ -} - -FUNCTION {format.full.names} -{'s := - #1 'nameptr := - s num.names$ 'numnames := - numnames 'namesleft := - { namesleft #0 > } - { s nameptr - "{vv~}{ll}" format.name$ 't := - nameptr #1 > - { - namesleft #1 > - { ", " * t * } - { - numnames #2 > - { "," * } - 'skip$ - if$ - t "others" = - { " et~al." * } - { " and " * t * } - if$ - } - if$ - } - 't - if$ - nameptr #1 + 'nameptr := - namesleft #1 - 'namesleft := - } - while$ -} - -FUNCTION {author.editor.full} -{ author empty$ - { editor empty$ - { "" } - { editor format.full.names } - if$ - } - { author format.full.names } - if$ -} - -FUNCTION {author.full} -{ author empty$ - { "" } - { author format.full.names } - if$ -} - -FUNCTION {editor.full} -{ editor empty$ - { "" } - { editor format.full.names } - if$ -} - -FUNCTION {make.full.names} -{ type$ "book" = - type$ "inbook" = - or - 'author.editor.full - { type$ "proceedings" = - 'editor.full - 'author.full - if$ - } - if$ -} - -FUNCTION {output.bibitem} -{ newline$ - "\bibitem[" write$ - label write$ - ")" make.full.names duplicate$ short.list = - { pop$ } - { * } - if$ - "]{" * write$ - cite$ write$ - "}" write$ - newline$ - "" - before.all 'output.state := -} - -FUNCTION {n.dashify} -{ 't := - "" - { t empty$ not } - { t #1 #1 substring$ "-" = - { t #1 #2 substring$ "--" = not - { "--" * - t #2 global.max$ substring$ 't := - } - { { t #1 #1 substring$ "-" = } - { "-" * - t #2 global.max$ substring$ 't := - } - while$ - } - if$ - } - { t #1 #1 substring$ * - t #2 global.max$ substring$ 't := - } - if$ - } - while$ -} - -FUNCTION {format.date} -{ year duplicate$ empty$ - { "empty year in " cite$ * warning$ - pop$ "" } - 'skip$ - if$ - month empty$ - 'skip$ - { month - " " * swap$ * - } - if$ - extra.label * -} - -FUNCTION {format.btitle} -{ title emphasize -} - -FUNCTION {tie.or.space.connect} -{ duplicate$ text.length$ #3 < - { "~" } - { " " } - if$ - swap$ * * -} - -FUNCTION {either.or.check} -{ empty$ - 'pop$ - { "can't use both " swap$ * " fields in " * cite$ * warning$ } - if$ -} - -FUNCTION {format.bvolume} -{ volume empty$ - { "" } - { "volume" volume tie.or.space.connect - series empty$ - 'skip$ - { " of " * series emphasize * } - if$ - "volume and number" number either.or.check - } - if$ -} - -FUNCTION {format.number.series} -{ volume empty$ - { number empty$ - { series field.or.null } - { output.state mid.sentence = - { "number" } - { "Number" } - if$ - number tie.or.space.connect - series empty$ - { "there's a number but no series in " cite$ * warning$ } - { " in " * series * } - if$ - } - if$ - } - { "" } - if$ -} - -FUNCTION {format.edition} -{ edition empty$ - { "" } - { 
output.state mid.sentence = - { edition "l" change.case$ " edition" * } - { edition "t" change.case$ " edition" * } - if$ - } - if$ -} - -INTEGERS { multiresult } - -FUNCTION {multi.page.check} -{ 't := - #0 'multiresult := - { multiresult not - t empty$ not - and - } - { t #1 #1 substring$ - duplicate$ "-" = - swap$ duplicate$ "," = - swap$ "+" = - or or - { #1 'multiresult := } - { t #2 global.max$ substring$ 't := } - if$ - } - while$ - multiresult -} - -FUNCTION {format.pages} -{ pages empty$ - { "" } - { pages multi.page.check - { "pages" pages n.dashify tie.or.space.connect } - { "page" pages tie.or.space.connect } - if$ - } - if$ -} - -FUNCTION {format.eid} -{ eid empty$ - { "" } - { "art." eid tie.or.space.connect } - if$ -} - -FUNCTION {format.vol.num.pages} -{ volume field.or.null - number empty$ - 'skip$ - { "\penalty0 (" number * ")" * * - volume empty$ - { "there's a number but no volume in " cite$ * warning$ } - 'skip$ - if$ - } - if$ - pages empty$ - 'skip$ - { duplicate$ empty$ - { pop$ format.pages } - { ":\penalty0 " * pages n.dashify * } - if$ - } - if$ -} - -FUNCTION {format.vol.num.eid} -{ volume field.or.null - number empty$ - 'skip$ - { "\penalty0 (" number * ")" * * - volume empty$ - { "there's a number but no volume in " cite$ * warning$ } - 'skip$ - if$ - } - if$ - eid empty$ - 'skip$ - { duplicate$ empty$ - { pop$ format.eid } - { ":\penalty0 " * eid * } - if$ - } - if$ -} - -FUNCTION {format.chapter.pages} -{ chapter empty$ - 'format.pages - { type empty$ - { "chapter" } - { type "l" change.case$ } - if$ - chapter tie.or.space.connect - pages empty$ - 'skip$ - { ", " * format.pages * } - if$ - } - if$ -} - -FUNCTION {format.in.ed.booktitle} -{ booktitle empty$ - { "" } - { editor empty$ - { "In " booktitle emphasize * } - { "In " format.editors * ", " * booktitle emphasize * } - if$ - } - if$ -} - -FUNCTION {empty.misc.check} -{ author empty$ title empty$ howpublished empty$ - month empty$ year empty$ note empty$ - and and and and and - key empty$ not and - { "all relevant fields are empty in " cite$ * warning$ } - 'skip$ - if$ -} - -FUNCTION {format.thesis.type} -{ type empty$ - 'skip$ - { pop$ - type "t" change.case$ - } - if$ -} - -FUNCTION {format.tr.number} -{ type empty$ - { "Technical Report" } - 'type - if$ - number empty$ - { "t" change.case$ } - { number tie.or.space.connect } - if$ -} - -FUNCTION {format.article.crossref} -{ key empty$ - { journal empty$ - { "need key or journal for " cite$ * " to crossref " * crossref * - warning$ - "" - } - { "In \emph{" journal * "}" * } - if$ - } - { "In " } - if$ - " \citet{" * crossref * "}" * -} - -FUNCTION {format.book.crossref} -{ volume empty$ - { "empty volume in " cite$ * "'s crossref of " * crossref * warning$ - "In " - } - { "Volume" volume tie.or.space.connect - " of " * - } - if$ - editor empty$ - editor field.or.null author field.or.null = - or - { key empty$ - { series empty$ - { "need editor, key, or series for " cite$ * " to crossref " * - crossref * warning$ - "" * - } - { "\emph{" * series * "}" * } - if$ - } - 'skip$ - if$ - } - 'skip$ - if$ - " \citet{" * crossref * "}" * -} - -FUNCTION {format.incoll.inproc.crossref} -{ editor empty$ - editor field.or.null author field.or.null = - or - { key empty$ - { booktitle empty$ - { "need editor, key, or booktitle for " cite$ * " to crossref " * - crossref * warning$ - "" - } - { "In \emph{" booktitle * "}" * } - if$ - } - { "In " } - if$ - } - { "In " } - if$ - " \citet{" * crossref * "}" * -} - -FUNCTION {article} -{ output.bibitem - format.authors 
"author" output.check - author format.key output - new.block - format.title "title" output.check - new.block - crossref missing$ - { journal emphasize "journal" output.check - eid empty$ - { format.vol.num.pages output } - { format.vol.num.eid output } - if$ - format.date "year" output.check - } - { format.article.crossref output.nonnull - eid empty$ - { format.pages output } - { format.eid output } - if$ - } - if$ - format.issn output - format.doi output - format.url output - new.block - note output - fin.entry -} - -FUNCTION {book} -{ output.bibitem - author empty$ - { format.editors "author and editor" output.check - editor format.key output - } - { format.authors output.nonnull - crossref missing$ - { "author and editor" editor either.or.check } - 'skip$ - if$ - } - if$ - new.block - format.btitle "title" output.check - crossref missing$ - { format.bvolume output - new.block - format.number.series output - new.sentence - publisher "publisher" output.check - address output - } - { new.block - format.book.crossref output.nonnull - } - if$ - format.edition output - format.date "year" output.check - format.isbn output - format.doi output - format.url output - new.block - note output - fin.entry -} - -FUNCTION {booklet} -{ output.bibitem - format.authors output - author format.key output - new.block - format.title "title" output.check - howpublished address new.block.checkb - howpublished output - address output - format.date output - format.isbn output - format.doi output - format.url output - new.block - note output - fin.entry -} - -FUNCTION {inbook} -{ output.bibitem - author empty$ - { format.editors "author and editor" output.check - editor format.key output - } - { format.authors output.nonnull - crossref missing$ - { "author and editor" editor either.or.check } - 'skip$ - if$ - } - if$ - new.block - format.btitle "title" output.check - crossref missing$ - { format.bvolume output - format.chapter.pages "chapter and pages" output.check - new.block - format.number.series output - new.sentence - publisher "publisher" output.check - address output - } - { format.chapter.pages "chapter and pages" output.check - new.block - format.book.crossref output.nonnull - } - if$ - format.edition output - format.date "year" output.check - format.isbn output - format.doi output - format.url output - new.block - note output - fin.entry -} - -FUNCTION {incollection} -{ output.bibitem - format.authors "author" output.check - author format.key output - new.block - format.title "title" output.check - new.block - crossref missing$ - { format.in.ed.booktitle "booktitle" output.check - format.bvolume output - format.number.series output - format.chapter.pages output - new.sentence - publisher "publisher" output.check - address output - format.edition output - format.date "year" output.check - } - { format.incoll.inproc.crossref output.nonnull - format.chapter.pages output - } - if$ - format.isbn output - format.doi output - format.url output - new.block - note output - fin.entry -} - -FUNCTION {inproceedings} -{ output.bibitem - format.authors "author" output.check - author format.key output - new.block - format.title "title" output.check - new.block - crossref missing$ - { format.in.ed.booktitle "booktitle" output.check - format.bvolume output - format.number.series output - format.pages output - address empty$ - { organization publisher new.sentence.checkb - organization output - publisher output - format.date "year" output.check - } - { address output.nonnull - format.date "year" output.check - 
new.sentence - organization output - publisher output - } - if$ - } - { format.incoll.inproc.crossref output.nonnull - format.pages output - } - if$ - format.isbn output - format.doi output - format.url output - new.block - note output - fin.entry -} - -FUNCTION {conference} { inproceedings } - -FUNCTION {manual} -{ output.bibitem - format.authors output - author format.key output - new.block - format.btitle "title" output.check - organization address new.block.checkb - organization output - address output - format.edition output - format.date output - format.url output - new.block - note output - fin.entry -} - -FUNCTION {mastersthesis} -{ output.bibitem - format.authors "author" output.check - author format.key output - new.block - format.title "title" output.check - new.block - "Master's thesis" format.thesis.type output.nonnull - school "school" output.check - address output - format.date "year" output.check - format.url output - new.block - note output - fin.entry -} - -FUNCTION {misc} -{ output.bibitem - format.authors output - author format.key output - title howpublished new.block.checkb - format.title output - howpublished new.block.checka - howpublished output - format.date output - format.issn output - format.url output - new.block - note output - fin.entry - empty.misc.check -} - -FUNCTION {phdthesis} -{ output.bibitem - format.authors "author" output.check - author format.key output - new.block - format.btitle "title" output.check - new.block - "PhD thesis" format.thesis.type output.nonnull - school "school" output.check - address output - format.date "year" output.check - format.url output - new.block - note output - fin.entry -} - -FUNCTION {proceedings} -{ output.bibitem - format.editors output - editor format.key output - new.block - format.btitle "title" output.check - format.bvolume output - format.number.series output - address output - format.date "year" output.check - new.sentence - organization output - publisher output - format.isbn output - format.doi output - format.url output - new.block - note output - fin.entry -} - -FUNCTION {techreport} -{ output.bibitem - format.authors "author" output.check - author format.key output - new.block - format.title "title" output.check - new.block - format.tr.number output.nonnull - institution "institution" output.check - address output - format.date "year" output.check - format.url output - new.block - note output - fin.entry -} - -FUNCTION {unpublished} -{ output.bibitem - format.authors "author" output.check - author format.key output - new.block - format.title "title" output.check - new.block - note "note" output.check - format.date output - format.url output - fin.entry -} - -FUNCTION {default.type} { misc } - - -MACRO {jan} {"January"} - -MACRO {feb} {"February"} - -MACRO {mar} {"March"} - -MACRO {apr} {"April"} - -MACRO {may} {"May"} - -MACRO {jun} {"June"} - -MACRO {jul} {"July"} - -MACRO {aug} {"August"} - -MACRO {sep} {"September"} - -MACRO {oct} {"October"} - -MACRO {nov} {"November"} - -MACRO {dec} {"December"} - - - -MACRO {acmcs} {"ACM Computing Surveys"} - -MACRO {acta} {"Acta Informatica"} - -MACRO {cacm} {"Communications of the ACM"} - -MACRO {ibmjrd} {"IBM Journal of Research and Development"} - -MACRO {ibmsj} {"IBM Systems Journal"} - -MACRO {ieeese} {"IEEE Transactions on Software Engineering"} - -MACRO {ieeetc} {"IEEE Transactions on Computers"} - -MACRO {ieeetcad} - {"IEEE Transactions on Computer-Aided Design of Integrated Circuits"} - -MACRO {ipl} {"Information Processing Letters"} - -MACRO {jacm} 
{"Journal of the ACM"} - -MACRO {jcss} {"Journal of Computer and System Sciences"} - -MACRO {scp} {"Science of Computer Programming"} - -MACRO {sicomp} {"SIAM Journal on Computing"} - -MACRO {tocs} {"ACM Transactions on Computer Systems"} - -MACRO {tods} {"ACM Transactions on Database Systems"} - -MACRO {tog} {"ACM Transactions on Graphics"} - -MACRO {toms} {"ACM Transactions on Mathematical Software"} - -MACRO {toois} {"ACM Transactions on Office Information Systems"} - -MACRO {toplas} {"ACM Transactions on Programming Languages and Systems"} - -MACRO {tcs} {"Theoretical Computer Science"} - - -READ - -FUNCTION {sortify} -{ purify$ - "l" change.case$ -} - -INTEGERS { len } - -FUNCTION {chop.word} -{ 's := - 'len := - s #1 len substring$ = - { s len #1 + global.max$ substring$ } - 's - if$ -} - -FUNCTION {format.lab.names} -{ 's := - s #1 "{vv~}{ll}" format.name$ - s num.names$ duplicate$ - #2 > - { pop$ " et~al." * } - { #2 < - 'skip$ - { s #2 "{ff }{vv }{ll}{ jj}" format.name$ "others" = - { " et~al." * } - { " and " * s #2 "{vv~}{ll}" format.name$ * } - if$ - } - if$ - } - if$ -} - -FUNCTION {author.key.label} -{ author empty$ - { key empty$ - { cite$ #1 #3 substring$ } - 'key - if$ - } - { author format.lab.names } - if$ -} - -FUNCTION {author.editor.key.label} -{ author empty$ - { editor empty$ - { key empty$ - { cite$ #1 #3 substring$ } - 'key - if$ - } - { editor format.lab.names } - if$ - } - { author format.lab.names } - if$ -} - -FUNCTION {author.key.organization.label} -{ author empty$ - { key empty$ - { organization empty$ - { cite$ #1 #3 substring$ } - { "The " #4 organization chop.word #3 text.prefix$ } - if$ - } - 'key - if$ - } - { author format.lab.names } - if$ -} - -FUNCTION {editor.key.organization.label} -{ editor empty$ - { key empty$ - { organization empty$ - { cite$ #1 #3 substring$ } - { "The " #4 organization chop.word #3 text.prefix$ } - if$ - } - 'key - if$ - } - { editor format.lab.names } - if$ -} - -FUNCTION {calc.short.authors} -{ type$ "book" = - type$ "inbook" = - or - 'author.editor.key.label - { type$ "proceedings" = - 'editor.key.organization.label - { type$ "manual" = - 'author.key.organization.label - 'author.key.label - if$ - } - if$ - } - if$ - 'short.list := -} - -FUNCTION {calc.label} -{ calc.short.authors - short.list - "(" - * - year duplicate$ empty$ - short.list key field.or.null = or - { pop$ "" } - 'skip$ - if$ - * - 'label := -} - -FUNCTION {sort.format.names} -{ 's := - #1 'nameptr := - "" - s num.names$ 'numnames := - numnames 'namesleft := - { namesleft #0 > } - { - s nameptr "{vv{ } }{ll{ }}{ ff{ }}{ jj{ }}" format.name$ 't := - nameptr #1 > - { - " " * - namesleft #1 = t "others" = and - { "zzzzz" * } - { numnames #2 > nameptr #2 = and - { "zz" * year field.or.null * " " * } - 'skip$ - if$ - t sortify * - } - if$ - } - { t sortify * } - if$ - nameptr #1 + 'nameptr := - namesleft #1 - 'namesleft := - } - while$ -} - -FUNCTION {sort.format.title} -{ 't := - "A " #2 - "An " #3 - "The " #4 t chop.word - chop.word - chop.word - sortify - #1 global.max$ substring$ -} - -FUNCTION {author.sort} -{ author empty$ - { key empty$ - { "to sort, need author or key in " cite$ * warning$ - "" - } - { key sortify } - if$ - } - { author sort.format.names } - if$ -} - -FUNCTION {author.editor.sort} -{ author empty$ - { editor empty$ - { key empty$ - { "to sort, need author, editor, or key in " cite$ * warning$ - "" - } - { key sortify } - if$ - } - { editor sort.format.names } - if$ - } - { author sort.format.names } - if$ -} - -FUNCTION 
{author.organization.sort} -{ author empty$ - { organization empty$ - { key empty$ - { "to sort, need author, organization, or key in " cite$ * warning$ - "" - } - { key sortify } - if$ - } - { "The " #4 organization chop.word sortify } - if$ - } - { author sort.format.names } - if$ -} - -FUNCTION {editor.organization.sort} -{ editor empty$ - { organization empty$ - { key empty$ - { "to sort, need editor, organization, or key in " cite$ * warning$ - "" - } - { key sortify } - if$ - } - { "The " #4 organization chop.word sortify } - if$ - } - { editor sort.format.names } - if$ -} - - -FUNCTION {presort} -{ calc.label - label sortify - " " - * - type$ "book" = - type$ "inbook" = - or - 'author.editor.sort - { type$ "proceedings" = - 'editor.organization.sort - { type$ "manual" = - 'author.organization.sort - 'author.sort - if$ - } - if$ - } - if$ - " " - * - year field.or.null sortify - * - " " - * - cite$ - * - #1 entry.max$ substring$ - 'sort.label := - sort.label * - #1 entry.max$ substring$ - 'sort.key$ := -} - -ITERATE {presort} - -SORT - -STRINGS { longest.label last.label next.extra } - -INTEGERS { longest.label.width last.extra.num number.label } - -FUNCTION {initialize.longest.label} -{ "" 'longest.label := - #0 int.to.chr$ 'last.label := - "" 'next.extra := - #0 'longest.label.width := - #0 'last.extra.num := - #0 'number.label := -} - -FUNCTION {forward.pass} -{ last.label label = - { last.extra.num #1 + 'last.extra.num := - last.extra.num int.to.chr$ 'extra.label := - } - { "a" chr.to.int$ 'last.extra.num := - "" 'extra.label := - label 'last.label := - } - if$ - number.label #1 + 'number.label := -} - -FUNCTION {reverse.pass} -{ next.extra "b" = - { "a" 'extra.label := } - 'skip$ - if$ - extra.label 'next.extra := - extra.label - duplicate$ empty$ - 'skip$ - { "{\natexlab{" swap$ * "}}" * } - if$ - 'extra.label := - label extra.label * 'label := -} - -EXECUTE {initialize.longest.label} - -ITERATE {forward.pass} - -REVERSE {reverse.pass} - -FUNCTION {bib.sort.order} -{ sort.label 'sort.key$ := -} - -ITERATE {bib.sort.order} - -SORT - -FUNCTION {begin.bib} -{ preamble$ empty$ - 'skip$ - { preamble$ write$ newline$ } - if$ - "\begin{thebibliography}{" number.label int.to.str$ * "}" * - write$ newline$ - "\providecommand{\natexlab}[1]{#1}" - write$ newline$ - "\providecommand{\url}[1]{\texttt{#1}}" - write$ newline$ - "\expandafter\ifx\csname urlstyle\endcsname\relax" - write$ newline$ - " \providecommand{\doi}[1]{doi: #1}\else" - write$ newline$ - " \providecommand{\doi}{doi: \begingroup \urlstyle{rm}\Url}\fi" - write$ newline$ -} - -EXECUTE {begin.bib} - -EXECUTE {init.state.consts} - -ITERATE {call.type$} - -FUNCTION {end.bib} -{ newline$ - "\end{thebibliography}" write$ newline$ -} - -EXECUTE {end.bib} \ No newline at end of file diff --git a/app/scripts/latex-to-mdx/input/hfstyle/template_content.tex b/app/scripts/latex-to-mdx/input/hfstyle/template_content.tex deleted file mode 100644 index 9a4228494cc47bb10a198678d72685aa5af98cb9..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/hfstyle/template_content.tex +++ /dev/null @@ -1,100 +0,0 @@ -% A few local macros that are used by the example content. 
-\newcommand{\expect}[2]{\mathds{E}_{{#1}} \left[ {#2} \right]} -\newcommand{\myvec}[1]{\boldsymbol{#1}} -\newcommand{\myvecsym}[1]{\boldsymbol{#1}} -\newcommand{\vx}{\myvec{x}} -\newcommand{\vy}{\myvec{y}} -\newcommand{\vz}{\myvec{z}} -\newcommand{\vtheta}{\myvecsym{\theta}} - -\section{Introduction} - -\kant[1] -\kant[2] -\kant[3] - -\section{Using Figures} -% -We can add figures in the usual way. Figure \ref{fig:image1}. -\begin{figure}[t] - \centering - \includegraphics[width=\columnwidth]{kurt-cotoaga-1210012-unsplash} - \caption{Image. This image comes from unsplash.com, which is a great website to get - free to use high quality images.} - \label{fig:image1} -\end{figure} - -\section{Latex Environments} -Using paragraph environment. -\paragraph{Opening Paragraph.} Paragraph is a way to have a bolded heading, and that can also -enter into the pdf bookmark structure. - -\section{Equations} -% -We can write equations this way: -\begin{align} -\log p(\vx) & = \log \int p_\theta(\vx,\vz) p(\vz) d\vz \nonumber \\ -& = \log \expect{p(\vz)}{p_\theta(\vx,\vz)} -\label{eq:marginalisation1} -\end{align} -We refer to the previous equation \eqref{eq:marginalisation1}. -Later let's compute the gradient $\nabla_\theta \log p(\vx)$. The commands -\verb|\vz|, \verb|\vx|, \verb|\expect| are locally-defined macros. -The file \texttt{defns.tex} provides a larger set of short macros for -common constructions, but some of them clash with existing packages. -\begin{align} -\log p(\vx) & = \nabla_{\vtheta} \sum_{i=1}^N \log p(y | x(\vtheta)) + \mathcal{R}(x) \nonumber \\ - & + \|\nabla_{\vtheta}\vx(\vtheta)\|^2_2 \\ - & y \in \mathbb{R}; \vx \in \mathbb{R}^D \qquad \text{using \texttt{\textbackslash mathbb}} \\ - & y \in \mathds{R}; \vx \in \mathds{R}^D \qquad \text{using \texttt{\textbackslash mathds}} -\label{eq:marginalisation2} -\end{align} - -\subsection{Tables} -Use \href{https://www.tablesgenerator.com/latex_tables}{\texttt{www.tablesgenerator.com/latex\_tables}} to help make tables. - -\begin{table}[tb] - \centering - \caption{Sizes of datasets. Testing with a much longer caption to see how it looks over - multiple lines. } - \begin{tabular}{lll} - \hline - Dataset & N & D \\ - \hline \hline - MNIST & 60,000 & $32\times32$ \\ - ImageNet & 1m & $64\times64$\\ - \hline - \end{tabular} -\end{table} - -\subsubsection{Using lists} -% -Itemize lists -\begin{itemize} - \item Item 1 - \item Item 2 - \item Item 3 -\end{itemize} - -\noindent Enumerate lists -\begin{enumerate} - \item Item 1 - \item Item 2 - \item Item 3 -\end{enumerate} - -\section{DeepMind Brand Colours} -The brand standard specifies a colour palette that is available using the package \texttt{dm-colors}, which is already included in this template. Colours include: \textcolor{dmblue400}{This} \textcolor{dmyellow500}{text} \textcolor{dmteal400}{is} \textcolor{dmpurple400}{rendered} \textcolor{dmred400}{using} \textcolor{dmorange400}{dmcolors}. - -\section{Including References and Bibliography} -\begin{figure*}[t] - \centering - \includegraphics[width=\columnwidth]{kurt-cotoaga-1210012-unsplash} - \includegraphics[width=\columnwidth]{kurt-cotoaga-1210012-unsplash} - \caption{Image. This image comes from unsplash.com, which is a great website to get - free to use high quality images.} - \label{fig:image2} -\end{figure*} -References can be formatted in two styles with the \texttt{citep} -command \citep{silver2016mastering} and with the \texttt{citet} -command \citet{silver2016mastering}. 
diff --git a/app/scripts/latex-to-mdx/input/main.bbl b/app/scripts/latex-to-mdx/input/main.bbl deleted file mode 100644 index d3ca7de4fe8d144fa32c845027bc0e29242a24ff..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/main.bbl +++ /dev/null @@ -1,49 +0,0 @@ -\begin{thebibliography}{8} -\providecommand{\natexlab}[1]{#1} -\providecommand{\url}[1]{\texttt{#1}} -\expandafter\ifx\csname urlstyle\endcsname\relax - \providecommand{\doi}[1]{doi: #1}\else - \providecommand{\doi}{doi: \begingroup \urlstyle{rm}\Url}\fi - -\bibitem[Lipman et~al.(2024)Lipman, Havasi, Holderrieth, Shaul, Le, Karrer, Chen, {Lopez-Paz}, {Ben-Hamu}, and Gat]{lipmanFlowMatchingGuide2024} -Yaron Lipman, Marton Havasi, Peter Holderrieth, Neta Shaul, Matt Le, Brian Karrer, Ricky T.~Q. Chen, David {Lopez-Paz}, Heli {Ben-Hamu}, and Itai Gat. -\newblock Flow {{Matching Guide}} and {{Code}}, December 2024. - -\bibitem[Nakkiran et~al.(2024)Nakkiran, Bradley, Zhou, and Advani]{nakkiranStepbyStepDiffusionElementary2024} -Preetum Nakkiran, Arwen Bradley, Hattie Zhou, and Madhu Advani. -\newblock Step-by-{{Step Diffusion}}: {{An Elementary Tutorial}}, June 2024. - -\bibitem[Prince(2023)]{prince2023understanding} -Simon~J.D. Prince. -\newblock \emph{Understanding Deep Learning}. -\newblock The MIT Press, 2023. - -\bibitem[{Shalev-Shwartz} and {Ben-David}(2014)]{shalev-shwartzUnderstandingMachineLearning2014} -Shai {Shalev-Shwartz} and Shai {Ben-David}. -\newblock \emph{Understanding {{Machine Learning}}: {{From Theory}} to {{Algorithms}}}. -\newblock Cambridge University Press, 1 edition, May 2014. -\newblock ISBN 978-1-107-05713-5 978-1-107-29801-9. -\newblock \doi{10.1017/CBO9781107298019}. - -\bibitem[Siciliano and Khatib(2016)]{sicilianoSpringerHandbookRobotics2016} -Bruno Siciliano and Oussama Khatib, editors. -\newblock \emph{Springer {{Handbook}} of {{Robotics}}}. -\newblock Springer {{Handbooks}}. Springer International Publishing, Cham, 2016. -\newblock ISBN 978-3-319-32550-7 978-3-319-32552-1. -\newblock \doi{10.1007/978-3-319-32552-1}. - -\bibitem[Sutton and Barto(2018)]{suttonReinforcementLearningIntroduction2018} -Richard~S. Sutton and Andrew~G. Barto. -\newblock \emph{Reinforcement Learning: An Introduction}. -\newblock Adaptive Computation and Machine Learning Series. The MIT Press, Cambridge, Massachusetts, second edition edition, 2018. -\newblock ISBN 978-0-262-03924-6. - -\bibitem[Tedrake({\natexlab{a}})]{tedrakeRoboticManipulationPerception} -Russ Tedrake. -\newblock Robotic {{Manipulation}}. {{Perception}}, {{Planning}} and {{Control}}., {\natexlab{a}}. - -\bibitem[Tedrake({\natexlab{b}})]{tedrakeUnderactuatedRoboticsAlgorithms} -Russ Tedrake. -\newblock Underactuated {{Robotics}}. {{Algorithms}} for {{Walking}}, {{Running}}, {{Swimming}}, {{Flying}}, and {{Manipulation}}, {\natexlab{b}}. 
- -\end{thebibliography} diff --git a/app/scripts/latex-to-mdx/input/main.bib b/app/scripts/latex-to-mdx/input/main.bib deleted file mode 100644 index f22b4946c7f0f99c7c703c00a9977447298a042c..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/main.bib +++ /dev/null @@ -1,2246 +0,0 @@ -@misc{agibot-world-contributorsAgiBotWorldColosseo2025, - title = {{{AgiBot World Colosseo}}: {{A Large-scale Manipulation Platform}} for {{Scalable}} and {{Intelligent Embodied Systems}}}, - shorttitle = {{{AgiBot World Colosseo}}}, - author = {{AgiBot-World-Contributors} and Bu, Qingwen and Cai, Jisong and Chen, Li and Cui, Xiuqi and Ding, Yan and Feng, Siyuan and Gao, Shenyuan and He, Xindong and Hu, Xuan and Huang, Xu and Jiang, Shu and Jiang, Yuxin and Jing, Cheng and Li, Hongyang and Li, Jialu and Liu, Chiming and Liu, Yi and Lu, Yuxiang and Luo, Jianlan and Luo, Ping and Mu, Yao and Niu, Yuehan and Pan, Yixuan and Pang, Jiangmiao and Qiao, Yu and Ren, Guanghui and Ruan, Cheng and Shan, Jiaqi and Shen, Yongjian and Shi, Chengshi and Shi, Mingkang and Shi, Modi and Sima, Chonghao and Song, Jianheng and Wang, Huijie and Wang, Wenhao and Wei, Dafeng and Xie, Chengen and Xu, Guo and Yan, Junchi and Yang, Cunbiao and Yang, Lei and Yang, Shukai and Yao, Maoqing and Zeng, Jia and Zhang, Chi and Zhang, Qinglin and Zhao, Bin and Zhao, Chengyue and Zhao, Jiaqi and Zhu, Jianchao}, - year = {2025}, - month = aug, - number = {arXiv:2503.06669}, - eprint = {2503.06669}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2503.06669}, - urldate = {2025-08-27}, - abstract = {We explore how scalable robot data can address real-world challenges for generalized robotic manipulation. Introducing AgiBot World, a large-scale platform comprising over 1 million trajectories across 217 tasks in five deployment scenarios, we achieve an order-of-magnitude increase in data scale compared to existing datasets. Accelerated by a standardized collection pipeline with human-in-the-loop verification, AgiBot World guarantees high-quality and diverse data distribution. It is extensible from grippers to dexterous hands and visuo-tactile sensors for fine-grained skill acquisition. Building on top of data, we introduce Genie Operator-1 (GO-1), a novel generalist policy that leverages latent action representations to maximize data utilization, demonstrating predictable performance scaling with increased data volume. Policies pre-trained on our dataset achieve an average performance improvement of 30\% over those trained on Open X-Embodiment, both in in-domain and out-of-distribution scenarios. GO-1 exhibits exceptional capability in real-world dexterous and long-horizon tasks, achieving over 60\% success rate on complex tasks and outperforming prior RDT approach by 32\%. By open-sourcing the dataset, tools, and models, we aim to democratize access to large-scale, high-quality robot data, advancing the pursuit of scalable and general-purpose intelligence.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Computer Vision and Pattern Recognition,Computer Science - Machine Learning,Computer Science - Robotics}, - file = {/Users/fracapuano/Zotero/storage/TGP4C7GA/AgiBot-World-Contributors et al. 
- 2025 - AgiBot World Colosseo A Large-scale Manipulation Platform for Scalable and Intelligent Embodied Sys.pdf;/Users/fracapuano/Zotero/storage/IC7BUHWR/2503.html} -} - -@article{agrawalComputationalSensorimotorLearning, - title = {Computational {{Sensorimotor Learning}}}, - author = {Agrawal, Pulkit}, - langid = {english}, - file = {/Users/fracapuano/Zotero/storage/KSDX9GA2/Agrawal - Computational Sensorimotor Learning.pdf} -} - -@misc{akkayaSolvingRubiksCube2019, - title = {Solving {{Rubik}}'s {{Cube}} with a {{Robot Hand}}}, - author = {Akkaya, Ilge and Andrychowicz, Marcin and Chociej, Maciek and Litwin, Mateusz and McGrew, Bob and Petron, Arthur and Paino, Alex and Plappert, Matthias and Powell, Glenn and Ribas, Raphael and Schneider, Jonas and Tezak, Nikolas and Tworek, Jerry and Welinder, Peter and Weng, Lilian and Yuan, Qiming and Zaremba, Wojciech and Zhang, Lei}, - year = {2019}, - month = oct, - number = {arXiv:1910.07113}, - eprint = {1910.07113}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.1910.07113}, - urldate = {2025-08-26}, - abstract = {We demonstrate that models trained only in simulation can be used to solve a manipulation problem of unprecedented complexity on a real robot. This is made possible by two key components: a novel algorithm, which we call automatic domain randomization (ADR) and a robot platform built for machine learning. ADR automatically generates a distribution over randomized environments of ever-increasing difficulty. Control policies and vision state estimators trained with ADR exhibit vastly improved sim2real transfer. For control policies, memory-augmented models trained on an ADR-generated distribution of environments show clear signs of emergent meta-learning at test time. The combination of ADR with our custom robot platform allows us to solve a Rubik's cube with a humanoid robot hand, which involves both control and state estimation problems. Videos summarizing our results are available: https://openai.com/blog/solving-rubiks-cube/}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Artificial Intelligence,Computer Science - Computer Vision and Pattern Recognition,Computer Science - Machine Learning,Computer Science - Robotics,Statistics - Machine Learning}, - file = {/Users/fracapuano/Zotero/storage/5HNZLG9D/OpenAI et al. - 2019 - Solving Rubik's Cube with a Robot Hand.pdf;/Users/fracapuano/Zotero/storage/WSM7BJ4I/1910.html} -} - -@misc{alayracFlamingoVisualLanguage2022, - title = {Flamingo: A {{Visual Language Model}} for {{Few-Shot Learning}}}, - shorttitle = {Flamingo}, - author = {Alayrac, Jean-Baptiste and Donahue, Jeff and Luc, Pauline and Miech, Antoine and Barr, Iain and Hasson, Yana and Lenc, Karel and Mensch, Arthur and Millican, Katie and Reynolds, Malcolm and Ring, Roman and Rutherford, Eliza and Cabi, Serkan and Han, Tengda and Gong, Zhitao and Samangooei, Sina and Monteiro, Marianne and Menick, Jacob and Borgeaud, Sebastian and Brock, Andrew and Nematzadeh, Aida and Sharifzadeh, Sahand and Binkowski, Mikolaj and Barreira, Ricardo and Vinyals, Oriol and Zisserman, Andrew and Simonyan, Karen}, - year = {2022}, - month = nov, - number = {arXiv:2204.14198}, - eprint = {2204.14198}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2204.14198}, - urldate = {2025-08-27}, - abstract = {Building models that can be rapidly adapted to novel tasks using only a handful of annotated examples is an open challenge for multimodal machine learning research. 
We introduce Flamingo, a family of Visual Language Models (VLM) with this ability. We propose key architectural innovations to: (i) bridge powerful pretrained vision-only and language-only models, (ii) handle sequences of arbitrarily interleaved visual and textual data, and (iii) seamlessly ingest images or videos as inputs. Thanks to their flexibility, Flamingo models can be trained on large-scale multimodal web corpora containing arbitrarily interleaved text and images, which is key to endow them with in-context few-shot learning capabilities. We perform a thorough evaluation of our models, exploring and measuring their ability to rapidly adapt to a variety of image and video tasks. These include open-ended tasks such as visual question-answering, where the model is prompted with a question which it has to answer; captioning tasks, which evaluate the ability to describe a scene or an event; and close-ended tasks such as multiple-choice visual question-answering. For tasks lying anywhere on this spectrum, a single Flamingo model can achieve a new state of the art with few-shot learning, simply by prompting the model with task-specific examples. On numerous benchmarks, Flamingo outperforms models fine-tuned on thousands of times more task-specific data.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Artificial Intelligence,Computer Science - Computer Vision and Pattern Recognition,Computer Science - Machine Learning}, - file = {/Users/fracapuano/Zotero/storage/QZ69HN5K/Alayrac et al. - 2022 - Flamingo a Visual Language Model for Few-Shot Learning.pdf;/Users/fracapuano/Zotero/storage/JMAD5HJY/2204.html} -} - -@article{aldacoALOHA2Enhanced, - title = {{{ALOHA}} 2: {{An Enhanced Low-Cost Hardware}} for {{Bimanual Teleoperation}}}, - author = {Aldaco, Jorge and Armstrong, Travis and Baruch, Robert and Bingham, Jeff and Chan, Sanky and Dwibedi, Debidatta and Finn, Chelsea and Florence, Pete and Goodrich, Spencer and Gramlich, Wayne and Herzog, Alexander and Hoech, Jonathan and Nguyen, Thinh and Storz, Ian and Tabanpour, Baruch and Tompson, Jonathan and Wahid, Ayzaan and Wahrburg, Ted and Xu, Sichun and Yaroshenko, Sergey and Zhao, Tony Z}, - langid = {english}, - file = {/Users/fracapuano/Zotero/storage/LDEJG62Q/Aldaco et al. - ALOHA 2 An Enhanced Low-Cost Hardware for Bimanual Teleoperation.pdf} -} - -@article{alizadehComprehensiveSurveySpace2024, - title = {A Comprehensive Survey of Space Robotic Manipulators for On-Orbit Servicing}, - author = {Alizadeh, Mohammad and Zhu, Zheng H.}, - year = {2024}, - month = oct, - journal = {Frontiers in Robotics and AI}, - volume = {11}, - publisher = {Frontiers}, - issn = {2296-9144}, - doi = {10.3389/frobt.2024.1470950}, - urldate = {2025-08-26}, - abstract = {On-Orbit Servicing (OOS) robots are transforming space exploration by enabling vital maintenance and repair of spacecraft directly in space. However, achieving precise and safe manipulation in microgravity necessitates overcoming significant challenges. This survey delves into four crucial areas essential for successful OOS manipulation: object state estimation, motion planning, and feedback control. Techniques from traditional vision to advanced X-ray and neural network methods are explored for object state estimation. Strategies for fuel-optimized trajectories, docking maneuvers, and collision avoidance are examined in motion planning. The survey also explores control methods for various scenarios, including cooperative manipulation and handling uncertainties, in feedback control. 
Additionally, this survey examines how Machine learning techniques can further propel OOS robots towards more complex and delicate tasks in space.}, - langid = {english}, - keywords = {control,machine learning,motion planning,on-orbit servicing,pose estimation,robotic manipulator,space robots}, - file = {/Users/fracapuano/Zotero/storage/VA36KZYY/Alizadeh and Zhu - 2024 - A comprehensive survey of space robotic manipulators for on-orbit servicing.pdf} -} - -@misc{allalSmolLM2WhenSmol2025, - title = {{{SmolLM2}}: {{When Smol Goes Big}} -- {{Data-Centric Training}} of a {{Small Language Model}}}, - shorttitle = {{{SmolLM2}}}, - author = {Allal, Loubna Ben and Lozhkov, Anton and Bakouch, Elie and Bl{\'a}zquez, Gabriel Mart{\'i}n and Penedo, Guilherme and Tunstall, Lewis and Marafioti, Andr{\'e}s and Kydl{\'i}{\v c}ek, Hynek and Lajar{\'i}n, Agust{\'i}n Piqueres and Srivastav, Vaibhav and Lochner, Joshua and Fahlgren, Caleb and Nguyen, Xuan-Son and Fourrier, Cl{\'e}mentine and Burtenshaw, Ben and Larcher, Hugo and Zhao, Haojun and Zakka, Cyril and Morlon, Mathieu and Raffel, Colin and von Werra, Leandro and Wolf, Thomas}, - year = {2025}, - month = feb, - number = {arXiv:2502.02737}, - eprint = {2502.02737}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2502.02737}, - urldate = {2025-09-09}, - abstract = {While large language models have facilitated breakthroughs in many applications of artificial intelligence, their inherent largeness makes them computationally expensive and challenging to deploy in resource-constrained settings. In this paper, we document the development of SmolLM2, a state-of-the-art "small" (1.7 billion parameter) language model (LM). To attain strong performance, we overtrain SmolLM2 on {\textasciitilde}11 trillion tokens of data using a multi-stage training process that mixes web text with specialized math, code, and instruction-following data. We additionally introduce new specialized datasets (FineMath, Stack-Edu, and SmolTalk) at stages where we found existing datasets to be problematically small or low-quality. To inform our design decisions, we perform both small-scale ablations as well as a manual refinement process that updates the dataset mixing rates at each stage based on the performance at the previous stage. Ultimately, we demonstrate that SmolLM2 outperforms other recent small LMs including Qwen2.5-1.5B and Llama3.2-1B. To facilitate future research on LM development as well as applications of small LMs, we release both SmolLM2 as well as all of the datasets we prepared in the course of this project.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Computation and Language}, - file = {/Users/fracapuano/Zotero/storage/I7XDMSV7/Allal et al. - 2025 - SmolLM2 When Smol Goes Big -- Data-Centric Training of a Small Language Model.pdf;/Users/fracapuano/Zotero/storage/6MLZI84T/2502.html} -} - -@misc{antonovaReinforcementLearningPivoting2017, - title = {Reinforcement {{Learning}} for {{Pivoting Task}}}, - author = {Antonova, Rika and Cruciani, Silvia and Smith, Christian and Kragic, Danica}, - year = {2017}, - month = mar, - number = {arXiv:1703.00472}, - eprint = {1703.00472}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.1703.00472}, - urldate = {2025-08-25}, - abstract = {In this work we propose an approach to learn a robust policy for solving the pivoting task. 
Recently, several model-free continuous control algorithms were shown to learn successful policies without prior knowledge of the dynamics of the task. However, obtaining successful policies required thousands to millions of training episodes, limiting the applicability of these approaches to real hardware. We developed a training procedure that allows us to use a simple custom simulator to learn policies robust to the mismatch of simulation vs robot. In our experiments, we demonstrate that the policy learned in the simulator is able to pivot the object to the desired target angle on the real robot. We also show generalization to an object with different inertia, shape, mass and friction properties than those used during training. This result is a step towards making model-free reinforcement learning available for solving robotics tasks via pre-training in simulators that offer only an imprecise match to the real-world dynamics.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Machine Learning,Computer Science - Robotics}, - file = {/Users/fracapuano/Zotero/storage/WRZCHVGB/Antonova et al. - 2017 - Reinforcement Learning for Pivoting Task.pdf;/Users/fracapuano/Zotero/storage/WJEJ2VGU/1703.html} -} - -@article{aractingiControllingSolo12Quadruped2023, - title = {Controlling the {{Solo12}} Quadruped Robot with Deep Reinforcement Learning}, - author = {Aractingi, Michel and L{\'e}ziart, Pierre-Alexandre and Flayols, Thomas and Perez, Julien and Silander, Tomi and Sou{\`e}res, Philippe}, - year = {2023}, - month = jul, - journal = {Scientific Reports}, - volume = {13}, - number = {1}, - pages = {11945}, - publisher = {Nature Publishing Group}, - issn = {2045-2322}, - doi = {10.1038/s41598-023-38259-7}, - urldate = {2025-08-27}, - abstract = {Quadruped robots require robust and general locomotion skills to exploit their mobility potential in complex and challenging environments. In this work, we present an implementation of a robust end-to-end learning-based controller on the Solo12 quadruped. Our method is based on deep reinforcement learning of joint impedance references. The resulting control policies follow a commanded velocity reference while being efficient in its energy consumption and easy to deploy. We detail the learning procedure and method for transfer on the real robot. We show elaborate experiments. Finally, we present experimental results of the learned locomotion on various grounds indoors and outdoors. These results show that the Solo12 robot is a suitable open-source platform for research combining learning and control because of the easiness in transferring and deploying learned controllers.}, - copyright = {2023 The Author(s)}, - langid = {english}, - keywords = {Computer science,Information technology}, - file = {/Users/fracapuano/Zotero/storage/84ZFT7RP/Aractingi et al. 
- 2023 - Controlling the Solo12 quadruped robot with deep reinforcement learning.pdf} -} - -@misc{bai2025qwen25vl, - title = {Qwen2.5-{{VL}} Technical Report}, - author = {Bai, Shuai and Chen, Keqin and Liu, Xuejing and Wang, Jialin and Ge, Wenbin and Song, Sibo and Dang, Kai and Wang, Peng and Wang, Shijie and Tang, Jun and Zhong, Humen and Zhu, Yuanzhi and Yang, Mingkun and Li, Zhaohai and Wan, Jianqiang and Wang, Pengfei and Ding, Wei and Fu, Zheren and Xu, Yiheng and Ye, Jiabo and Zhang, Xi and Xie, Tianbao and Cheng, Zesen and Zhang, Hang and Yang, Zhibo and Xu, Haiyang and Lin, Junyang}, - year = {2025}, - eprint = {2502.13923}, - primaryclass = {cs.CV}, - archiveprefix = {arXiv} -} - -@misc{ballEfficientOnlineReinforcement2023, - title = {Efficient {{Online Reinforcement Learning}} with {{Offline Data}}}, - author = {Ball, Philip J. and Smith, Laura and Kostrikov, Ilya and Levine, Sergey}, - year = {2023}, - month = may, - number = {arXiv:2302.02948}, - eprint = {2302.02948}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2302.02948}, - urldate = {2025-08-30}, - abstract = {Sample efficiency and exploration remain major challenges in online reinforcement learning (RL). A powerful approach that can be applied to address these issues is the inclusion of offline data, such as prior trajectories from a human expert or a sub-optimal exploration policy. Previous methods have relied on extensive modifications and additional complexity to ensure the effective use of this data. Instead, we ask: can we simply apply existing off-policy methods to leverage offline data when learning online? In this work, we demonstrate that the answer is yes; however, a set of minimal but important changes to existing off-policy RL algorithms are required to achieve reliable performance. We extensively ablate these design choices, demonstrating the key factors that most affect performance, and arrive at a set of recommendations that practitioners can readily apply, whether their data comprise a small number of expert demonstrations or large volumes of sub-optimal trajectories. We see that correct application of these simple recommendations can provide a \${\textbackslash}mathbf\{2.5{\textbackslash}times\}\$ improvement over existing approaches across a diverse set of competitive benchmarks, with no additional computational overhead. We have released our code at https://github.com/ikostrikov/rlpd.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Artificial Intelligence,Computer Science - Machine Learning}, - file = {/Users/fracapuano/Zotero/storage/MUKA5D2V/Ball et al. - 2023 - Efficient Online Reinforcement Learning with Offline Data.pdf;/Users/fracapuano/Zotero/storage/IKURHC3D/2302.html} -} - -@misc{bekrisStateRobotMotion2024, - title = {The {{State}} of {{Robot Motion Generation}}}, - author = {Bekris, Kostas E. and Doerr, Joe and Meng, Patrick and Tangirala, Sumanth}, - year = {2024}, - month = oct, - number = {arXiv:2410.12172}, - eprint = {2410.12172}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2410.12172}, - urldate = {2025-08-26}, - abstract = {This paper reviews the large spectrum of methods for generating robot motion proposed over the 50 years of robotics research culminating in recent developments. It crosses the boundaries of methodologies, typically not surveyed together, from those that operate over explicit models to those that learn implicit ones. 
The paper discusses the current state-of-the-art as well as properties of varying methodologies, highlighting opportunities for integration.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Artificial Intelligence,Computer Science - Machine Learning,Computer Science - Robotics}, - file = {/Users/fracapuano/Zotero/storage/DMJJZFDZ/Bekris et al. - 2024 - The State of Robot Motion Generation.pdf;/Users/fracapuano/Zotero/storage/TL42IRAN/2410.html} -} - -@article{bellemareAutonomousNavigationStratospheric2020, - title = {Autonomous Navigation of Stratospheric Balloons Using Reinforcement Learning}, - author = {Bellemare, Marc G. and Candido, Salvatore and Castro, Pablo Samuel and Gong, Jun and Machado, Marlos C. and Moitra, Subhodeep and Ponda, Sameera S. and Wang, Ziyu}, - year = {2020}, - month = dec, - journal = {Nature}, - volume = {588}, - number = {7836}, - pages = {77--82}, - publisher = {Nature Publishing Group}, - issn = {1476-4687}, - doi = {10.1038/s41586-020-2939-8}, - urldate = {2025-08-31}, - abstract = {Efficiently navigating a superpressure balloon in the stratosphere1 requires the integration of a multitude of cues, such as wind speed and solar elevation, and the process is complicated by forecast errors and sparse wind measurements. Coupled with the need to make decisions in real time, these factors rule out the use of conventional control techniques2,3. Here we describe the use of reinforcement learning4,5 to create a high-performing flight controller. Our algorithm uses data augmentation6,7 and a self-correcting design to overcome the key technical challenge of reinforcement learning from imperfect data, which has proved to be a major obstacle to its application to physical systems8. We deployed our controller to station Loon superpressure balloons at multiple locations across the globe, including a 39-day controlled experiment over the Pacific Ocean. Analyses show that the controller outperforms Loon's previous algorithm and is robust to the natural diversity in stratospheric winds. 
These results demonstrate that reinforcement learning is an effective solution to real-world autonomous control problems in which neither conventional methods nor human intervention suffice, offering clues about what may be needed to create artificially intelligent agents that continuously interact with real, dynamic environments.}, - copyright = {2020 The Author(s), under exclusive licence to Springer Nature Limited}, - langid = {english}, - keywords = {Aerospace engineering,Computer science} -} - -@article{bellmanMarkovianDecisionProcess1957, - title = {A {{Markovian Decision Process}}}, - author = {Bellman, Richard}, - year = {1957}, - journal = {Journal of Mathematics and Mechanics}, - volume = {6}, - number = {5}, - eprint = {24900506}, - eprinttype = {jstor}, - pages = {679--684}, - publisher = {Indiana University Mathematics Department}, - issn = {0095-9057}, - urldate = {2025-08-30} -} - -@misc{beyerPaliGemmaVersatile3B2024, - title = {{{PaliGemma}}: {{A}} Versatile {{3B VLM}} for Transfer}, - shorttitle = {{{PaliGemma}}}, - author = {Beyer, Lucas and Steiner, Andreas and Pinto, Andr{\'e} Susano and Kolesnikov, Alexander and Wang, Xiao and Salz, Daniel and Neumann, Maxim and Alabdulmohsin, Ibrahim and Tschannen, Michael and Bugliarello, Emanuele and Unterthiner, Thomas and Keysers, Daniel and Koppula, Skanda and Liu, Fangyu and Grycner, Adam and Gritsenko, Alexey and Houlsby, Neil and Kumar, Manoj and Rong, Keran and Eisenschlos, Julian and Kabra, Rishabh and Bauer, Matthias and Bo{\v s}njak, Matko and Chen, Xi and Minderer, Matthias and Voigtlaender, Paul and Bica, Ioana and Balazevic, Ivana and Puigcerver, Joan and Papalampidi, Pinelopi and Henaff, Olivier and Xiong, Xi and Soricut, Radu and Harmsen, Jeremiah and Zhai, Xiaohua}, - year = {2024}, - month = oct, - number = {arXiv:2407.07726}, - eprint = {2407.07726}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2407.07726}, - urldate = {2025-09-08}, - abstract = {PaliGemma is an open Vision-Language Model (VLM) that is based on the SigLIP-So400m vision encoder and the Gemma-2B language model. It is trained to be a versatile and broadly knowledgeable base model that is effective to transfer. It achieves strong performance on a wide variety of open-world tasks. We evaluate PaliGemma on almost 40 diverse tasks including standard VLM benchmarks, but also more specialized tasks such as remote-sensing and segmentation.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Artificial Intelligence,Computer Science - Computation and Language,Computer Science - Computer Vision and Pattern Recognition,Computer Science - Machine Learning}, - file = {/Users/fracapuano/Zotero/storage/IPDYNWC4/Beyer et al. 
- 2024 - PaliGemma A versatile 3B VLM for transfer.pdf;/Users/fracapuano/Zotero/storage/R7UVD9WC/2407.html} -} - -@misc{bjorckGR00TN1Open2025, - title = {{{GR00T N1}}: {{An Open Foundation Model}} for {{Generalist Humanoid Robots}}}, - shorttitle = {{{GR00T N1}}}, - author = {Bjorck, Johan and Casta{\~n}eda, Fernando and Cherniadev, Nikita and Da, Xingye and Ding, Runyu and Fan, Linxi "Jim" and Fang, Yu and Fox, Dieter and Hu, Fengyuan and Huang, Spencer and Jang, Joel and Jiang, Zhenyu and Kautz, Jan and Kundalia, Kaushil and Lao, Lawrence and Li, Zhiqi and Lin, Zongyu and Lin, Kevin and Liu, Guilin and Llontop, Edith and Magne, Loic and Mandlekar, Ajay and Narayan, Avnish and Nasiriany, Soroush and Reed, Scott and Tan, You Liang and Wang, Guanzhi and Wang, Zu and Wang, Jing and Wang, Qi and Xiang, Jiannan and Xie, Yuqi and Xu, Yinzhen and Xu, Zhenjia and Ye, Seonghyeon and Yu, Zhiding and Zhang, Ao and Zhang, Hao and Zhao, Yizhou and Zheng, Ruijie and Zhu, Yuke}, - year = {2025}, - month = mar, - number = {arXiv:2503.14734}, - eprint = {2503.14734}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2503.14734}, - urldate = {2025-08-26}, - abstract = {General-purpose robots need a versatile body and an intelligent mind. Recent advancements in humanoid robots have shown great promise as a hardware platform for building generalist autonomy in the human world. A robot foundation model, trained on massive and diverse data sources, is essential for enabling the robots to reason about novel situations, robustly handle real-world variability, and rapidly learn new tasks. To this end, we introduce GR00T N1, an open foundation model for humanoid robots. GR00T N1 is a Vision-Language-Action (VLA) model with a dual-system architecture. The vision-language module (System 2) interprets the environment through vision and language instructions. The subsequent diffusion transformer module (System 1) generates fluid motor actions in real time. Both modules are tightly coupled and jointly trained end-to-end. We train GR00T N1 with a heterogeneous mixture of real-robot trajectories, human videos, and synthetically generated datasets. We show that our generalist robot model GR00T N1 outperforms the state-of-the-art imitation learning baselines on standard simulation benchmarks across multiple robot embodiments. Furthermore, we deploy our model on the Fourier GR-1 humanoid robot for language-conditioned bimanual manipulation tasks, achieving strong performance with high data efficiency.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Artificial Intelligence,Computer Science - Machine Learning,Computer Science - Robotics}, - file = {/Users/fracapuano/Zotero/storage/BDNSKFA6/NVIDIA et al. 
- 2025 - GR00T N1 An Open Foundation Model for Generalist Humanoid Robots.pdf;/Users/fracapuano/Zotero/storage/FENU9PQR/2503.html} -} - -@misc{black$p_0$VisionLanguageActionFlow2024, - title = {\${$\pi\_$}0\$: {{A Vision-Language-Action Flow Model}} for {{General Robot Control}}}, - shorttitle = {\${$\pi\_$}0\$}, - author = {Black, Kevin and Brown, Noah and Driess, Danny and Esmail, Adnan and Equi, Michael and Finn, Chelsea and Fusai, Niccolo and Groom, Lachy and Hausman, Karol and Ichter, Brian and Jakubczak, Szymon and Jones, Tim and Ke, Liyiming and Levine, Sergey and {Li-Bell}, Adrian and Mothukuri, Mohith and Nair, Suraj and Pertsch, Karl and Shi, Lucy Xiaoyang and Tanner, James and Vuong, Quan and Walling, Anna and Wang, Haohuan and Zhilinsky, Ury}, - year = {2024}, - month = oct, - number = {arXiv:2410.24164}, - eprint = {2410.24164}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2410.24164}, - urldate = {2025-08-28}, - abstract = {Robot learning holds tremendous promise to unlock the full potential of flexible, general, and dexterous robot systems, as well as to address some of the deepest questions in artificial intelligence. However, bringing robot learning to the level of generality required for effective real-world systems faces major obstacles in terms of data, generalization, and robustness. In this paper, we discuss how generalist robot policies (i.e., robot foundation models) can address these challenges, and how we can design effective generalist robot policies for complex and highly dexterous tasks. We propose a novel flow matching architecture built on top of a pre-trained vision-language model (VLM) to inherit Internet-scale semantic knowledge. We then discuss how this model can be trained on a large and diverse dataset from multiple dexterous robot platforms, including single-arm robots, dual-arm robots, and mobile manipulators. We evaluate our model in terms of its ability to perform tasks in zero shot after pre-training, follow language instructions from people and from a high-level VLM policy, and its ability to acquire new skills via fine-tuning. Our results cover a wide variety of tasks, such as laundry folding, table cleaning, and assembling boxes.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Machine Learning,Computer Science - Robotics}, - file = {/Users/fracapuano/Zotero/storage/GUEM37NZ/Black et al. - 2024 - $π_0$ A Vision-Language-Action Flow Model for General Robot Control.pdf;/Users/fracapuano/Zotero/storage/FHYXZWF8/2410.html} -} - -@inproceedings{BLIP-2, - title = {{{BLIP-2}}: Bootstrapping Language-Image Pre-Training with Frozen Image Encoders and Large Language Models}, - booktitle = {Proceedings of the 40th International Conference on Machine Learning}, - author = {Li, Junnan and Li, Dongxu and Savarese, Silvio and Hoi, Steven}, - year = {2023}, - series = {{{ICML}}'23}, - publisher = {JMLR.org}, - address = {, Honolulu, Hawaii, USA,}, - abstract = {The cost of vision-and-language pre-training has become increasingly prohibitive due to end-to-end training of large-scale models. This paper proposes BLIP-2, a generic and efficient pretraining strategy that bootstraps vision-language pre-training from off-the-shelf frozen pretrained image encoders and frozen large language models. BLIP-2 bridges the modality gap with a lightweight Querying Transformer, which is pretrained in two stages. The first stage bootstraps vision-language representation learning from a frozen image encoder.
The second stage bootstraps vision-to-language generative learning from a frozen language model. BLIP-2 achieves state-of-the-art performance on various vision-language tasks, despite having significantly fewer trainable parameters than existing methods. For example, our model outperforms Flamingo80B by 8.7\% on zero-shot VQAv2 with 54x fewer trainable parameters. We also demonstrate the model's capabilities of zero-shot image-to-text generation that can follow natural language instructions.}, - articleno = {814} -} - -@misc{brohanRT1RoboticsTransformer2023, - title = {{{RT-1}}: {{Robotics Transformer}} for {{Real-World Control}} at {{Scale}}}, - shorttitle = {{{RT-1}}}, - author = {Brohan, Anthony and Brown, Noah and Carbajal, Justice and Chebotar, Yevgen and Dabis, Joseph and Finn, Chelsea and Gopalakrishnan, Keerthana and Hausman, Karol and Herzog, Alex and Hsu, Jasmine and Ibarz, Julian and Ichter, Brian and Irpan, Alex and Jackson, Tomas and Jesmonth, Sally and Joshi, Nikhil J. and Julian, Ryan and Kalashnikov, Dmitry and Kuang, Yuheng and Leal, Isabel and Lee, Kuang-Huei and Levine, Sergey and Lu, Yao and Malla, Utsav and Manjunath, Deeksha and Mordatch, Igor and Nachum, Ofir and Parada, Carolina and Peralta, Jodilyn and Perez, Emily and Pertsch, Karl and Quiambao, Jornell and Rao, Kanishka and Ryoo, Michael and Salazar, Grecia and Sanketi, Pannag and Sayed, Kevin and Singh, Jaspiar and Sontakke, Sumedh and Stone, Austin and Tan, Clayton and Tran, Huong and Vanhoucke, Vincent and Vega, Steve and Vuong, Quan and Xia, Fei and Xiao, Ted and Xu, Peng and Xu, Sichun and Yu, Tianhe and Zitkovich, Brianna}, - year = {2023}, - month = aug, - number = {arXiv:2212.06817}, - eprint = {2212.06817}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2212.06817}, - urldate = {2025-09-07}, - abstract = {By transferring knowledge from large, diverse, task-agnostic datasets, modern machine learning models can solve specific downstream tasks either zero-shot or with small task-specific datasets to a high level of performance. While this capability has been demonstrated in other fields such as computer vision, natural language processing or speech recognition, it remains to be shown in robotics, where the generalization capabilities of the models are particularly critical due to the difficulty of collecting real-world robotic data. We argue that one of the keys to the success of such general robotic models lies with open-ended task-agnostic training, combined with high-capacity architectures that can absorb all of the diverse, robotic data. In this paper, we present a model class, dubbed Robotics Transformer, that exhibits promising scalable model properties. We verify our conclusions in a study of different model classes and their ability to generalize as a function of the data size, model size, and data diversity based on a large-scale data collection on real robots performing real-world tasks. The project's website and videos can be found at robotics-transformer1.github.io}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Artificial Intelligence,Computer Science - Computation and Language,Computer Science - Computer Vision and Pattern Recognition,Computer Science - Machine Learning,Computer Science - Robotics}, - file = {/Users/fracapuano/Zotero/storage/TTBN3M5Y/Brohan et al. 
- 2023 - RT-1 Robotics Transformer for Real-World Control at Scale.pdf;/Users/fracapuano/Zotero/storage/DK3D593W/2212.html} -} - -@misc{brohanRT2VisionLanguageActionModels2023, - title = {{{RT-2}}: {{Vision-Language-Action Models Transfer Web Knowledge}} to {{Robotic Control}}}, - shorttitle = {{{RT-2}}}, - author = {Brohan, Anthony and Brown, Noah and Carbajal, Justice and Chebotar, Yevgen and Chen, Xi and Choromanski, Krzysztof and Ding, Tianli and Driess, Danny and Dubey, Avinava and Finn, Chelsea and Florence, Pete and Fu, Chuyuan and Arenas, Montse Gonzalez and Gopalakrishnan, Keerthana and Han, Kehang and Hausman, Karol and Herzog, Alexander and Hsu, Jasmine and Ichter, Brian and Irpan, Alex and Joshi, Nikhil and Julian, Ryan and Kalashnikov, Dmitry and Kuang, Yuheng and Leal, Isabel and Lee, Lisa and Lee, Tsang-Wei Edward and Levine, Sergey and Lu, Yao and Michalewski, Henryk and Mordatch, Igor and Pertsch, Karl and Rao, Kanishka and Reymann, Krista and Ryoo, Michael and Salazar, Grecia and Sanketi, Pannag and Sermanet, Pierre and Singh, Jaspiar and Singh, Anikait and Soricut, Radu and Tran, Huong and Vanhoucke, Vincent and Vuong, Quan and Wahid, Ayzaan and Welker, Stefan and Wohlhart, Paul and Wu, Jialin and Xia, Fei and Xiao, Ted and Xu, Peng and Xu, Sichun and Yu, Tianhe and Zitkovich, Brianna}, - year = {2023}, - month = jul, - number = {arXiv:2307.15818}, - eprint = {2307.15818}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2307.15818}, - urldate = {2025-09-07}, - abstract = {We study how vision-language models trained on Internet-scale data can be incorporated directly into end-to-end robotic control to boost generalization and enable emergent semantic reasoning. Our goal is to enable a single end-to-end trained model to both learn to map robot observations to actions and enjoy the benefits of large-scale pretraining on language and vision-language data from the web. To this end, we propose to co-fine-tune state-of-the-art vision-language models on both robotic trajectory data and Internet-scale vision-language tasks, such as visual question answering. In contrast to other approaches, we propose a simple, general recipe to achieve this goal: in order to fit both natural language responses and robotic actions into the same format, we express the actions as text tokens and incorporate them directly into the training set of the model in the same way as natural language tokens. We refer to such category of models as vision-language-action models (VLA) and instantiate an example of such a model, which we call RT-2. Our extensive evaluation (6k evaluation trials) shows that our approach leads to performant robotic policies and enables RT-2 to obtain a range of emergent capabilities from Internet-scale training. This includes significantly improved generalization to novel objects, the ability to interpret commands not present in the robot training data (such as placing an object onto a particular number or icon), and the ability to perform rudimentary reasoning in response to user commands (such as picking up the smallest or largest object, or the one closest to another object). 
We further show that incorporating chain of thought reasoning allows RT-2 to perform multi-stage semantic reasoning, for example figuring out which object to pick up for use as an improvised hammer (a rock), or which type of drink is best suited for someone who is tired (an energy drink).}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Computation and Language,Computer Science - Computer Vision and Pattern Recognition,Computer Science - Machine Learning,Computer Science - Robotics}, - file = {/Users/fracapuano/Zotero/storage/CZHMNYPG/Brohan et al. - 2023 - RT-2 Vision-Language-Action Models Transfer Web Knowledge to Robotic Control.pdf;/Users/fracapuano/Zotero/storage/WN2E7AZH/2307.html} -} - -@misc{brownLanguageModelsAre2020, - title = {Language {{Models}} Are {{Few-Shot Learners}}}, - author = {Brown, Tom B. and Mann, Benjamin and Ryder, Nick and Subbiah, Melanie and Kaplan, Jared and Dhariwal, Prafulla and Neelakantan, Arvind and Shyam, Pranav and Sastry, Girish and Askell, Amanda and Agarwal, Sandhini and {Herbert-Voss}, Ariel and Krueger, Gretchen and Henighan, Tom and Child, Rewon and Ramesh, Aditya and Ziegler, Daniel M. and Wu, Jeffrey and Winter, Clemens and Hesse, Christopher and Chen, Mark and Sigler, Eric and Litwin, Mateusz and Gray, Scott and Chess, Benjamin and Clark, Jack and Berner, Christopher and McCandlish, Sam and Radford, Alec and Sutskever, Ilya and Amodei, Dario}, - year = {2020}, - month = jul, - number = {arXiv:2005.14165}, - eprint = {2005.14165}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2005.14165}, - urldate = {2025-08-28}, - abstract = {Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task. While typically task-agnostic in architecture, this method still requires task-specific fine-tuning datasets of thousands or tens of thousands of examples. By contrast, humans can generally perform a new language task from only a few examples or from simple instructions - something which current NLP systems still largely struggle to do. Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches. Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model. GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic. At the same time, we also identify some datasets where GPT-3's few-shot learning still struggles, as well as some datasets where GPT-3 faces methodological issues related to training on large web corpora. Finally, we find that GPT-3 can generate samples of news articles which human evaluators have difficulty distinguishing from articles written by humans. 
We discuss broader societal impacts of this finding and of GPT-3 in general.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Computation and Language}, - file = {/Users/fracapuano/Zotero/storage/L6J45ZW7/Brown et al. - 2020 - Language Models are Few-Shot Learners.pdf;/Users/fracapuano/Zotero/storage/52DC5AT2/2005.html} -} - -@article{burridgeSequentialCompositionDynamically1999b, - title = {Sequential {{Composition}} of {{Dynamically Dexterous Robot Behaviors}}}, - author = {Burridge, R. R. and Rizzi, A. A. and Koditschek, D. E.}, - year = {1999}, - month = jun, - journal = {The International Journal of Robotics Research}, - volume = {18}, - number = {6}, - pages = {534--555}, - issn = {0278-3649, 1741-3176}, - doi = {10.1177/02783649922066385}, - urldate = {2025-08-26}, - abstract = {We report on our efforts to develop a sequential robot controllercomposition technique in the context of dexterous ``batting'' maneuvers. A robot with a flat paddle is required to strike repeatedly at a thrown ball until the ball is brought to rest on the paddle at a specified location. The robot's reachable workspace is blocked by an obstacle that disconnects the free space formed when the ball and paddle remain in contact, forcing the machine to ``let go'' for a time to bring the ball to the desired state. The controller compositions we create guarantee that a ball introduced in the ``safe workspace'' remains there and is ultimately brought to the goal. We report on experimental results from an implementation of these formal composition methods, and present descriptive statistics characterizing the experiments.}, - copyright = {https://journals.sagepub.com/page/policies/text-and-data-mining-license}, - langid = {english}, - file = {/Users/fracapuano/Zotero/storage/TFZQ6EHJ/Burridge et al. 
- 1999 - Sequential Composition of Dynamically Dexterous Robot Behaviors.pdf} -} - -@misc{cadene2024lerobot, - title = {{{LeRobot}}: {{State-of-the-art}} Machine Learning for Real-World Robotics in Pytorch}, - author = {Cadene, Remi and Alibert, Simon and Soare, Alexander and Gallouedec, Quentin and Zouitine, Adil and Palma, Steven and Kooijmans, Pepijn and Aractingi, Michel and Shukor, Mustafa and Aubakirova, Dana and Russi, Martino and Capuano, Francesco and Pascal, Caroline and Choghari, Jade and Moss, Jess and Wolf, Thomas}, - year = {2024} -} - -@misc{cadeneLeRobotStateoftheartMachine2024, - title = {{{LeRobot}}: {{State-of-the-art Machine Learning}} for {{Real-World Robotics}} in {{Pytorch}}}, - author = {Cadene, Remi and Alibert, Simon and Soare, Alexander and Gallouedec, Quentin and Zouitine, Adil and Palma, Steven and Kooijmans, Pepijn and Aractingi, Michel and Shukor, Mustafa and Aubakirova, Dana and Russi, Martino and Capuano, Francesco and Pascal, Caroline and Choghari, Jade and Moss, Jess and Wolf, Thomas}, - year = {2024} -} - -@misc{caronEmergingPropertiesSelfSupervised2021, - title = {Emerging {{Properties}} in {{Self-Supervised Vision Transformers}}}, - author = {Caron, Mathilde and Touvron, Hugo and Misra, Ishan and J{\'e}gou, Herv{\'e} and Mairal, Julien and Bojanowski, Piotr and Joulin, Armand}, - year = {2021}, - month = may, - number = {arXiv:2104.14294}, - eprint = {2104.14294}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2104.14294}, - urldate = {2025-09-07}, - abstract = {In this paper, we question if self-supervised learning provides new properties to Vision Transformer (ViT) that stand out compared to convolutional networks (convnets). Beyond the fact that adapting self-supervised methods to this architecture works particularly well, we make the following observations: first, self-supervised ViT features contain explicit information about the semantic segmentation of an image, which does not emerge as clearly with supervised ViTs, nor with convnets. Second, these features are also excellent k-NN classifiers, reaching 78.3\% top-1 on ImageNet with a small ViT. Our study also underlines the importance of momentum encoder, multi-crop training, and the use of small patches with ViTs. We implement our findings into a simple self-supervised method, called DINO, which we interpret as a form of self-distillation with no labels. We show the synergy between DINO and ViTs by achieving 80.1\% top-1 on ImageNet in linear evaluation with ViT-Base.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Computer Vision and Pattern Recognition}, - file = {/Users/fracapuano/Zotero/storage/AYIY6DTF/Caron et al.
- 2021 - Emerging Properties in Self-Supervised Vision Transformers.pdf;/Users/fracapuano/Zotero/storage/EKA7ZN2P/2104.html} -} - -@inproceedings{chebotar2019closing, - title = {Closing the Sim-to-Real Loop: {{Adapting}} Simulation Randomization with Real World Experience}, - booktitle = {2019 International Conference on Robotics and Automation ({{ICRA}})}, - author = {Chebotar, Yevgen and Handa, Ankur and Makoviychuk, Viktor and Macklin, Miles and Issac, Jan and Ratliff, Nathan and Fox, Dieter}, - year = {2019}, - pages = {8973--8979}, - publisher = {IEEE} -} - -@inproceedings{chebotarClosingSimtorealLoop2019, - title = {Closing the Sim-to-Real Loop: {{Adapting}} Simulation Randomization with Real World Experience}, - shorttitle = {Closing the Sim-to-Real Loop}, - booktitle = {2019 {{International Conference}} on {{Robotics}} and {{Automation}} ({{ICRA}})}, - author = {Chebotar, Yevgen and Handa, Ankur and Makoviychuk, Viktor and Macklin, Miles and Issac, Jan and Ratliff, Nathan and Fox, Dieter}, - year = {2019}, - pages = {8973--8979}, - publisher = {IEEE}, - urldate = {2025-08-31} -} - -@misc{chenPaLIXScalingMultilingual2023, - title = {{{PaLI-X}}: {{On Scaling}} up a {{Multilingual Vision}} and {{Language Model}}}, - shorttitle = {{{PaLI-X}}}, - author = {Chen, Xi and Djolonga, Josip and Padlewski, Piotr and Mustafa, Basil and Changpinyo, Soravit and Wu, Jialin and Ruiz, Carlos Riquelme and Goodman, Sebastian and Wang, Xiao and Tay, Yi and Shakeri, Siamak and Dehghani, Mostafa and Salz, Daniel and Lucic, Mario and Tschannen, Michael and Nagrani, Arsha and Hu, Hexiang and Joshi, Mandar and Pang, Bo and Montgomery, Ceslee and Pietrzyk, Paulina and Ritter, Marvin and Piergiovanni, A. J. and Minderer, Matthias and Pavetic, Filip and Waters, Austin and Li, Gang and Alabdulmohsin, Ibrahim and Beyer, Lucas and Amelot, Julien and Lee, Kenton and Steiner, Andreas Peter and Li, Yang and Keysers, Daniel and Arnab, Anurag and Xu, Yuanzhong and Rong, Keran and Kolesnikov, Alexander and Seyedhosseini, Mojtaba and Angelova, Anelia and Zhai, Xiaohua and Houlsby, Neil and Soricut, Radu}, - year = {2023}, - month = may, - number = {arXiv:2305.18565}, - eprint = {2305.18565}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2305.18565}, - urldate = {2025-09-07}, - abstract = {We present the training recipe and results of scaling up PaLI-X, a multilingual vision and language model, both in terms of size of the components and the breadth of its training task mixture. Our model achieves new levels of performance on a wide-range of varied and complex tasks, including multiple image-based captioning and question-answering tasks, image-based document understanding and few-shot (in-context) learning, as well as object detection, video question answering, and video captioning. PaLI-X advances the state-of-the-art on most vision-and-language benchmarks considered (25+ of them). Finally, we observe emerging capabilities, such as complex counting and multilingual object detection, tasks that are not explicitly in the training mix.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Computation and Language,Computer Science - Computer Vision and Pattern Recognition,Computer Science - Machine Learning}, - file = {/Users/fracapuano/Zotero/storage/UES2DMFM/Chen et al. 
- 2023 - PaLI-X On Scaling up a Multilingual Vision and Language Model.pdf;/Users/fracapuano/Zotero/storage/LEGNNSHS/2305.html} -} - -@misc{chiDiffusionPolicyVisuomotor2024, - title = {Diffusion {{Policy}}: {{Visuomotor Policy Learning}} via {{Action Diffusion}}}, - shorttitle = {Diffusion {{Policy}}}, - author = {Chi, Cheng and Xu, Zhenjia and Feng, Siyuan and Cousineau, Eric and Du, Yilun and Burchfiel, Benjamin and Tedrake, Russ and Song, Shuran}, - year = {2024}, - month = mar, - number = {arXiv:2303.04137}, - eprint = {2303.04137}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2303.04137}, - urldate = {2025-08-28}, - abstract = {This paper introduces Diffusion Policy, a new way of generating robot behavior by representing a robot's visuomotor policy as a conditional denoising diffusion process. We benchmark Diffusion Policy across 12 different tasks from 4 different robot manipulation benchmarks and find that it consistently outperforms existing state-of-the-art robot learning methods with an average improvement of 46.9\%. Diffusion Policy learns the gradient of the action-distribution score function and iteratively optimizes with respect to this gradient field during inference via a series of stochastic Langevin dynamics steps. We find that the diffusion formulation yields powerful advantages when used for robot policies, including gracefully handling multimodal action distributions, being suitable for high-dimensional action spaces, and exhibiting impressive training stability. To fully unlock the potential of diffusion models for visuomotor policy learning on physical robots, this paper presents a set of key technical contributions including the incorporation of receding horizon control, visual conditioning, and the time-series diffusion transformer. We hope this work will help motivate a new generation of policy learning techniques that are able to leverage the powerful generative modeling capabilities of diffusion models. Code, data, and training details is publicly available diffusion-policy.cs.columbia.edu}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Robotics}, - file = {/Users/fracapuano/Zotero/storage/7XRY3GJX/Chi et al. 
- 2024 - Diffusion Policy Visuomotor Policy Learning via Action Diffusion.pdf;/Users/fracapuano/Zotero/storage/BBBPKKMZ/2303.html} -} - -@misc{collaborationOpenXEmbodimentRobotic2025, - title = {Open {{X-Embodiment}}: {{Robotic Learning Datasets}} and {{RT-X Models}}}, - shorttitle = {Open {{X-Embodiment}}}, - author = {Collaboration, Open X.-Embodiment and O'Neill, Abby and Rehman, Abdul and Gupta, Abhinav and Maddukuri, Abhiram and Gupta, Abhishek and Padalkar, Abhishek and Lee, Abraham and Pooley, Acorn and Gupta, Agrim and Mandlekar, Ajay and Jain, Ajinkya and Tung, Albert and Bewley, Alex and Herzog, Alex and Irpan, Alex and Khazatsky, Alexander and Rai, Anant and Gupta, Anchit and Wang, Andrew and Kolobov, Andrey and Singh, Anikait and Garg, Animesh and Kembhavi, Aniruddha and Xie, Annie and Brohan, Anthony and Raffin, Antonin and Sharma, Archit and Yavary, Arefeh and Jain, Arhan and Balakrishna, Ashwin and Wahid, Ayzaan and {Burgess-Limerick}, Ben and Kim, Beomjoon and Sch{\"o}lkopf, Bernhard and Wulfe, Blake and Ichter, Brian and Lu, Cewu and Xu, Charles and Le, Charlotte and Finn, Chelsea and Wang, Chen and Xu, Chenfeng and Chi, Cheng and Huang, Chenguang and Chan, Christine and Agia, Christopher and Pan, Chuer and Fu, Chuyuan and Devin, Coline and Xu, Danfei and Morton, Daniel and Driess, Danny and Chen, Daphne and Pathak, Deepak and Shah, Dhruv and B{\"u}chler, Dieter and Jayaraman, Dinesh and Kalashnikov, Dmitry and Sadigh, Dorsa and Johns, Edward and Foster, Ethan and Liu, Fangchen and Ceola, Federico and Xia, Fei and Zhao, Feiyu and Frujeri, Felipe Vieira and Stulp, Freek and Zhou, Gaoyue and Sukhatme, Gaurav S. and Salhotra, Gautam and Yan, Ge and Feng, Gilbert and Schiavi, Giulio and Berseth, Glen and Kahn, Gregory and Yang, Guangwen and Wang, Guanzhi and Su, Hao and Fang, Hao-Shu and Shi, Haochen and Bao, Henghui and Amor, Heni Ben and Christensen, Henrik I. and Furuta, Hiroki and Bharadhwaj, Homanga and Walke, Homer and Fang, Hongjie and Ha, Huy and Mordatch, Igor and Radosavovic, Ilija and Leal, Isabel and Liang, Jacky and {Abou-Chakra}, Jad and Kim, Jaehyung and Drake, Jaimyn and Peters, Jan and Schneider, Jan and Hsu, Jasmine and Vakil, Jay and Bohg, Jeannette and Bingham, Jeffrey and Wu, Jeffrey and Gao, Jensen and Hu, Jiaheng and Wu, Jiajun and Wu, Jialin and Sun, Jiankai and Luo, Jianlan and Gu, Jiayuan and Tan, Jie and Oh, Jihoon and Wu, Jimmy and Lu, Jingpei and Yang, Jingyun and Malik, Jitendra and Silv{\'e}rio, Jo{\~a}o and Hejna, Joey and Booher, Jonathan and Tompson, Jonathan and Yang, Jonathan and Salvador, Jordi and Lim, Joseph J. and Han, Junhyek and Wang, Kaiyuan and Rao, Kanishka and Pertsch, Karl and Hausman, Karol and Go, Keegan and Gopalakrishnan, Keerthana and Goldberg, Ken and Byrne, Kendra and Oslund, Kenneth and Kawaharazuka, Kento and Black, Kevin and Lin, Kevin and Zhang, Kevin and Ehsani, Kiana and Lekkala, Kiran and Ellis, Kirsty and Rana, Krishan and Srinivasan, Krishnan and Fang, Kuan and Singh, Kunal Pratap and Zeng, Kuo-Hao and Hatch, Kyle and Hsu, Kyle and Itti, Laurent and Chen, Lawrence Yunliang and Pinto, Lerrel and {Fei-Fei}, Li and Tan, Liam and Fan, Linxi "Jim" and Ott, Lionel and Lee, Lisa and Weihs, Luca and Chen, Magnum and Lepert, Marion and Memmel, Marius and Tomizuka, Masayoshi and Itkina, Masha and Castro, Mateo Guaman and Spero, Max and Du, Maximilian and Ahn, Michael and Yip, Michael C. 
and Zhang, Mingtong and Ding, Mingyu and Heo, Minho and Srirama, Mohan Kumar and Sharma, Mohit and Kim, Moo Jin and Irshad, Muhammad Zubair and Kanazawa, Naoaki and Hansen, Nicklas and Heess, Nicolas and Joshi, Nikhil J. and Suenderhauf, Niko and Liu, Ning and Palo, Norman Di and Shafiullah, Nur Muhammad Mahi and Mees, Oier and Kroemer, Oliver and Bastani, Osbert and Sanketi, Pannag R. and Miller, Patrick "Tree" and Yin, Patrick and Wohlhart, Paul and Xu, Peng and Fagan, Peter David and Mitrano, Peter and Sermanet, Pierre and Abbeel, Pieter and Sundaresan, Priya and Chen, Qiuyu and Vuong, Quan and Rafailov, Rafael and Tian, Ran and Doshi, Ria and {Mart{\'i}n-Mart{\'i}n}, Roberto and Baijal, Rohan and Scalise, Rosario and Hendrix, Rose and Lin, Roy and Qian, Runjia and Zhang, Ruohan and Mendonca, Russell and Shah, Rutav and Hoque, Ryan and Julian, Ryan and Bustamante, Samuel and Kirmani, Sean and Levine, Sergey and Lin, Shan and Moore, Sherry and Bahl, Shikhar and Dass, Shivin and Sonawani, Shubham and Tulsiani, Shubham and Song, Shuran and Xu, Sichun and Haldar, Siddhant and Karamcheti, Siddharth and Adebola, Simeon and Guist, Simon and Nasiriany, Soroush and Schaal, Stefan and Welker, Stefan and Tian, Stephen and Ramamoorthy, Subramanian and Dasari, Sudeep and Belkhale, Suneel and Park, Sungjae and Nair, Suraj and Mirchandani, Suvir and Osa, Takayuki and Gupta, Tanmay and Harada, Tatsuya and Matsushima, Tatsuya and Xiao, Ted and Kollar, Thomas and Yu, Tianhe and Ding, Tianli and Davchev, Todor and Zhao, Tony Z. and Armstrong, Travis and Darrell, Trevor and Chung, Trinity and Jain, Vidhi and Kumar, Vikash and Vanhoucke, Vincent and Guizilini, Vitor and Zhan, Wei and Zhou, Wenxuan and Burgard, Wolfram and Chen, Xi and Chen, Xiangyu and Wang, Xiaolong and Zhu, Xinghao and Geng, Xinyang and Liu, Xiyuan and Liangwei, Xu and Li, Xuanlin and Pang, Yansong and Lu, Yao and Ma, Yecheng Jason and Kim, Yejin and Chebotar, Yevgen and Zhou, Yifan and Zhu, Yifeng and Wu, Yilin and Xu, Ying and Wang, Yixuan and Bisk, Yonatan and Dou, Yongqiang and Cho, Yoonyoung and Lee, Youngwoon and Cui, Yuchen and Cao, Yue and Wu, Yueh-Hua and Tang, Yujin and Zhu, Yuke and Zhang, Yunchu and Jiang, Yunfan and Li, Yunshuang and Li, Yunzhu and Iwasawa, Yusuke and Matsuo, Yutaka and Ma, Zehan and Xu, Zhuo and Cui, Zichen Jeff and Zhang, Zichen and Fu, Zipeng and Lin, Zipeng}, - year = {2025}, - month = may, - number = {arXiv:2310.08864}, - eprint = {2310.08864}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2310.08864}, - urldate = {2025-09-08}, - abstract = {Large, high-capacity models trained on diverse datasets have shown remarkable successes on efficiently tackling downstream applications. In domains from NLP to Computer Vision, this has led to a consolidation of pretrained models, with general pretrained backbones serving as a starting point for many applications. Can such a consolidation happen in robotics? Conventionally, robotic learning methods train a separate model for every application, every robot, and even every environment. Can we instead train generalist X-robot policy that can be adapted efficiently to new robots, tasks, and environments? In this paper, we provide datasets in standardized data formats and models to make it possible to explore this possibility in the context of robotic manipulation, alongside experimental results that provide an example of effective X-robot policies. 
We assemble a dataset from 22 different robots collected through a collaboration between 21 institutions, demonstrating 527 skills (160266 tasks). We show that a high-capacity model trained on this data, which we call RT-X, exhibits positive transfer and improves the capabilities of multiple robots by leveraging experience from other platforms. More details can be found on the project website https://robotics-transformer-x.github.io.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Robotics}, - file = {/Users/fracapuano/Zotero/storage/2U73MMVN/Collaboration et al. - 2025 - Open X-Embodiment Robotic Learning Datasets and RT-X Models.pdf;/Users/fracapuano/Zotero/storage/PX7IHY32/2310.html} -} - -@book{connellRobotLearning1993, - title = {Robot {{Learning}}}, - editor = {Connell, Jonathan H. and Mahadevan, Sridhar}, - year = {1993}, - publisher = {Springer US}, - address = {Boston, MA}, - doi = {10.1007/978-1-4615-3184-5}, - urldate = {2025-08-28}, - copyright = {http://www.springer.com/tdm}, - isbn = {978-1-4613-6396-5 978-1-4615-3184-5}, - keywords = {algorithms,artificial intelligence,artificial life,autonom,autonomous robot,genetic algorithms,intelligence,learning,Navigation,programming,proving,robot,uncertainty} -} - -@article{degraveMagneticControlTokamak2022, - title = {Magnetic Control of Tokamak Plasmas through Deep Reinforcement Learning}, - author = {Degrave, Jonas and Felici, Federico and Buchli, Jonas and Neunert, Michael and Tracey, Brendan and Carpanese, Francesco and Ewalds, Timo and Hafner, Roland and Abdolmaleki, Abbas and {de las Casas}, Diego and Donner, Craig and Fritz, Leslie and Galperti, Cristian and Huber, Andrea and Keeling, James and Tsimpoukelli, Maria and Kay, Jackie and Merle, Antoine and Moret, Jean-Marc and Noury, Seb and Pesamosca, Federico and Pfau, David and Sauter, Olivier and Sommariva, Cristian and Coda, Stefano and Duval, Basil and Fasoli, Ambrogio and Kohli, Pushmeet and Kavukcuoglu, Koray and Hassabis, Demis and Riedmiller, Martin}, - year = {2022}, - month = feb, - journal = {Nature}, - volume = {602}, - number = {7897}, - pages = {414--419}, - publisher = {Nature Publishing Group}, - issn = {1476-4687}, - doi = {10.1038/s41586-021-04301-9}, - urldate = {2025-08-31}, - abstract = {Nuclear fusion using magnetic confinement, in particular in the tokamak configuration, is a promising path towards sustainable energy. A core challenge is to shape and maintain a high-temperature plasma within the tokamak vessel. This requires high-dimensional, high-frequency, closed-loop control using magnetic actuator coils, further complicated by the diverse requirements across a wide range of plasma configurations. In this work, we introduce a previously undescribed architecture for tokamak magnetic controller design that autonomously learns to command the full set of control coils. This architecture meets control objectives specified at a high level, at the same time satisfying physical and operational constraints. This approach has unprecedented flexibility and generality in problem specification and yields a notable reduction in design effort to produce new plasma configurations. We successfully produce and control a diverse set of plasma configurations on the Tokamak {\`a} Configuration Variable1,2, including elongated, conventional shapes, as well as advanced configurations, such as negative triangularity and `snowflake' configurations. Our approach achieves accurate tracking of the location, current and shape for these configurations. 
We also demonstrate sustained `droplets' on TCV, in which two separate plasmas are maintained simultaneously within the vessel. This represents a notable advance for tokamak feedback control, showing the potential of reinforcement learning to accelerate research in the fusion domain, and is one of the most challenging real-world systems to which reinforcement learning has been applied.}, - copyright = {2022 The Author(s)}, - langid = {english}, - keywords = {Computer science,Magnetically confined plasmas,Nuclear fusion and fission}, - file = {/Users/fracapuano/Zotero/storage/EZ4EAU84/Degrave et al. - 2022 - Magnetic control of tokamak plasmas through deep reinforcement learning.pdf} -} - -@misc{devlinBERTPretrainingDeep2019, - title = {{{BERT}}: {{Pre-training}} of {{Deep Bidirectional Transformers}} for {{Language Understanding}}}, - shorttitle = {{{BERT}}}, - author = {Devlin, Jacob and Chang, Ming-Wei and Lee, Kenton and Toutanova, Kristina}, - year = {2019}, - month = may, - number = {arXiv:1810.04805}, - eprint = {1810.04805}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.1810.04805}, - urldate = {2025-09-08}, - abstract = {We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications. BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE score to 80.5\% (7.7\% point absolute improvement), MultiNLI accuracy to 86.7\% (4.6\% absolute improvement), SQuAD v1.1 question answering Test F1 to 93.2 (1.5 point absolute improvement) and SQuAD v2.0 Test F1 to 83.1 (5.1 point absolute improvement).}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Computation and Language}, - file = {/Users/fracapuano/Zotero/storage/AJ3SRLHF/Devlin et al. - 2019 - BERT Pre-training of Deep Bidirectional Transformers for Language Understanding.pdf;/Users/fracapuano/Zotero/storage/LNIKJNIW/1810.html} -} - -@misc{driessKnowledgeInsulatingVisionLanguageAction2025, - title = {Knowledge {{Insulating Vision-Language-Action Models}}: {{Train Fast}}, {{Run Fast}}, {{Generalize Better}}}, - shorttitle = {Knowledge {{Insulating Vision-Language-Action Models}}}, - author = {Driess, Danny and Springenberg, Jost Tobias and Ichter, Brian and Yu, Lili and {Li-Bell}, Adrian and Pertsch, Karl and Ren, Allen Z. and Walke, Homer and Vuong, Quan and Shi, Lucy Xiaoyang and Levine, Sergey}, - year = {2025}, - month = may, - number = {arXiv:2505.23705}, - eprint = {2505.23705}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2505.23705}, - urldate = {2025-09-09}, - abstract = {Vision-language-action (VLA) models provide a powerful approach to training control policies for physical systems, such as robots, by combining end-to-end learning with transfer of semantic knowledge from web-scale vision-language model (VLM) training. 
However, the constraints of real-time control are often at odds with the design of VLMs: the most powerful VLMs have tens or hundreds of billions of parameters, presenting an obstacle to real-time inference, and operate on discrete tokens rather than the continuous-valued outputs that are required for controlling robots. To address this challenge, recent VLA models have used specialized modules for efficient continuous control, such as action experts or continuous output heads, which typically require adding new untrained parameters to the pretrained VLM backbone. While these modules improve real-time and control capabilities, it remains an open question whether they preserve or degrade the semantic knowledge contained in the pretrained VLM, and what effect they have on the VLA training dynamics. In this paper, we study this question in the context of VLAs that include a continuous diffusion or flow matching action expert, showing that naively including such experts significantly harms both training speed and knowledge transfer. We provide an extensive analysis of various design choices, their impact on performance and knowledge transfer, and propose a technique for insulating the VLM backbone during VLA training that mitigates this issue. Videos are available at https://pi.website/research/knowledge\_insulation.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Machine Learning,Computer Science - Robotics}, - file = {/Users/fracapuano/Zotero/storage/QHTS9JIC/Driess et al. - 2025 - Knowledge Insulating Vision-Language-Action Models Train Fast, Run Fast, Generalize Better.pdf;/Users/fracapuano/Zotero/storage/3U9FCXRB/2505.html} -} - -@misc{driessPaLMEEmbodiedMultimodal2023, - title = {{{PaLM-E}}: {{An Embodied Multimodal Language Model}}}, - shorttitle = {{{PaLM-E}}}, - author = {Driess, Danny and Xia, Fei and Sajjadi, Mehdi S. M. and Lynch, Corey and Chowdhery, Aakanksha and Ichter, Brian and Wahid, Ayzaan and Tompson, Jonathan and Vuong, Quan and Yu, Tianhe and Huang, Wenlong and Chebotar, Yevgen and Sermanet, Pierre and Duckworth, Daniel and Levine, Sergey and Vanhoucke, Vincent and Hausman, Karol and Toussaint, Marc and Greff, Klaus and Zeng, Andy and Mordatch, Igor and Florence, Pete}, - year = {2023}, - month = mar, - number = {arXiv:2303.03378}, - eprint = {2303.03378}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2303.03378}, - urldate = {2025-09-07}, - abstract = {Large language models excel at a wide range of complex tasks. However, enabling general inference in the real world, e.g., for robotics problems, raises the challenge of grounding. We propose embodied language models to directly incorporate real-world continuous sensor modalities into language models and thereby establish the link between words and percepts. Input to our embodied language model are multi-modal sentences that interleave visual, continuous state estimation, and textual input encodings. We train these encodings end-to-end, in conjunction with a pre-trained large language model, for multiple embodied tasks including sequential robotic manipulation planning, visual question answering, and captioning. Our evaluations show that PaLM-E, a single large embodied multimodal model, can address a variety of embodied reasoning tasks, from a variety of observation modalities, on multiple embodiments, and further, exhibits positive transfer: the model benefits from diverse joint training across internet-scale language, vision, and visual-language domains. 
Our largest model, PaLM-E-562B with 562B parameters, in addition to being trained on robotics tasks, is a visual-language generalist with state-of-the-art performance on OK-VQA, and retains generalist language capabilities with increasing scale.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Artificial Intelligence,Computer Science - Machine Learning,Computer Science - Robotics}, - file = {/Users/fracapuano/Zotero/storage/PQSPI784/Driess et al. - 2023 - PaLM-E An Embodied Multimodal Language Model.pdf;/Users/fracapuano/Zotero/storage/K3PJVSGB/2303.html} -} - -@misc{esserScalingRectifiedFlow2024, - title = {Scaling {{Rectified Flow Transformers}} for {{High-Resolution Image Synthesis}}}, - author = {Esser, Patrick and Kulal, Sumith and Blattmann, Andreas and Entezari, Rahim and M{\"u}ller, Jonas and Saini, Harry and Levi, Yam and Lorenz, Dominik and Sauer, Axel and Boesel, Frederic and Podell, Dustin and Dockhorn, Tim and English, Zion and Lacey, Kyle and Goodwin, Alex and Marek, Yannik and Rombach, Robin}, - year = {2024}, - month = mar, - number = {arXiv:2403.03206}, - eprint = {2403.03206}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2403.03206}, - urldate = {2025-09-07}, - abstract = {Diffusion models create data from noise by inverting the forward paths of data towards noise and have emerged as a powerful generative modeling technique for high-dimensional, perceptual data such as images and videos. Rectified flow is a recent generative model formulation that connects data and noise in a straight line. Despite its better theoretical properties and conceptual simplicity, it is not yet decisively established as standard practice. In this work, we improve existing noise sampling techniques for training rectified flow models by biasing them towards perceptually relevant scales. Through a large-scale study, we demonstrate the superior performance of this approach compared to established diffusion formulations for high-resolution text-to-image synthesis. Additionally, we present a novel transformer-based architecture for text-to-image generation that uses separate weights for the two modalities and enables a bidirectional flow of information between image and text tokens, improving text comprehension, typography, and human preference ratings. We demonstrate that this architecture follows predictable scaling trends and correlates lower validation loss to improved text-to-image synthesis as measured by various metrics and human evaluations. Our largest models outperform state-of-the-art models, and we will make our experimental data, code, and model weights publicly available.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Computer Vision and Pattern Recognition}, - file = {/Users/fracapuano/Zotero/storage/23TGK9JM/Esser et al. - 2024 - Scaling Rectified Flow Transformers for High-Resolution Image Synthesis.pdf;/Users/fracapuano/Zotero/storage/W2CRYPZY/2403.html} -} - -@misc{fedusReviewSparseExpert2022, - title = {A {{Review}} of {{Sparse Expert Models}} in {{Deep Learning}}}, - author = {Fedus, William and Dean, Jeff and Zoph, Barret}, - year = {2022}, - month = sep, - number = {arXiv:2209.01667}, - eprint = {2209.01667}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2209.01667}, - urldate = {2025-09-08}, - abstract = {Sparse expert models are a thirty-year old concept re-emerging as a popular architecture in deep learning. 
This class of architecture encompasses Mixture-of-Experts, Switch Transformers, Routing Networks, BASE layers, and others, all with the unifying idea that each example is acted on by a subset of the parameters. By doing so, the degree of sparsity decouples the parameter count from the compute per example allowing for extremely large, but efficient models. The resulting models have demonstrated significant improvements across diverse domains such as natural language processing, computer vision, and speech recognition. We review the concept of sparse expert models, provide a basic description of the common algorithms, contextualize the advances in the deep learning era, and conclude by highlighting areas for future work.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Computation and Language,Computer Science - Machine Learning}, - file = {/Users/fracapuano/Zotero/storage/MZXG2WMJ/Fedus et al. - 2022 - A Review of Sparse Expert Models in Deep Learning.pdf;/Users/fracapuano/Zotero/storage/GLZINJYC/2209.html} -} - -@misc{finiMultimodalAutoregressivePretraining2024, - title = {Multimodal {{Autoregressive Pre-training}} of {{Large Vision Encoders}}}, - author = {Fini, Enrico and Shukor, Mustafa and Li, Xiujun and Dufter, Philipp and Klein, Michal and Haldimann, David and Aitharaju, Sai and da Costa, Victor Guilherme Turrisi and B{\'e}thune, Louis and Gan, Zhe and Toshev, Alexander T. and Eichner, Marcin and Nabi, Moin and Yang, Yinfei and Susskind, Joshua M. and {El-Nouby}, Alaaeldin}, - year = {2024}, - month = nov, - number = {arXiv:2411.14402}, - eprint = {2411.14402}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2411.14402}, - urldate = {2025-09-09}, - abstract = {We introduce a novel method for pre-training of large-scale vision encoders. Building on recent advancements in autoregressive pre-training of vision models, we extend this framework to a multimodal setting, i.e., images and text. In this paper, we present AIMV2, a family of generalist vision encoders characterized by a straightforward pre-training process, scalability, and remarkable performance across a range of downstream tasks. This is achieved by pairing the vision encoder with a multimodal decoder that autoregressively generates raw image patches and text tokens. Our encoders excel not only in multimodal evaluations but also in vision benchmarks such as localization, grounding, and classification. Notably, our AIMV2-3B encoder achieves 89.5\% accuracy on ImageNet-1k with a frozen trunk. Furthermore, AIMV2 consistently outperforms state-of-the-art contrastive models (e.g., CLIP, SigLIP) in multimodal image understanding across diverse settings.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Computer Vision and Pattern Recognition,Computer Science - Machine Learning}, - file = {/Users/fracapuano/Zotero/storage/ULTX55I6/Fini et al. - 2024 - Multimodal Autoregressive Pre-training of Large Vision Encoders.pdf;/Users/fracapuano/Zotero/storage/SUG2W6A9/2411.html} -} - -@inproceedings{florenceImplicitBehavioralCloning2022, - title = {Implicit {{Behavioral Cloning}}}, - booktitle = {Proceedings of the 5th {{Conference}} on {{Robot Learning}}}, - author = {Florence, Pete and Lynch, Corey and Zeng, Andy and Ramirez, Oscar A. 
and Wahid, Ayzaan and Downs, Laura and Wong, Adrian and Lee, Johnny and Mordatch, Igor and Tompson, Jonathan}, - year = {2022}, - month = jan, - pages = {158--168}, - publisher = {PMLR}, - issn = {2640-3498}, - urldate = {2025-09-01}, - abstract = {We find that across a wide range of robot policy learning scenarios, treating supervised policy learning with an implicit model generally performs better, on average, than commonly used explicit models. We present extensive experiments on this finding, and we provide both intuitive insight and theoretical arguments distinguishing the properties of implicit models compared to their explicit counterparts, particularly with respect to approximating complex, potentially discontinuous and multi-valued (set-valued) functions. On robotic policy learning tasks we show that implicit behavior-cloning policies with energy-based models (EBM) often outperform common explicit (Mean Square Error, or Mixture Density) behavior-cloning policies, including on tasks with high-dimensional action spaces and visual image inputs. We find these policies provide competitive results or outperform state-of-the-art offline reinforcement learning methods on the challenging human-expert tasks from the D4RL benchmark suite, despite using no reward information. In the real world, robots with implicit policies can learn complex and remarkably subtle behaviors on contact-rich tasks from human demonstrations, including tasks with high combinatorial complexity and tasks requiring 1mm precision.}, - langid = {english}, - file = {/Users/fracapuano/Zotero/storage/Q8I5E862/Florence et al. - 2022 - Implicit Behavioral Cloning.pdf} -} - -@misc{FROMAGe, - title = {Grounding Language Models to Images for Multimodal Inputs and Outputs}, - author = {Koh, Jing Yu and Salakhutdinov, Ruslan and Fried, Daniel}, - year = {2023} -} - -@article{fujitaDevelopmentRobotsNuclear2020, - title = {Development of {{Robots}} for {{Nuclear Power Plants}} and {{Their Application}} to {{New Fields}}}, - author = {Fujita, Jun and Soda, Daisuke and Murata, Chotaro and Tsuhari, Hiroyuki}, - year = {2020}, - volume = {57}, - number = {4}, - langid = {english}, - file = {/Users/fracapuano/Zotero/storage/K349QTEG/Fujita et al. 
- 2020 - Development of Robots for Nuclear Power Plants and Their Application to New Fields.pdf} -} - -@misc{grattafioriLlama3Herd2024, - title = {The {{Llama}} 3 {{Herd}} of {{Models}}}, - author = {Grattafiori, Aaron and Dubey, Abhimanyu and Jauhri, Abhinav and Pandey, Abhinav and Kadian, Abhishek and {Al-Dahle}, Ahmad and Letman, Aiesha and Mathur, Akhil and Schelten, Alan and Vaughan, Alex and Yang, Amy and Fan, Angela and Goyal, Anirudh and Hartshorn, Anthony and Yang, Aobo and Mitra, Archi and Sravankumar, Archie and Korenev, Artem and Hinsvark, Arthur and Rao, Arun and Zhang, Aston and Rodriguez, Aurelien and Gregerson, Austen and Spataru, Ava and Roziere, Baptiste and Biron, Bethany and Tang, Binh and Chern, Bobbie and Caucheteux, Charlotte and Nayak, Chaya and Bi, Chloe and Marra, Chris and McConnell, Chris and Keller, Christian and Touret, Christophe and Wu, Chunyang and Wong, Corinne and Ferrer, Cristian Canton and Nikolaidis, Cyrus and Allonsius, Damien and Song, Daniel and Pintz, Danielle and Livshits, Danny and Wyatt, Danny and Esiobu, David and Choudhary, Dhruv and Mahajan, Dhruv and {Garcia-Olano}, Diego and Perino, Diego and Hupkes, Dieuwke and Lakomkin, Egor and AlBadawy, Ehab and Lobanova, Elina and Dinan, Emily and Smith, Eric Michael and Radenovic, Filip and Guzm{\'a}n, Francisco and Zhang, Frank and Synnaeve, Gabriel and Lee, Gabrielle and Anderson, Georgia Lewis and Thattai, Govind and Nail, Graeme and Mialon, Gregoire and Pang, Guan and Cucurell, Guillem and Nguyen, Hailey and Korevaar, Hannah and Xu, Hu and Touvron, Hugo and Zarov, Iliyan and Ibarra, Imanol Arrieta and Kloumann, Isabel and Misra, Ishan and Evtimov, Ivan and Zhang, Jack and Copet, Jade and Lee, Jaewon and Geffert, Jan and Vranes, Jana and Park, Jason and Mahadeokar, Jay and Shah, Jeet and van der Linde, Jelmer and Billock, Jennifer and Hong, Jenny and Lee, Jenya and Fu, Jeremy and Chi, Jianfeng and Huang, Jianyu and Liu, Jiawen and Wang, Jie and Yu, Jiecao and Bitton, Joanna and Spisak, Joe and Park, Jongsoo and Rocca, Joseph and Johnstun, Joshua and Saxe, Joshua and Jia, Junteng and Alwala, Kalyan Vasuden and Prasad, Karthik and Upasani, Kartikeya and Plawiak, Kate and Li, Ke and Heafield, Kenneth and Stone, Kevin and {El-Arini}, Khalid and Iyer, Krithika and Malik, Kshitiz and Chiu, Kuenley and Bhalla, Kunal and Lakhotia, Kushal and {Rantala-Yeary}, Lauren and van der Maaten, Laurens and Chen, Lawrence and Tan, Liang and Jenkins, Liz and Martin, Louis and Madaan, Lovish and Malo, Lubo and Blecher, Lukas and Landzaat, Lukas and de Oliveira, Luke and Muzzi, Madeline and Pasupuleti, Mahesh and Singh, Mannat and Paluri, Manohar and Kardas, Marcin and Tsimpoukelli, Maria and Oldham, Mathew and Rita, Mathieu and Pavlova, Maya and Kambadur, Melanie and Lewis, Mike and Si, Min and Singh, Mitesh Kumar and Hassan, Mona and Goyal, Naman and Torabi, Narjes and Bashlykov, Nikolay and Bogoychev, Nikolay and Chatterji, Niladri and Zhang, Ning and Duchenne, Olivier and {\c C}elebi, Onur and Alrassy, Patrick and Zhang, Pengchuan and Li, Pengwei and Vasic, Petar and Weng, Peter and Bhargava, Prajjwal and Dubal, Pratik and Krishnan, Praveen and Koura, Punit Singh and Xu, Puxin and He, Qing and Dong, Qingxiao and Srinivasan, Ragavan and Ganapathy, Raj and Calderer, Ramon and Cabral, Ricardo Silveira and Stojnic, Robert and Raileanu, Roberta and Maheswari, Rohan and Girdhar, Rohit and Patel, Rohit and Sauvestre, Romain and Polidoro, Ronnie and Sumbaly, Roshan and Taylor, Ross and Silva, Ruan and Hou, Rui and Wang, Rui 
and Hosseini, Saghar and Chennabasappa, Sahana and Singh, Sanjay and Bell, Sean and Kim, Seohyun Sonia and Edunov, Sergey and Nie, Shaoliang and Narang, Sharan and Raparthy, Sharath and Shen, Sheng and Wan, Shengye and Bhosale, Shruti and Zhang, Shun and Vandenhende, Simon and Batra, Soumya and Whitman, Spencer and Sootla, Sten and Collot, Stephane and Gururangan, Suchin and Borodinsky, Sydney and Herman, Tamar and Fowler, Tara and Sheasha, Tarek and Georgiou, Thomas and Scialom, Thomas and Speckbacher, Tobias and Mihaylov, Todor and Xiao, Tong and Karn, Ujjwal and Goswami, Vedanuj and Gupta, Vibhor and Ramanathan, Vignesh and Kerkez, Viktor and Gonguet, Vincent and Do, Virginie and Vogeti, Vish and Albiero, V{\'i}tor and Petrovic, Vladan and Chu, Weiwei and Xiong, Wenhan and Fu, Wenyin and Meers, Whitney and Martinet, Xavier and Wang, Xiaodong and Wang, Xiaofang and Tan, Xiaoqing Ellen and Xia, Xide and Xie, Xinfeng and Jia, Xuchao and Wang, Xuewei and Goldschlag, Yaelle and Gaur, Yashesh and Babaei, Yasmine and Wen, Yi and Song, Yiwen and Zhang, Yuchen and Li, Yue and Mao, Yuning and Coudert, Zacharie Delpierre and Yan, Zheng and Chen, Zhengxing and Papakipos, Zoe and Singh, Aaditya and Srivastava, Aayushi and Jain, Abha and Kelsey, Adam and Shajnfeld, Adam and Gangidi, Adithya and Victoria, Adolfo and Goldstand, Ahuva and Menon, Ajay and Sharma, Ajay and Boesenberg, Alex and Baevski, Alexei and Feinstein, Allie and Kallet, Amanda and Sangani, Amit and Teo, Amos and Yunus, Anam and Lupu, Andrei and Alvarado, Andres and Caples, Andrew and Gu, Andrew and Ho, Andrew and Poulton, Andrew and Ryan, Andrew and Ramchandani, Ankit and Dong, Annie and Franco, Annie and Goyal, Anuj and Saraf, Aparajita and Chowdhury, Arkabandhu and Gabriel, Ashley and Bharambe, Ashwin and Eisenman, Assaf and Yazdan, Azadeh and James, Beau and Maurer, Ben and Leonhardi, Benjamin and Huang, Bernie and Loyd, Beth and Paola, Beto De and Paranjape, Bhargavi and Liu, Bing and Wu, Bo and Ni, Boyu and Hancock, Braden and Wasti, Bram and Spence, Brandon and Stojkovic, Brani and Gamido, Brian and Montalvo, Britt and Parker, Carl and Burton, Carly and Mejia, Catalina and Liu, Ce and Wang, Changhan and Kim, Changkyu and Zhou, Chao and Hu, Chester and Chu, Ching-Hsiang and Cai, Chris and Tindal, Chris and Feichtenhofer, Christoph and Gao, Cynthia and Civin, Damon and Beaty, Dana and Kreymer, Daniel and Li, Daniel and Adkins, David and Xu, David and Testuggine, Davide and David, Delia and Parikh, Devi and Liskovich, Diana and Foss, Didem and Wang, Dingkang and Le, Duc and Holland, Dustin and Dowling, Edward and Jamil, Eissa and Montgomery, Elaine and Presani, Eleonora and Hahn, Emily and Wood, Emily and Le, Eric-Tuan and Brinkman, Erik and Arcaute, Esteban and Dunbar, Evan and Smothers, Evan and Sun, Fei and Kreuk, Felix and Tian, Feng and Kokkinos, Filippos and Ozgenel, Firat and Caggioni, Francesco and Kanayet, Frank and Seide, Frank and Florez, Gabriela Medina and Schwarz, Gabriella and Badeer, Gada and Swee, Georgia and Halpern, Gil and Herman, Grant and Sizov, Grigory and Guangyi and Zhang and Lakshminarayanan, Guna and Inan, Hakan and Shojanazeri, Hamid and Zou, Han and Wang, Hannah and Zha, Hanwen and Habeeb, Haroun and Rudolph, Harrison and Suk, Helen and Aspegren, Henry and Goldman, Hunter and Zhan, Hongyuan and Damlaj, Ibrahim and Molybog, Igor and Tufanov, Igor and Leontiadis, Ilias and Veliche, Irina-Elena and Gat, Itai and Weissman, Jake and Geboski, James and Kohli, James and Lam, Janice and Asher, Japhet and Gaya, 
Jean-Baptiste and Marcus, Jeff and Tang, Jeff and Chan, Jennifer and Zhen, Jenny and Reizenstein, Jeremy and Teboul, Jeremy and Zhong, Jessica and Jin, Jian and Yang, Jingyi and Cummings, Joe and Carvill, Jon and Shepard, Jon and McPhie, Jonathan and Torres, Jonathan and Ginsburg, Josh and Wang, Junjie and Wu, Kai and U, Kam Hou and Saxena, Karan and Khandelwal, Kartikay and Zand, Katayoun and Matosich, Kathy and Veeraraghavan, Kaushik and Michelena, Kelly and Li, Keqian and Jagadeesh, Kiran and Huang, Kun and Chawla, Kunal and Huang, Kyle and Chen, Lailin and Garg, Lakshya and A, Lavender and Silva, Leandro and Bell, Lee and Zhang, Lei and Guo, Liangpeng and Yu, Licheng and Moshkovich, Liron and Wehrstedt, Luca and Khabsa, Madian and Avalani, Manav and Bhatt, Manish and Mankus, Martynas and Hasson, Matan and Lennie, Matthew and Reso, Matthias and Groshev, Maxim and Naumov, Maxim and Lathi, Maya and Keneally, Meghan and Liu, Miao and Seltzer, Michael L. and Valko, Michal and Restrepo, Michelle and Patel, Mihir and Vyatskov, Mik and Samvelyan, Mikayel and Clark, Mike and Macey, Mike and Wang, Mike and Hermoso, Miquel Jubert and Metanat, Mo and Rastegari, Mohammad and Bansal, Munish and Santhanam, Nandhini and Parks, Natascha and White, Natasha and Bawa, Navyata and Singhal, Nayan and Egebo, Nick and Usunier, Nicolas and Mehta, Nikhil and Laptev, Nikolay Pavlovich and Dong, Ning and Cheng, Norman and Chernoguz, Oleg and Hart, Olivia and Salpekar, Omkar and Kalinli, Ozlem and Kent, Parkin and Parekh, Parth and Saab, Paul and Balaji, Pavan and Rittner, Pedro and Bontrager, Philip and Roux, Pierre and Dollar, Piotr and Zvyagina, Polina and Ratanchandani, Prashant and Yuvraj, Pritish and Liang, Qian and Alao, Rachad and Rodriguez, Rachel and Ayub, Rafi and Murthy, Raghotham and Nayani, Raghu and Mitra, Rahul and Parthasarathy, Rangaprabhu and Li, Raymond and Hogan, Rebekkah and Battey, Robin and Wang, Rocky and Howes, Russ and Rinott, Ruty and Mehta, Sachin and Siby, Sachin and Bondu, Sai Jayesh and Datta, Samyak and Chugh, Sara and Hunt, Sara and Dhillon, Sargun and Sidorov, Sasha and Pan, Satadru and Mahajan, Saurabh and Verma, Saurabh and Yamamoto, Seiji and Ramaswamy, Sharadh and Lindsay, Shaun and Lindsay, Shaun and Feng, Sheng and Lin, Shenghao and Zha, Shengxin Cindy and Patil, Shishir and Shankar, Shiva and Zhang, Shuqiang and Zhang, Shuqiang and Wang, Sinong and Agarwal, Sneha and Sajuyigbe, Soji and Chintala, Soumith and Max, Stephanie and Chen, Stephen and Kehoe, Steve and Satterfield, Steve and Govindaprasad, Sudarshan and Gupta, Sumit and Deng, Summer and Cho, Sungmin and Virk, Sunny and Subramanian, Suraj and Choudhury, Sy and Goldman, Sydney and Remez, Tal and Glaser, Tamar and Best, Tamara and Koehler, Thilo and Robinson, Thomas and Li, Tianhe and Zhang, Tianjun and Matthews, Tim and Chou, Timothy and Shaked, Tzook and Vontimitta, Varun and Ajayi, Victoria and Montanez, Victoria and Mohan, Vijai and Kumar, Vinay Satish and Mangla, Vishal and Ionescu, Vlad and Poenaru, Vlad and Mihailescu, Vlad Tiberiu and Ivanov, Vladimir and Li, Wei and Wang, Wenchen and Jiang, Wenwen and Bouaziz, Wes and Constable, Will and Tang, Xiaocheng and Wu, Xiaojian and Wang, Xiaolan and Wu, Xilun and Gao, Xinbo and Kleinman, Yaniv and Chen, Yanjun and Hu, Ye and Jia, Ye and Qi, Ye and Li, Yenda and Zhang, Yilin and Zhang, Ying and Adi, Yossi and Nam, Youngjin and Yu and Wang and Zhao, Yu and Hao, Yuchen and Qian, Yundi and Li, Yunlu and He, Yuzi and Rait, Zach and DeVito, Zachary and Rosnbrick, Zef and 
Wen, Zhaoduo and Yang, Zhenyu and Zhao, Zhiwei and Ma, Zhiyu}, - year = {2024}, - month = nov, - number = {arXiv:2407.21783}, - eprint = {2407.21783}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2407.21783}, - urldate = {2025-09-09}, - abstract = {Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive empirical evaluation of Llama 3. We find that Llama 3 delivers comparable quality to leading language models such as GPT-4 on a plethora of tasks. We publicly release Llama 3, including pre-trained and post-trained versions of the 405B parameter language model and our Llama Guard 3 model for input and output safety. The paper also presents the results of experiments in which we integrate image, video, and speech capabilities into Llama 3 via a compositional approach. We observe this approach performs competitively with the state-of-the-art on image, video, and speech recognition tasks. The resulting models are not yet being broadly released as they are still under development.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Artificial Intelligence,Computer Science - Computation and Language,Computer Science - Computer Vision and Pattern Recognition}, - file = {/Users/fracapuano/Zotero/storage/88PJ48EN/Grattafiori et al. - 2024 - The Llama 3 Herd of Models.pdf;/Users/fracapuano/Zotero/storage/2LLAWX8L/2407.html} -} - -@inproceedings{griffinWalkingStabilizationUsing2017, - title = {Walking {{Stabilization Using Step Timing}} and {{Location Adjustment}} on the {{Humanoid Robot}}, {{Atlas}}}, - booktitle = {2017 {{IEEE}}/{{RSJ International Conference}} on {{Intelligent Robots}} and {{Systems}} ({{IROS}})}, - author = {Griffin, Robert J. and Wiedebach, Georg and Bertrand, Sylvain and Leonessa, Alexander and Pratt, Jerry}, - year = {2017}, - month = sep, - eprint = {1703.00477}, - primaryclass = {cs}, - pages = {667--673}, - doi = {10.1109/IROS.2017.8202223}, - urldate = {2025-08-26}, - abstract = {While humans are highly capable of recovering from external disturbances and uncertainties that result in large tracking errors, humanoid robots have yet to reliably mimic this level of robustness. Essential to this is the ability to combine traditional "ankle strategy" balancing with step timing and location adjustment techniques. In doing so, the robot is able to step quickly to the necessary location to continue walking. In this work, we present both a new swing speed up algorithm to adjust the step timing, allowing the robot to set the foot down more quickly to recover from errors in the direction of the current capture point dynamics, and a new algorithm to adjust the desired footstep, expanding the base of support to utilize the center of pressure (CoP)-based ankle strategy for balance. We then utilize the desired centroidal moment pivot (CMP) to calculate the momentum rate of change for our inverse-dynamics based whole-body controller. We present simulation and experimental results using this work, and discuss performance limitations and potential improvements.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Robotics}, - file = {/Users/fracapuano/Zotero/storage/SSNAZ6U4/Griffin et al. 
- 2017 - Walking Stabilization Using Step Timing and Location Adjustment on the Humanoid Robot, Atlas.pdf;/Users/fracapuano/Zotero/storage/VP885PA9/1703.html} -} - -@misc{haarnojaReinforcementLearningDeep2017, - title = {Reinforcement {{Learning}} with {{Deep Energy-Based Policies}}}, - author = {Haarnoja, Tuomas and Tang, Haoran and Abbeel, Pieter and Levine, Sergey}, - year = {2017}, - month = jul, - number = {arXiv:1702.08165}, - eprint = {1702.08165}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.1702.08165}, - urldate = {2025-08-31}, - abstract = {We propose a method for learning expressive energy-based policies for continuous states and actions, which has been feasible only in tabular domains before. We apply our method to learning maximum entropy policies, resulting into a new algorithm, called soft Q-learning, that expresses the optimal policy via a Boltzmann distribution. We use the recently proposed amortized Stein variational gradient descent to learn a stochastic sampling network that approximates samples from this distribution. The benefits of the proposed algorithm include improved exploration and compositionality that allows transferring skills between tasks, which we confirm in simulated experiments with swimming and walking robots. We also draw a connection to actor-critic methods, which can be viewed performing approximate inference on the corresponding energy-based model.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Artificial Intelligence,Computer Science - Machine Learning}, - file = {/Users/fracapuano/Zotero/storage/PXCR4TCT/Haarnoja et al. - 2017 - Reinforcement Learning with Deep Energy-Based Policies.pdf;/Users/fracapuano/Zotero/storage/VUXXX9B7/1702.html} -} - 
-@inproceedings{haarnojaReinforcementLearningDeep2017b, - title = {Reinforcement {{Learning}} with {{Deep Energy-Based Policies}}}, - booktitle = {Proceedings of the 34th {{International Conference}} on {{Machine Learning}}}, - author = {Haarnoja, Tuomas and Tang, Haoran and Abbeel, Pieter and Levine, Sergey}, - year = {2017}, - month = jul, - pages = {1352--1361}, - publisher = {PMLR}, - issn = {2640-3498}, - urldate = {2025-08-31}, - abstract = {We propose a method for learning expressive energy-based policies for continuous states and actions, which has been feasible only in tabular domains before. We apply our method to learning maximum entropy policies, resulting into a new algorithm, called soft Q-learning, that expresses the optimal policy via a Boltzmann distribution. We use the recently proposed amortized Stein variational gradient descent to learn a stochastic sampling network that approximates samples from this distribution. The benefits of the proposed algorithm include improved exploration and compositionality that allows transferring skills between tasks, which we confirm in simulated experiments with swimming and walking robots. We also draw a connection to actor-critic methods, which can be viewed performing approximate inference on the corresponding energy-based model.}, - langid = {english}, - file = {/Users/fracapuano/Zotero/storage/C59BJ4GU/Haarnoja et al. - 2017 - Reinforcement Learning with Deep Energy-Based Policies.pdf} -} - -@misc{haarnojaSoftActorCriticOffPolicy2018, - title = {Soft {{Actor-Critic}}: {{Off-Policy Maximum Entropy Deep Reinforcement Learning}} with a {{Stochastic Actor}}}, - shorttitle = {Soft {{Actor-Critic}}}, - author = {Haarnoja, Tuomas and Zhou, Aurick and Abbeel, Pieter and Levine, Sergey}, - year = {2018}, - month = aug, - number = {arXiv:1801.01290}, - eprint = {1801.01290}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.1801.01290}, - urldate = {2025-08-29}, - abstract = {Model-free deep reinforcement learning (RL) algorithms have been demonstrated on a range of challenging decision making and control tasks. However, these methods typically suffer from two major challenges: very high sample complexity and brittle convergence properties, which necessitate meticulous hyperparameter tuning. Both of these challenges severely limit the applicability of such methods to complex, real-world domains. In this paper, we propose soft actor-critic, an off-policy actor-critic deep RL algorithm based on the maximum entropy reinforcement learning framework. In this framework, the actor aims to maximize expected reward while also maximizing entropy. That is, to succeed at the task while acting as randomly as possible. Prior deep RL methods based on this framework have been formulated as Q-learning methods. By combining off-policy updates with a stable stochastic actor-critic formulation, our method achieves state-of-the-art performance on a range of continuous control benchmark tasks, outperforming prior on-policy and off-policy methods. 
Furthermore, we demonstrate that, in contrast to other off-policy algorithms, our approach is very stable, achieving very similar performance across different random seeds.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Artificial Intelligence,Computer Science - Machine Learning,Statistics - Machine Learning}, - file = {/Users/fracapuano/Zotero/storage/HG6UQIRM/Haarnoja et al. - 2018 - Soft Actor-Critic Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor.pdf;/Users/fracapuano/Zotero/storage/RKG3J7MX/1801.html} -} - -@misc{hansenTemporalDifferenceLearning2022, - title = {Temporal {{Difference Learning}} for {{Model Predictive Control}}}, - author = {Hansen, Nicklas and Wang, Xiaolong and Su, Hao}, - year = {2022}, - month = jul, - number = {arXiv:2203.04955}, - eprint = {2203.04955}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2203.04955}, - urldate = {2025-08-25}, - abstract = {Data-driven model predictive control has two key advantages over model-free methods: a potential for improved sample efficiency through model learning, and better performance as computational budget for planning increases. However, it is both costly to plan over long horizons and challenging to obtain an accurate model of the environment. In this work, we combine the strengths of model-free and model-based methods. We use a learned task-oriented latent dynamics model for local trajectory optimization over a short horizon, and use a learned terminal value function to estimate long-term return, both of which are learned jointly by temporal difference learning. Our method, TD-MPC, achieves superior sample efficiency and asymptotic performance over prior work on both state and image-based continuous control tasks from DMControl and Meta-World. Code and video results are available at https://nicklashansen.github.io/td-mpc.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Machine Learning,Computer Science - Robotics}, - file = {/Users/fracapuano/Zotero/storage/TZF8LCDG/Hansen et al. - 2022 - Temporal Difference Learning for Model Predictive Control.pdf;/Users/fracapuano/Zotero/storage/WU2WWWQE/2203.html} -} - -@misc{heessEmergenceLocomotionBehaviours2017, - title = {Emergence of {{Locomotion Behaviours}} in {{Rich Environments}}}, - author = {Heess, Nicolas and TB, Dhruva and Sriram, Srinivasan and Lemmon, Jay and Merel, Josh and Wayne, Greg and Tassa, Yuval and Erez, Tom and Wang, Ziyu and Eslami, S. M. Ali and Riedmiller, Martin and Silver, David}, - year = {2017}, - month = jul, - number = {arXiv:1707.02286}, - eprint = {1707.02286}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.1707.02286}, - urldate = {2025-09-02}, - abstract = {The reinforcement learning paradigm allows, in principle, for complex behaviours to be learned directly from simple reward signals. In practice, however, it is common to carefully hand-design the reward function to encourage a particular solution, or to derive it from demonstration data. In this paper explore how a rich environment can help to promote the learning of complex behavior. Specifically, we train agents in diverse environmental contexts, and find that this encourages the emergence of robust behaviours that perform well across a suite of tasks. We demonstrate this principle for locomotion -- behaviours that are known for their sensitivity to the choice of reward. 
We train several simulated bodies on a diverse set of challenging terrains and obstacles, using a simple reward function based on forward progress. Using a novel scalable variant of policy gradient reinforcement learning, our agents learn to run, jump, crouch and turn as required by the environment without explicit reward-based guidance. A visual depiction of highlights of the learned behavior can be viewed following https://youtu.be/hx\_bgoTF7bs .}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Artificial Intelligence}, - file = {/Users/fracapuano/Zotero/storage/9DZ8XEVY/Heess et al. - 2017 - Emergence of Locomotion Behaviours in Rich Environments.pdf;/Users/fracapuano/Zotero/storage/JUB2Q3WH/1707.html} -} - -@inproceedings{higgins2017beta, - title = {Beta-Vae: {{Learning}} Basic Visual Concepts with a Constrained Variational Framework}, - booktitle = {International Conference on Learning Representations}, - author = {Higgins, Irina and Matthey, Loic and Pal, Arka and Burgess, Christopher and Glorot, Xavier and Botvinick, Matthew and Mohamed, Shakir and Lerchner, Alexander}, - year = {2017} -} - -@misc{hoDenoisingDiffusionProbabilistic2020, - title = {Denoising {{Diffusion Probabilistic Models}}}, - author = {Ho, Jonathan and Jain, Ajay and Abbeel, Pieter}, - year = {2020}, - month = dec, - number = {arXiv:2006.11239}, - eprint = {2006.11239}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2006.11239}, - urldate = {2025-09-03}, - abstract = {We present high quality image synthesis results using diffusion probabilistic models, a class of latent variable models inspired by considerations from nonequilibrium thermodynamics. Our best results are obtained by training on a weighted variational bound designed according to a novel connection between diffusion probabilistic models and denoising score matching with Langevin dynamics, and our models naturally admit a progressive lossy decompression scheme that can be interpreted as a generalization of autoregressive decoding. On the unconditional CIFAR10 dataset, we obtain an Inception score of 9.46 and a state-of-the-art FID score of 3.17. On 256x256 LSUN, we obtain sample quality similar to ProgressiveGAN. Our implementation is available at https://github.com/hojonathanho/diffusion}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Machine Learning,Statistics - Machine Learning}, - file = {/Users/fracapuano/Zotero/storage/DE655AYQ/Ho et al. - 2020 - Denoising Diffusion Probabilistic Models.pdf;/Users/fracapuano/Zotero/storage/NVIS47ZH/2006.html} -} - -@article{hwangboLearningAgileDynamic2019, - title = {Learning Agile and Dynamic Motor Skills for Legged Robots}, - author = {Hwangbo, Jemin and Lee, Joonho and Dosovitskiy, Alexey and Bellicoso, Dario and Tsounis, Vassilios and Koltun, Vladlen and Hutter, Marco}, - year = {2019}, - month = jan, - journal = {Science Robotics}, - volume = {4}, - number = {26}, - pages = {eaau5872}, - publisher = {American Association for the Advancement of Science}, - doi = {10.1126/scirobotics.aau5872}, - urldate = {2025-08-27}, - abstract = {Legged robots pose one of the greatest challenges in robotics. Dynamic and agile maneuvers of animals cannot be imitated by existing methods that are crafted by humans. A compelling alternative is reinforcement learning, which requires minimal craftsmanship and promotes the natural evolution of a control policy. 
However, so far, reinforcement learning research for legged robots is mainly limited to simulation, and only few and comparably simple examples have been deployed on real systems. The primary reason is that training with real robots, particularly with dynamically balancing systems, is complicated and expensive. In the present work, we introduce a method for training a neural network policy in simulation and transferring it to a state-of-the-art legged system, thereby leveraging fast, automated, and cost-effective data generation schemes. The approach is applied to the ANYmal robot, a sophisticated medium-dog--sized quadrupedal system. Using policies trained in simulation, the quadrupedal machine achieves locomotion skills that go beyond what had been achieved with prior methods: ANYmal is capable of precisely and energy-efficiently following high-level body velocity commands, running faster than before, and recovering from falling even in complex configurations.}, - file = {/Users/fracapuano/Zotero/storage/9V3X2F7R/Hwangbo et al. - 2019 - Learning agile and dynamic motor skills for legged robots.pdf} -} - -@inproceedings{ImageNet_VSS09, - title = {Construction and Analysis of a Large Scale Image Ontology}, - author = {Deng, J. and Li, K. and Do, M. and Su, H. and {Fei-Fei}, L.}, - year = {2009}, - publisher = {Vision Sciences Society} -} - -@inproceedings{InstructBLIP, - title = {{{InstructBLIP}}: {{Towards}} General-Purpose Vision-Language Models with Instruction Tuning}, - booktitle = {Thirty-Seventh Conference on Neural Information Processing Systems}, - author = {Dai, Wenliang and Li, Junnan and Li, Dongxu and Tiong, Anthony and Zhao, Junqi and Wang, Weisheng and Li, Boyang and Fung, Pascale and Hoi, Steven}, - year = {2023} -} - -@misc{jangBCZZeroShotTask2022, - title = {{{BC-Z}}: {{Zero-Shot Task Generalization}} with {{Robotic Imitation Learning}}}, - shorttitle = {{{BC-Z}}}, - author = {Jang, Eric and Irpan, Alex and Khansari, Mohi and Kappler, Daniel and Ebert, Frederik and Lynch, Corey and Levine, Sergey and Finn, Chelsea}, - year = {2022}, - month = feb, - number = {arXiv:2202.02005}, - eprint = {2202.02005}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2202.02005}, - urldate = {2025-09-01}, - abstract = {In this paper, we study the problem of enabling a vision-based robotic manipulation system to generalize to novel tasks, a long-standing challenge in robot learning. We approach the challenge from an imitation learning perspective, aiming to study how scaling and broadening the data collected can facilitate such generalization. To that end, we develop an interactive and flexible imitation learning system that can learn from both demonstrations and interventions and can be conditioned on different forms of information that convey the task, including pre-trained embeddings of natural language or videos of humans performing the task. When scaling data collection on a real robot to more than 100 distinct tasks, we find that this system can perform 24 unseen manipulation tasks with an average success rate of 44\%, without any robot demonstrations for those tasks.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Machine Learning,Computer Science - Robotics}, - file = {/Users/fracapuano/Zotero/storage/YDG2WMDC/Jang et al. 
- 2022 - BC-Z Zero-Shot Task Generalization with Robotic Imitation Learning.pdf;/Users/fracapuano/Zotero/storage/ZZ47RG6V/2202.html} -} - -@misc{jannerPlanningDiffusionFlexible2022, - title = {Planning with {{Diffusion}} for {{Flexible Behavior Synthesis}}}, - author = {Janner, Michael and Du, Yilun and Tenenbaum, Joshua B. and Levine, Sergey}, - year = {2022}, - month = dec, - number = {arXiv:2205.09991}, - eprint = {2205.09991}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2205.09991}, - urldate = {2025-09-03}, - abstract = {Model-based reinforcement learning methods often use learning only for the purpose of estimating an approximate dynamics model, offloading the rest of the decision-making work to classical trajectory optimizers. While conceptually simple, this combination has a number of empirical shortcomings, suggesting that learned models may not be well-suited to standard trajectory optimization. In this paper, we consider what it would look like to fold as much of the trajectory optimization pipeline as possible into the modeling problem, such that sampling from the model and planning with it become nearly identical. The core of our technical approach lies in a diffusion probabilistic model that plans by iteratively denoising trajectories. We show how classifier-guided sampling and image inpainting can be reinterpreted as coherent planning strategies, explore the unusual and useful properties of diffusion-based planning methods, and demonstrate the effectiveness of our framework in control settings that emphasize long-horizon decision-making and test-time flexibility.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Artificial Intelligence,Computer Science - Machine Learning}, - file = {/Users/fracapuano/Zotero/storage/6S28T733/Janner et al. - 2022 - Planning with Diffusion for Flexible Behavior Synthesis.pdf;/Users/fracapuano/Zotero/storage/DRH9ZWCG/2205.html} -} - -@misc{jiangMistral7B2023, - title = {Mistral {{7B}}}, - author = {Jiang, Albert Q. and Sablayrolles, Alexandre and Mensch, Arthur and Bamford, Chris and Chaplot, Devendra Singh and de las Casas, Diego and Bressand, Florian and Lengyel, Gianna and Lample, Guillaume and Saulnier, Lucile and Lavaud, L{\'e}lio Renard and Lachaux, Marie-Anne and Stock, Pierre and Scao, Teven Le and Lavril, Thibaut and Wang, Thomas and Lacroix, Timoth{\'e}e and Sayed, William El}, - year = {2023}, - month = oct, - number = {arXiv:2310.06825}, - eprint = {2310.06825}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2310.06825}, - urldate = {2025-09-09}, - abstract = {We introduce Mistral 7B v0.1, a 7-billion-parameter language model engineered for superior performance and efficiency. Mistral 7B outperforms Llama 2 13B across all evaluated benchmarks, and Llama 1 34B in reasoning, mathematics, and code generation. Our model leverages grouped-query attention (GQA) for faster inference, coupled with sliding window attention (SWA) to effectively handle sequences of arbitrary length with a reduced inference cost. We also provide a model fine-tuned to follow instructions, Mistral 7B -- Instruct, that surpasses the Llama 2 13B -- Chat model both on human and automated benchmarks. Our models are released under the Apache 2.0 license.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Artificial Intelligence,Computer Science - Computation and Language,Computer Science - Machine Learning}, - file = {/Users/fracapuano/Zotero/storage/JJX9Q8J9/Jiang et al. 
- 2023 - Mistral 7B.pdf;/Users/fracapuano/Zotero/storage/WTMQBRW3/2310.html} -} - -@misc{jiDribbleBotDynamicLegged2023, - title = {{{DribbleBot}}: {{Dynamic Legged Manipulation}} in the {{Wild}}}, - shorttitle = {{{DribbleBot}}}, - author = {Ji, Yandong and Margolis, Gabriel B. and Agrawal, Pulkit}, - year = {2023}, - month = apr, - number = {arXiv:2304.01159}, - eprint = {2304.01159}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2304.01159}, - urldate = {2025-08-26}, - abstract = {DribbleBot (Dexterous Ball Manipulation with a Legged Robot) is a legged robotic system that can dribble a soccer ball under the same real-world conditions as humans (i.e., in-the-wild). We adopt the paradigm of training policies in simulation using reinforcement learning and transferring them into the real world. We overcome critical challenges of accounting for variable ball motion dynamics on different terrains and perceiving the ball using body-mounted cameras under the constraints of onboard computing. Our results provide evidence that current quadruped platforms are well-suited for studying dynamic whole-body control problems involving simultaneous locomotion and manipulation directly from sensory observations.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Artificial Intelligence,Computer Science - Machine Learning,Computer Science - Robotics}, - file = {/Users/fracapuano/Zotero/storage/ABSRE4C4/Ji et al. - 2023 - DribbleBot Dynamic Legged Manipulation in the Wild.pdf;/Users/fracapuano/Zotero/storage/ADI4QNCY/2304.html} -} - -@misc{kakaobrain2022coyo700m, - title = {{{COYO-700M}}: {{Image-text}} Pair Dataset}, - author = {Byeon, Minwoo and Park, Beomhee and Kim, Haecheon and Lee, Sungjun and Baek, Woonhyuk and Kim, Saehoon}, - year = {2022} -} - -@misc{kaplanScalingLawsNeural2020, - title = {Scaling {{Laws}} for {{Neural Language Models}}}, - author = {Kaplan, Jared and McCandlish, Sam and Henighan, Tom and Brown, Tom B. and Chess, Benjamin and Child, Rewon and Gray, Scott and Radford, Alec and Wu, Jeffrey and Amodei, Dario}, - year = {2020}, - month = jan, - number = {arXiv:2001.08361}, - eprint = {2001.08361}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2001.08361}, - urldate = {2025-09-07}, - abstract = {We study empirical scaling laws for language model performance on the cross-entropy loss. The loss scales as a power-law with model size, dataset size, and the amount of compute used for training, with some trends spanning more than seven orders of magnitude. Other architectural details such as network width or depth have minimal effects within a wide range. Simple equations govern the dependence of overfitting on model/dataset size and the dependence of training speed on model size. These relationships allow us to determine the optimal allocation of a fixed compute budget. Larger models are significantly more sample-efficient, such that optimally compute-efficient training involves training very large models on a relatively modest amount of data and stopping significantly before convergence.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Machine Learning,Statistics - Machine Learning}, - file = {/Users/fracapuano/Zotero/storage/MI5AGWBH/Kaplan et al. 
- 2020 - Scaling Laws for Neural Language Models.pdf;/Users/fracapuano/Zotero/storage/SBZT8DDY/2001.html} -} - -@misc{keGraspingChopsticksCombating2020, - title = {Grasping with {{Chopsticks}}: {{Combating Covariate Shift}} in {{Model-free Imitation Learning}} for {{Fine Manipulation}}}, - shorttitle = {Grasping with {{Chopsticks}}}, - author = {Ke, Liyiming and Wang, Jingqiang and Bhattacharjee, Tapomayukh and Boots, Byron and Srinivasa, Siddhartha}, - year = {2020}, - month = nov, - number = {arXiv:2011.06719}, - eprint = {2011.06719}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2011.06719}, - urldate = {2025-09-01}, - abstract = {Billions of people use chopsticks, a simple yet versatile tool, for fine manipulation of everyday objects. The small, curved, and slippery tips of chopsticks pose a challenge for picking up small objects, making them a suitably complex test case. This paper leverages human demonstrations to develop an autonomous chopsticks-equipped robotic manipulator. Due to the lack of accurate models for fine manipulation, we explore model-free imitation learning, which traditionally suffers from the covariate shift phenomenon that causes poor generalization. We propose two approaches to reduce covariate shift, neither of which requires access to an interactive expert or a model, unlike previous approaches. First, we alleviate single-step prediction errors by applying an invariant operator to increase the data support at critical steps for grasping. Second, we generate synthetic corrective labels by adding bounded noise and combining parametric and non-parametric methods to prevent error accumulation. We demonstrate our methods on a real chopstick-equipped robot that we built, and observe the agent's success rate increase from 37.3\% to 80\%, which is comparable to the human expert performance of 82.6\%.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Machine Learning,Computer Science - Robotics}, - file = {/Users/fracapuano/Zotero/storage/ZUPECLSW/Ke et al. - 2020 - Grasping with Chopsticks Combating Covariate Shift in Model-free Imitation Learning for Fine Manipu.pdf;/Users/fracapuano/Zotero/storage/X7PX638S/2011.html} -} - -@article{khatibRealTimeObstancleAvoidance1986, - title = {Real-{{Time Obstancle Avoidance}} for {{Manipulators}} and {{Mobile Robots}}}, - author = {Khatib, Oussama}, - year = {1986}, - journal = {The International Journal of Robotics Research}, - volume = {5} -} - -@misc{khazatskyDROIDLargeScaleInTheWild2025, - title = {{{DROID}}: {{A Large-Scale In-The-Wild Robot Manipulation Dataset}}}, - shorttitle = {{{DROID}}}, - author = {Khazatsky, Alexander and Pertsch, Karl and Nair, Suraj and Balakrishna, Ashwin and Dasari, Sudeep and Karamcheti, Siddharth and Nasiriany, Soroush and Srirama, Mohan Kumar and Chen, Lawrence Yunliang and Ellis, Kirsty and Fagan, Peter David and Hejna, Joey and Itkina, Masha and Lepert, Marion and Ma, Yecheng Jason and Miller, Patrick Tree and Wu, Jimmy and Belkhale, Suneel and Dass, Shivin and Ha, Huy and Jain, Arhan and Lee, Abraham and Lee, Youngwoon and Memmel, Marius and Park, Sungjae and Radosavovic, Ilija and Wang, Kaiyuan and Zhan, Albert and Black, Kevin and Chi, Cheng and Hatch, Kyle Beltran and Lin, Shan and Lu, Jingpei and Mercat, Jean and Rehman, Abdul and Sanketi, Pannag R. and Sharma, Archit and Simpson, Cody and Vuong, Quan and Walke, Homer Rich and Wulfe, Blake and Xiao, Ted and Yang, Jonathan Heewon and Yavary, Arefeh and Zhao, Tony Z. 
and Agia, Christopher and Baijal, Rohan and Castro, Mateo Guaman and Chen, Daphne and Chen, Qiuyu and Chung, Trinity and Drake, Jaimyn and Foster, Ethan Paul and Gao, Jensen and Guizilini, Vitor and Herrera, David Antonio and Heo, Minho and Hsu, Kyle and Hu, Jiaheng and Irshad, Muhammad Zubair and Jackson, Donovon and Le, Charlotte and Li, Yunshuang and Lin, Kevin and Lin, Roy and Ma, Zehan and Maddukuri, Abhiram and Mirchandani, Suvir and Morton, Daniel and Nguyen, Tony and O'Neill, Abigail and Scalise, Rosario and Seale, Derick and Son, Victor and Tian, Stephen and Tran, Emi and Wang, Andrew E. and Wu, Yilin and Xie, Annie and Yang, Jingyun and Yin, Patrick and Zhang, Yunchu and Bastani, Osbert and Berseth, Glen and Bohg, Jeannette and Goldberg, Ken and Gupta, Abhinav and Gupta, Abhishek and Jayaraman, Dinesh and Lim, Joseph J. and Malik, Jitendra and {Mart{\'i}n-Mart{\'i}n}, Roberto and Ramamoorthy, Subramanian and Sadigh, Dorsa and Song, Shuran and Wu, Jiajun and Yip, Michael C. and Zhu, Yuke and Kollar, Thomas and Levine, Sergey and Finn, Chelsea}, - year = {2025}, - month = apr, - number = {arXiv:2403.12945}, - eprint = {2403.12945}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2403.12945}, - urldate = {2025-09-08}, - abstract = {The creation of large, diverse, high-quality robot manipulation datasets is an important stepping stone on the path toward more capable and robust robotic manipulation policies. However, creating such datasets is challenging: collecting robot manipulation data in diverse environments poses logistical and safety challenges and requires substantial investments in hardware and human labour. As a result, even the most general robot manipulation policies today are mostly trained on data collected in a small number of environments with limited scene and task diversity. In this work, we introduce DROID (Distributed Robot Interaction Dataset), a diverse robot manipulation dataset with 76k demonstration trajectories or 350 hours of interaction data, collected across 564 scenes and 84 tasks by 50 data collectors in North America, Asia, and Europe over the course of 12 months. We demonstrate that training with DROID leads to policies with higher performance and improved generalization ability. We open source the full dataset, policy learning code, and a detailed guide for reproducing our robot hardware setup.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Robotics}, - file = {/Users/fracapuano/Zotero/storage/XZ5Y4HZS/Khazatsky et al. 
- 2025 - DROID A Large-Scale In-The-Wild Robot Manipulation Dataset.pdf;/Users/fracapuano/Zotero/storage/N2Z72XLK/2403.html} -} - -@misc{kimOpenVLAOpenSourceVisionLanguageAction2024, - title = {{{OpenVLA}}: {{An Open-Source Vision-Language-Action Model}}}, - shorttitle = {{{OpenVLA}}}, - author = {Kim, Moo Jin and Pertsch, Karl and Karamcheti, Siddharth and Xiao, Ted and Balakrishna, Ashwin and Nair, Suraj and Rafailov, Rafael and Foster, Ethan and Lam, Grace and Sanketi, Pannag and Vuong, Quan and Kollar, Thomas and Burchfiel, Benjamin and Tedrake, Russ and Sadigh, Dorsa and Levine, Sergey and Liang, Percy and Finn, Chelsea}, - year = {2024}, - month = sep, - number = {arXiv:2406.09246}, - eprint = {2406.09246}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2406.09246}, - urldate = {2025-09-08}, - abstract = {Large policies pretrained on a combination of Internet-scale vision-language data and diverse robot demonstrations have the potential to change how we teach robots new skills: rather than training new behaviors from scratch, we can fine-tune such vision-language-action (VLA) models to obtain robust, generalizable policies for visuomotor control. Yet, widespread adoption of VLAs for robotics has been challenging as 1) existing VLAs are largely closed and inaccessible to the public, and 2) prior work fails to explore methods for efficiently fine-tuning VLAs for new tasks, a key component for adoption. Addressing these challenges, we introduce OpenVLA, a 7B-parameter open-source VLA trained on a diverse collection of 970k real-world robot demonstrations. OpenVLA builds on a Llama 2 language model combined with a visual encoder that fuses pretrained features from DINOv2 and SigLIP. As a product of the added data diversity and new model components, OpenVLA demonstrates strong results for generalist manipulation, outperforming closed models such as RT-2-X (55B) by 16.5\% in absolute task success rate across 29 tasks and multiple robot embodiments, with 7x fewer parameters. We further show that we can effectively fine-tune OpenVLA for new settings, with especially strong generalization results in multi-task environments involving multiple objects and strong language grounding abilities, and outperform expressive from-scratch imitation learning methods such as Diffusion Policy by 20.4\%. We also explore compute efficiency; as a separate contribution, we show that OpenVLA can be fine-tuned on consumer GPUs via modern low-rank adaptation methods and served efficiently via quantization without a hit to downstream success rate. Finally, we release model checkpoints, fine-tuning notebooks, and our PyTorch codebase with built-in support for training VLAs at scale on Open X-Embodiment datasets.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Machine Learning,Computer Science - Robotics}, - file = {/Users/fracapuano/Zotero/storage/XR2SX8WG/Kim et al. - 2024 - OpenVLA An Open-Source Vision-Language-Action Model.pdf;/Users/fracapuano/Zotero/storage/63Q96WRV/2406.html} -} - -@misc{kingmaAutoEncodingVariationalBayes2022, - title = {Auto-{{Encoding Variational Bayes}}}, - author = {Kingma, Diederik P. 
and Welling, Max}, - year = {2022}, - month = dec, - number = {arXiv:1312.6114}, - eprint = {1312.6114}, - primaryclass = {stat}, - publisher = {arXiv}, - doi = {10.48550/arXiv.1312.6114}, - urldate = {2025-09-02}, - abstract = {How can we perform efficient inference and learning in directed probabilistic models, in the presence of continuous latent variables with intractable posterior distributions, and large datasets? We introduce a stochastic variational inference and learning algorithm that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case. Our contributions are two-fold. First, we show that a reparameterization of the variational lower bound yields a lower bound estimator that can be straightforwardly optimized using standard stochastic gradient methods. Second, we show that for i.i.d. datasets with continuous latent variables per datapoint, posterior inference can be made especially efficient by fitting an approximate inference model (also called a recognition model) to the intractable posterior using the proposed lower bound estimator. Theoretical advantages are reflected in experimental results.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Machine Learning,Statistics - Machine Learning}, - file = {/Users/fracapuano/Zotero/storage/IT7VNQ4U/Kingma and Welling - 2022 - Auto-Encoding Variational Bayes.pdf;/Users/fracapuano/Zotero/storage/HQT22HP5/1312.html} -} - -@misc{knightStandardOpenSO100, - title = {Standard {{Open SO-100}} \& {{SO-101 Arms}}}, - author = {Knight, Rob and Kooijmans, Pepijn and Wolf, Thomas and Alibert, Simon and Aractingi, Michel and Aubakirova, Dana and Zouitine, Adil and Martino, Russi and Palma, Steven and Pascal, Caroline and Cadene, Remi} -} - -@article{koberReinforcementLearningRobotics, - title = {Reinforcement {{Learning}} in {{Robotics}}: {{A Survey}}}, - author = {Kober, Jens and Bagnell, J Andrew and Peters, Jan}, - langid = {english}, - file = {/Users/fracapuano/Zotero/storage/72PRHGKL/Kober et al. - Reinforcement Learning in Robotics A Survey.pdf} -} - -@inproceedings{kong2024audioflam, - title = {Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities}, - booktitle = {International Conference on Machine Learning}, - author = {Kong, Zhifeng and Goel, Arushi and Badlani, Rohan and Ping, Wei and Valle, Rafael and Catanzaro, Bryan}, - year = {2024}, - pages = {25125--25148}, - publisher = {PMLR} -} - -@misc{kumarRMARapidMotor2021, - title = {{{RMA}}: {{Rapid Motor Adaptation}} for {{Legged Robots}}}, - shorttitle = {{{RMA}}}, - author = {Kumar, Ashish and Fu, Zipeng and Pathak, Deepak and Malik, Jitendra}, - year = {2021}, - month = jul, - number = {arXiv:2107.04034}, - eprint = {2107.04034}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2107.04034}, - urldate = {2025-08-27}, - abstract = {Successful real-world deployment of legged robots would require them to adapt in real-time to unseen scenarios like changing terrains, changing payloads, wear and tear. This paper presents Rapid Motor Adaptation (RMA) algorithm to solve this problem of real-time online adaptation in quadruped robots. RMA consists of two components: a base policy and an adaptation module. The combination of these components enables the robot to adapt to novel situations in fractions of a second. 
RMA is trained completely in simulation without using any domain knowledge like reference trajectories or predefined foot trajectory generators and is deployed on the A1 robot without any fine-tuning. We train RMA on a varied terrain generator using bioenergetics-inspired rewards and deploy it on a variety of difficult terrains including rocky, slippery, deformable surfaces in environments with grass, long vegetation, concrete, pebbles, stairs, sand, etc. RMA shows state-of-the-art performance across diverse real-world as well as simulation experiments. Video results at https://ashish-kmr.github.io/rma-legged-robots/}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Artificial Intelligence,Computer Science - Computer Vision and Pattern Recognition,Computer Science - Machine Learning,Computer Science - Robotics}, - file = {/Users/fracapuano/Zotero/storage/TMYICHS6/Kumar et al. - 2021 - RMA Rapid Motor Adaptation for Legged Robots.pdf;/Users/fracapuano/Zotero/storage/TFY2EU8I/2107.html} -} - -@misc{laiActionChunkingConditional2025, - title = {Action Chunking as Conditional Policy Compression}, - author = {Lai, Lucy and Huang, Ann and Gershman, Samuel}, - year = {2025}, - month = jun, - publisher = {OSF}, - doi = {10.31234/osf.io/z8yrv_v2}, - urldate = {2025-09-02}, - abstract = {Many skills in our everyday lives are learned by sequencing actions towards a desired goal. The action sequence can become a ``chunk'' when individual actions are grouped together and executed as one unit, making them more efficient to store and execute. While chunking has been studied extensively across various domains, a puzzle remains as to why and under what conditions action chunking occurs. To tackle these questions, we develop a model of conditional policy compression---the reduction in cognitive cost by conditioning on an additional source of information---to explain the origin of chunking. We argue that chunking is a result of optimizing the trade-off between reward and conditional policy complexity. Chunking compresses policies when there is temporal structure in the environment that can be leveraged for action selection, reducing the amount of memory necessary to encode the policy. We experimentally confirm our model's predictions, showing that chunking reduces conditional policy complexity and reaction times. Chunking also increases with working memory load, consistent with the hypothesis that the degree of policy compression scales with the scarcity of cognitive resources. Finally, chunking also reduces overall working memory load, freeing cognitive resources for the benefit of other, not-chunked information.}, - archiveprefix = {OSF}, - langid = {american}, - keywords = {action selection,chunking,habits,reinforcement learning,resource-rationality,working memory} -} - -@article{laiActionChunkingConditional2025a, - title = {Action Chunking as Conditional Policy Compression}, - author = {Lai, Lucy and Huang, Ann Z. X. and Gershman, Samuel J.}, - year = {2025}, - month = nov, - journal = {Cognition}, - volume = {264}, - pages = {106201}, - issn = {1873-7838}, - doi = {10.1016/j.cognition.2025.106201}, - abstract = {Many skills in our everyday lives are learned by sequencing actions towards a desired goal. The action sequence can become a "chunk" when individual actions are grouped together and executed as one unit, making them more efficient to store and execute. 
While chunking has been studied extensively across various domains, a puzzle remains as to why and under what conditions action chunking occurs. To tackle these questions, we develop a model of conditional policy compression-the reduction in cognitive cost by conditioning on an additional source of information-to explain the origin of chunking. We argue that chunking is a result of optimizing the trade-off between reward and conditional policy complexity. Chunking compresses policies when there is temporal structure in the environment that can be leveraged for action selection, reducing the amount of memory necessary to encode the policy. We experimentally confirm our model's predictions, showing that chunking reduces conditional policy complexity and reaction times. Chunking also increases with working memory load, consistent with the hypothesis that the degree of policy compression scales with the scarcity of cognitive resources. Finally, chunking also reduces overall working memory load, freeing cognitive resources for the benefit of other, not-chunked information.}, - langid = {english}, - pmid = {40602234}, - keywords = {Action selection,Adult,Chunking,Cognition,Decision making,Female,Humans,Information bottleneck,Male,Memory Short-Term,Models Psychological,Psychomotor Performance,Reaction Time,Reinforcement learning,Resource rationality,Reward,Young Adult} -} - -@article{LAION-COCO, - title = {Laion Coco: 600m Synthetic Captions from Laion2b-En}, - author = {Schuhmann, C and K{\"o}pf, A and Vencu, R and Coombes, T and Beaumont, R}, - year = {2022}, - journal = {URL https://laion.ai/blog/laion-coco} -} - -@misc{laurenconWhatMattersWhen2024, - title = {What Matters When Building Vision-Language Models?}, - author = {Lauren{\c c}on, Hugo and Tronchon, L{\'e}o and Cord, Matthieu and Sanh, Victor}, - year = {2024}, - month = may, - number = {arXiv:2405.02246}, - eprint = {2405.02246}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2405.02246}, - urldate = {2025-09-09}, - abstract = {The growing interest in vision-language models (VLMs) has been driven by improvements in large language models and vision transformers. Despite the abundance of literature on this subject, we observe that critical decisions regarding the design of VLMs are often not justified. We argue that these unsupported decisions impede progress in the field by making it difficult to identify which choices improve model performance. To address this issue, we conduct extensive experiments around pre-trained models, architecture choice, data, and training methods. Our consolidation of findings includes the development of Idefics2, an efficient foundational VLM of 8 billion parameters. Idefics2 achieves state-of-the-art performance within its size category across various multimodal benchmarks, and is often on par with models four times its size. We release the model (base, instructed, and chat) along with the datasets created for its training.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Artificial Intelligence,Computer Science - Computer Vision and Pattern Recognition}, - file = {/Users/fracapuano/Zotero/storage/8H6NRPU7/Laurenรงon et al. - 2024 - What matters when building vision-language models.pdf;/Users/fracapuano/Zotero/storage/H3NETYXA/2405.html} -} - -@misc{leeBehaviorGenerationLatent2024, - title = {Behavior {{Generation}} with {{Latent Actions}}}, - author = {Lee, Seungjae and Wang, Yibin and Etukuru, Haritheja and Kim, H. 
Jin and Shafiullah, Nur Muhammad Mahi and Pinto, Lerrel}, - year = {2024}, - month = jun, - number = {arXiv:2403.03181}, - eprint = {2403.03181}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2403.03181}, - urldate = {2025-08-28}, - abstract = {Generative modeling of complex behaviors from labeled datasets has been a longstanding problem in decision making. Unlike language or image generation, decision making requires modeling actions - continuous-valued vectors that are multimodal in their distribution, potentially drawn from uncurated sources, where generation errors can compound in sequential prediction. A recent class of models called Behavior Transformers (BeT) addresses this by discretizing actions using k-means clustering to capture different modes. However, k-means struggles to scale for high-dimensional action spaces or long sequences, and lacks gradient information, and thus BeT suffers in modeling long-range actions. In this work, we present Vector-Quantized Behavior Transformer (VQ-BeT), a versatile model for behavior generation that handles multimodal action prediction, conditional generation, and partial observations. VQ-BeT augments BeT by tokenizing continuous actions with a hierarchical vector quantization module. Across seven environments including simulated manipulation, autonomous driving, and robotics, VQ-BeT improves on state-of-the-art models such as BeT and Diffusion Policies. Importantly, we demonstrate VQ-BeT's improved ability to capture behavior modes while accelerating inference speed 5x over Diffusion Policies. Videos and code can be found https://sjlee.cc/vq-bet}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Artificial Intelligence,Computer Science - Machine Learning,Computer Science - Robotics}, - file = {/Users/fracapuano/Zotero/storage/IA93ENCH/Lee et al. - 2024 - Behavior Generation with Latent Actions.pdf;/Users/fracapuano/Zotero/storage/KBVF7GQL/2403.html} -} - -@article{leeLearningQuadrupedalLocomotion2020, - title = {Learning {{Quadrupedal Locomotion}} over {{Challenging Terrain}}}, - author = {Lee, Joonho and Hwangbo, Jemin and Wellhausen, Lorenz and Koltun, Vladlen and Hutter, Marco}, - year = {2020}, - month = oct, - journal = {Science Robotics}, - volume = {5}, - number = {47}, - eprint = {2010.11251}, - primaryclass = {cs}, - pages = {eabc5986}, - issn = {2470-9476}, - doi = {10.1126/scirobotics.abc5986}, - urldate = {2025-08-26}, - abstract = {Some of the most challenging environments on our planet are accessible to quadrupedal animals but remain out of reach for autonomous machines. Legged locomotion can dramatically expand the operational domains of robotics. However, conventional controllers for legged locomotion are based on elaborate state machines that explicitly trigger the execution of motion primitives and reflexes. These designs have escalated in complexity while falling short of the generality and robustness of animal locomotion. Here we present a radically robust controller for legged locomotion in challenging natural environments. We present a novel solution to incorporating proprioceptive feedback in locomotion control and demonstrate remarkable zero-shot generalization from simulation to natural environments. The controller is trained by reinforcement learning in simulation. It is based on a neural network that acts on a stream of proprioceptive signals. 
The trained controller has taken two generations of quadrupedal ANYmal robots to a variety of natural environments that are beyond the reach of prior published work in legged locomotion. The controller retains its robustness under conditions that have never been encountered during training: deformable terrain such as mud and snow, dynamic footholds such as rubble, and overground impediments such as thick vegetation and gushing water. The presented work opens new frontiers for robotics and indicates that radical robustness in natural environments can be achieved by training in much simpler domains.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Machine Learning,Computer Science - Robotics,Computer Science - Systems and Control,Electrical Engineering and Systems Science - Systems and Control}, - file = {/Users/fracapuano/Zotero/storage/8B9EF2CE/Lee et al. - 2020 - Learning Quadrupedal Locomotion over Challenging Terrain.pdf} -} - -@misc{lillicrapContinuousControlDeep2019, - title = {Continuous Control with Deep Reinforcement Learning}, - author = {Lillicrap, Timothy P. and Hunt, Jonathan J. and Pritzel, Alexander and Heess, Nicolas and Erez, Tom and Tassa, Yuval and Silver, David and Wierstra, Daan}, - year = {2019}, - month = jul, - number = {arXiv:1509.02971}, - eprint = {1509.02971}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.1509.02971}, - urldate = {2025-08-31}, - abstract = {We adapt the ideas underlying the success of Deep Q-Learning to the continuous action domain. We present an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces. Using the same learning algorithm, network architecture and hyper-parameters, our algorithm robustly solves more than 20 simulated physics tasks, including classic problems such as cartpole swing-up, dexterous manipulation, legged locomotion and car driving. Our algorithm is able to find policies whose performance is competitive with those found by a planning algorithm with full access to the dynamics of the domain and its derivatives. We further demonstrate that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Machine Learning,Statistics - Machine Learning}, - file = {/Users/fracapuano/Zotero/storage/2VN6TMVK/Lillicrap et al. - 2019 - Continuous control with deep reinforcement learning.pdf;/Users/fracapuano/Zotero/storage/4FQ4W5VE/1509.html} -} - -@misc{lillicrapContinuousControlDeep2019a, - title = {Continuous Control with Deep Reinforcement Learning}, - author = {Lillicrap, Timothy P. and Hunt, Jonathan J. and Pritzel, Alexander and Heess, Nicolas and Erez, Tom and Tassa, Yuval and Silver, David and Wierstra, Daan}, - year = {2019}, - month = jul, - number = {arXiv:1509.02971}, - eprint = {1509.02971}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.1509.02971}, - urldate = {2025-08-31}, - abstract = {We adapt the ideas underlying the success of Deep Q-Learning to the continuous action domain. We present an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces. Using the same learning algorithm, network architecture and hyper-parameters, our algorithm robustly solves more than 20 simulated physics tasks, including classic problems such as cartpole swing-up, dexterous manipulation, legged locomotion and car driving. 
Our algorithm is able to find policies whose performance is competitive with those found by a planning algorithm with full access to the dynamics of the domain and its derivatives. We further demonstrate that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Machine Learning,Statistics - Machine Learning}, - file = {/Users/fracapuano/Zotero/storage/HYMPB9F5/Lillicrap et al. - 2019 - Continuous control with deep reinforcement learning.pdf;/Users/fracapuano/Zotero/storage/EKCXMJNQ/1509.html} -} - -@misc{linVILAPretrainingVisual2024, - title = {{{VILA}}: {{On Pre-training}} for {{Visual Language Models}}}, - shorttitle = {{{VILA}}}, - author = {Lin, Ji and Yin, Hongxu and Ping, Wei and Lu, Yao and Molchanov, Pavlo and Tao, Andrew and Mao, Huizi and Kautz, Jan and Shoeybi, Mohammad and Han, Song}, - year = {2024}, - month = may, - number = {arXiv:2312.07533}, - eprint = {2312.07533}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2312.07533}, - urldate = {2025-09-09}, - abstract = {Visual language models (VLMs) rapidly progressed with the recent success of large language models. There have been growing efforts on visual instruction tuning to extend the LLM with visual inputs, but lacks an in-depth study of the visual language pre-training process, where the model learns to perform joint modeling on both modalities. In this work, we examine the design options for VLM pre-training by augmenting LLM towards VLM through step-by-step controllable comparisons. We introduce three main findings: (1) freezing LLMs during pre-training can achieve decent zero-shot performance, but lack in-context learning capability, which requires unfreezing the LLM; (2) interleaved pre-training data is beneficial whereas image-text pairs alone are not optimal; (3) re-blending text-only instruction data to image-text data during instruction fine-tuning not only remedies the degradation of text-only tasks, but also boosts VLM task accuracy. With an enhanced pre-training recipe we build VILA, a Visual Language model family that consistently outperforms the state-of-the-art models, e.g., LLaVA-1.5, across main benchmarks without bells and whistles. Multi-modal pre-training also helps unveil appealing properties of VILA, including multi-image reasoning, enhanced in-context learning, and better world knowledge.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Computer Vision and Pattern Recognition}, - file = {/Users/fracapuano/Zotero/storage/DNA6AFRL/Lin et al. - 2024 - VILA On Pre-training for Visual Language Models.pdf;/Users/fracapuano/Zotero/storage/K32IJ2A3/2312.html} -} - -@misc{lipmanFlowMatchingGenerative2023, - title = {Flow {{Matching}} for {{Generative Modeling}}}, - author = {Lipman, Yaron and Chen, Ricky T. Q. and {Ben-Hamu}, Heli and Nickel, Maximilian and Le, Matt}, - year = {2023}, - month = feb, - number = {arXiv:2210.02747}, - eprint = {2210.02747}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2210.02747}, - urldate = {2025-09-07}, - abstract = {We introduce a new paradigm for generative modeling built on Continuous Normalizing Flows (CNFs), allowing us to train CNFs at unprecedented scale. Specifically, we present the notion of Flow Matching (FM), a simulation-free approach for training CNFs based on regressing vector fields of fixed conditional probability paths. 
Flow Matching is compatible with a general family of Gaussian probability paths for transforming between noise and data samples -- which subsumes existing diffusion paths as specific instances. Interestingly, we find that employing FM with diffusion paths results in a more robust and stable alternative for training diffusion models. Furthermore, Flow Matching opens the door to training CNFs with other, non-diffusion probability paths. An instance of particular interest is using Optimal Transport (OT) displacement interpolation to define the conditional probability paths. These paths are more efficient than diffusion paths, provide faster training and sampling, and result in better generalization. Training CNFs using Flow Matching on ImageNet leads to consistently better performance than alternative diffusion-based methods in terms of both likelihood and sample quality, and allows fast and reliable sample generation using off-the-shelf numerical ODE solvers.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Artificial Intelligence,Computer Science - Machine Learning,Statistics - Machine Learning}, - file = {/Users/fracapuano/Zotero/storage/YFZTRGJ3/Lipman et al. - 2023 - Flow Matching for Generative Modeling.pdf;/Users/fracapuano/Zotero/storage/QUKPDHWR/2210.html} -} - -@misc{lipmanFlowMatchingGuide2024, - title = {Flow {{Matching Guide}} and {{Code}}}, - author = {Lipman, Yaron and Havasi, Marton and Holderrieth, Peter and Shaul, Neta and Le, Matt and Karrer, Brian and Chen, Ricky T. Q. and {Lopez-Paz}, David and {Ben-Hamu}, Heli and Gat, Itai}, - year = {2024}, - month = dec, - number = {arXiv:2412.06264}, - eprint = {2412.06264}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2412.06264}, - urldate = {2025-09-09}, - abstract = {Flow Matching (FM) is a recent framework for generative modeling that has achieved state-of-the-art performance across various domains, including image, video, audio, speech, and biological structures. This guide offers a comprehensive and self-contained review of FM, covering its mathematical foundations, design choices, and extensions. By also providing a PyTorch package featuring relevant examples (e.g., image and text generation), this work aims to serve as a resource for both novice and experienced researchers interested in understanding, applying and further developing FM.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Machine Learning}, - file = {/Users/fracapuano/Zotero/storage/6MGQ5AZ2/Lipman et al. 
- 2024 - Flow Matching Guide and Code.pdf;/Users/fracapuano/Zotero/storage/IKHZ75PU/2412.html} -} - -@article{liu2024kangaroo, - title = {Kangaroo: {{A}} Powerful Video-Language Model Supporting Long-Context Video Input}, - author = {Liu, Jiajun and Wang, Yibing and Ma, Hanghang and Wu, Xiaoping and Ma, Xiaoqi and Wei, Xiaoming and Jiao, Jianbin and Wu, Enhua and Hu, Jie}, - year = {2024}, - journal = {arXiv preprint arXiv:2408.15542}, - eprint = {2408.15542}, - archiveprefix = {arXiv} -} - -@inproceedings{LLaVA-1.5, - title = {Improved Baselines with Visual Instruction Tuning}, - booktitle = {{{NeurIPS}} 2023 Workshop on Instruction Tuning and Instruction Following}, - author = {Liu, Haotian and Li, Chunyuan and Li, Yuheng and Lee, Yong Jae}, - year = {2023} -} - -@misc{luoPreciseDexterousRobotic2024, - title = {Precise and {{Dexterous Robotic Manipulation}} via {{Human-in-the-Loop Reinforcement Learning}}}, - author = {Luo, Jianlan and Xu, Charles and Wu, Jeffrey and Levine, Sergey}, - year = {2024}, - month = oct, - number = {arXiv:2410.21845}, - eprint = {2410.21845}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2410.21845}, - urldate = {2025-08-28}, - abstract = {Reinforcement learning (RL) holds great promise for enabling autonomous acquisition of complex robotic manipulation skills, but realizing this potential in real-world settings has been challenging. We present a human-in-the-loop vision-based RL system that demonstrates impressive performance on a diverse set of dexterous manipulation tasks, including dynamic manipulation, precision assembly, and dual-arm coordination. Our approach integrates demonstrations and human corrections, efficient RL algorithms, and other system-level design choices to learn policies that achieve near-perfect success rates and fast cycle times within just 1 to 2.5 hours of training. We show that our method significantly outperforms imitation learning baselines and prior RL approaches, with an average 2x improvement in success rate and 1.8x faster execution. Through extensive experiments and analysis, we provide insights into the effectiveness of our approach, demonstrating how it learns robust, adaptive policies for both reactive and predictive control strategies. Our results suggest that RL can indeed learn a wide range of complex vision-based manipulation policies directly in the real world within practical training times. We hope this work will inspire a new generation of learned robotic manipulation techniques, benefiting both industrial applications and research advancements. Videos and code are available at our project website https://hil-serl.github.io/.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Artificial Intelligence,Computer Science - Robotics}, - file = {/Users/fracapuano/Zotero/storage/LEL37N2D/Luo et al. 
- 2022 - Denoising Diffusion Implicit Models.pdf;/Users/fracapuano/Zotero/storage/GE2U4XU7/2010.html} -} - -@article{SpinningUp2018, - title = {Spinning up in Deep Reinforcement Learning}, - author = {Achiam, Joshua}, - year = {2018} -} - -@misc{SuttonBartoBook, - title = {Sutton \& {{Barto Book}}: {{Reinforcement Learning}}: {{An Introduction}}}, - urldate = {2025-08-28}, - howpublished = {http://incompleteideas.net/book/the-book-2nd.html}, - file = {/Users/fracapuano/Zotero/storage/A3QZFGPB/the-book-2nd.html} -} - -@inproceedings{suttonPolicyGradientMethods1999, - title = {Policy {{Gradient Methods}} for {{Reinforcement Learning}} with {{Function Approximation}}}, - booktitle = {Advances in {{Neural Information Processing Systems}}}, - author = {Sutton, Richard S and McAllester, David and Singh, Satinder and Mansour, Yishay}, - year = {1999}, - volume = {12}, - publisher = {MIT Press}, - urldate = {2025-08-31}, - abstract = {Function approximation is essential to reinforcement learning, but the standard approach of approximating a value function and determining a policy from it has so far proven theoretically intractable. In this paper we explore an alternative approach in which the policy is explicitly represented by its own function approximator, independent of the value function, and is updated according to the gradient of expected reward with respect to the policy parameters. Williams's REINFORCE method and actor-critic methods are examples of this approach. Our main new result is to show that the gradient can be written in a form suitable for estimation from experience aided by an approximate action-value or advantage function. Using this result, we prove for the first time that a version of policy iteration with arbitrary differentiable function approximation is convergent to a locally optimal policy.}, - file = {/Users/fracapuano/Zotero/storage/4EKJMS5H/Sutton et al. - 1999 - Policy Gradient Methods for Reinforcement Learning with Function Approximation.pdf} -} - -@inproceedings{suttonPolicyGradientMethods1999a, - title = {Policy {{Gradient Methods}} for {{Reinforcement Learning}} with {{Function Approximation}}}, - booktitle = {Advances in {{Neural Information Processing Systems}}}, - author = {Sutton, Richard S and McAllester, David and Singh, Satinder and Mansour, Yishay}, - year = {1999}, - volume = {12}, - publisher = {MIT Press}, - urldate = {2025-08-31}, - abstract = {Function approximation is essential to reinforcement learning, but the standard approach of approximating a value function and determining a policy from it has so far proven theoretically intractable. In this paper we explore an alternative approach in which the policy is explicitly represented by its own function approximator, independent of the value function, and is updated according to the gradient of expected reward with respect to the policy parameters. Williams's REINFORCE method and actor-critic methods are examples of this approach. Our main new result is to show that the gradient can be written in a form suitable for estimation from experience aided by an approximate action-value or advantage function. Using this result, we prove for the first time that a version of policy iteration with arbitrary differentiable function approximation is convergent to a locally optimal policy.}, - file = {/Users/fracapuano/Zotero/storage/JNPS7AMN/Sutton et al. 
- 1999 - Policy Gradient Methods for Reinforcement Learning with Function Approximation.pdf} -} - -@book{suttonReinforcementLearningIntroduction2018, - title = {Reinforcement Learning: An Introduction}, - shorttitle = {Reinforcement Learning}, - author = {Sutton, Richard S. and Barto, Andrew G.}, - year = {2018}, - series = {Adaptive Computation and Machine Learning Series}, - edition = {Second edition}, - publisher = {The MIT Press}, - address = {Cambridge, Massachusetts}, - abstract = {"Reinforcement learning, one of the most active research areas in artificial intelligence, is a computational approach to learning whereby an agent tries to maximize the total amount of reward it receives while interacting with a complex, uncertain environment. In Reinforcement Learning, Richard Sutton and Andrew Barto provide a clear and simple account of the field's key ideas and algorithms."--}, - isbn = {978-0-262-03924-6}, - langid = {english}, - lccn = {Q325.6 .R45 2018}, - keywords = {Reinforcement learning}, - file = {/Users/fracapuano/Zotero/storage/CJB8FNNL/Sutton and Barto - 2018 - Reinforcement learning an introduction.pdf} -} - -@misc{tancikFourierFeaturesLet2020, - title = {Fourier {{Features Let Networks Learn High Frequency Functions}} in {{Low Dimensional Domains}}}, - author = {Tancik, Matthew and Srinivasan, Pratul P. and Mildenhall, Ben and {Fridovich-Keil}, Sara and Raghavan, Nithin and Singhal, Utkarsh and Ramamoorthi, Ravi and Barron, Jonathan T. and Ng, Ren}, - year = {2020}, - month = jun, - number = {arXiv:2006.10739}, - eprint = {2006.10739}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2006.10739}, - urldate = {2025-09-06}, - abstract = {We show that passing input points through a simple Fourier feature mapping enables a multilayer perceptron (MLP) to learn high-frequency functions in low-dimensional problem domains. These results shed light on recent advances in computer vision and graphics that achieve state-of-the-art results by using MLPs to represent complex 3D objects and scenes. Using tools from the neural tangent kernel (NTK) literature, we show that a standard MLP fails to learn high frequencies both in theory and in practice. To overcome this spectral bias, we use a Fourier feature mapping to transform the effective NTK into a stationary kernel with a tunable bandwidth. We suggest an approach for selecting problem-specific Fourier features that greatly improves the performance of MLPs for low-dimensional regression tasks relevant to the computer vision and graphics communities.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Computer Vision and Pattern Recognition,Computer Science - Machine Learning}, - file = {/Users/fracapuano/Zotero/storage/AYWWN7ME/Tancik et al. 
- 2020 - Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains.pdf;/Users/fracapuano/Zotero/storage/68Q4Y4LM/2006.html} -} - -@misc{tangDeepReinforcementLearning2024, - title = {Deep {{Reinforcement Learning}} for {{Robotics}}: {{A Survey}} of {{Real-World Successes}}}, - shorttitle = {Deep {{Reinforcement Learning}} for {{Robotics}}}, - author = {Tang, Chen and Abbatematteo, Ben and Hu, Jiaheng and Chandra, Rohan and {Mart{\'i}n-Mart{\'i}n}, Roberto and Stone, Peter}, - year = {2024}, - month = sep, - number = {arXiv:2408.03539}, - eprint = {2408.03539}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2408.03539}, - urldate = {2025-08-29}, - abstract = {Reinforcement learning (RL), particularly its combination with deep neural networks referred to as deep RL (DRL), has shown tremendous promise across a wide range of applications, suggesting its potential for enabling the development of sophisticated robotic behaviors. Robotics problems, however, pose fundamental difficulties for the application of RL, stemming from the complexity and cost of interacting with the physical world. This article provides a modern survey of DRL for robotics, with a particular focus on evaluating the real-world successes achieved with DRL in realizing several key robotic competencies. Our analysis aims to identify the key factors underlying those exciting successes, reveal underexplored areas, and provide an overall characterization of the status of DRL in robotics. We highlight several important avenues for future work, emphasizing the need for stable and sample-efficient real-world RL paradigms, holistic approaches for discovering and integrating various competencies to tackle complex long-horizon, open-world tasks, and principled development and evaluation procedures. This survey is designed to offer insights for both RL practitioners and roboticists toward harnessing RL's power to create generally capable real-world robotic systems.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Machine Learning,Computer Science - Robotics}, - file = {/Users/fracapuano/Zotero/storage/ZTX4VSMA/Tang et al. - 2024 - Deep Reinforcement Learning for Robotics A Survey of Real-World Successes.pdf;/Users/fracapuano/Zotero/storage/WDVGKFL3/2408.html} -} - -@article{tangDeepReinforcementLearning2025, - title = {Deep {{Reinforcement Learning}} for {{Robotics}}: {{A Survey}} of {{Real-World Successes}}}, - shorttitle = {Deep {{Reinforcement Learning}} for {{Robotics}}}, - author = {Tang, Chen and Abbatematteo, Ben and Hu, Jiaheng and Chandra, Rohan and {Mart{\'i}n-Mart{\'i}n}, Roberto and Stone, Peter}, - year = {2025}, - month = may, - journal = {Annual Review of Control, Robotics, and Autonomous Systems}, - volume = {8}, - number = {Volume 8, 2025}, - pages = {153--188}, - publisher = {Annual Reviews}, - issn = {2573-5144}, - doi = {10.1146/annurev-control-030323-022510}, - urldate = {2025-08-29}, - abstract = {Reinforcement learning (RL), particularly its combination with deep neural networks, referred to as deep RL (DRL), has shown tremendous promise across a wide range of applications, suggesting its potential for enabling the development of sophisticated robotic behaviors. Robotics problems, however, pose fundamental difficulties for the application of RL, stemming from the complexity and cost of interacting with the physical world. 
This article provides a modern survey of DRL for robotics, with a particular focus on evaluating the real-world successes achieved with DRL in realizing several key robotic competencies. Our analysis aims to identify the key factors underlying those exciting successes, reveal underexplored areas, and provide an overall characterization of the status of DRL in robotics. We highlight several important avenues for future work, emphasizing the need for stable and sample-efficient real-world RL paradigms; holistic approaches for discovering and integrating various competencies to tackle complex long-horizon, open-world tasks; and principled development and evaluation procedures. This survey is designed to offer insights for both RL practitioners and roboticists toward harnessing RL\'s power to create generally capable real-world robotic systems.}, - langid = {english}, - file = {/Users/fracapuano/Zotero/storage/CCNUWJ73/Tang et al. - 2025 - Deep Reinforcement Learning for Robotics A Survey of Real-World Successes.pdf;/Users/fracapuano/Zotero/storage/UVIIIEXP/Tang et al. - 2025 - Deep Reinforcement Learning for Robotics A Survey of Real-World Successes.pdf;/Users/fracapuano/Zotero/storage/EUKPASJ2/annurev-control-030323-022510.html} -} - -@article{tangPerceptionNavigationAutonomous2023, - title = {Perception and {{Navigation}} in {{Autonomous Systems}} in the {{Era}} of {{Learning}}: {{A Survey}}}, - shorttitle = {Perception and {{Navigation}} in {{Autonomous Systems}} in the {{Era}} of {{Learning}}}, - author = {Tang, Yang and Zhao, Chaoqiang and Wang, Jianrui and Zhang, Chongzhen and Sun, Qiyu and Zheng, Weixing and Du, Wenli and Qian, Feng and Kurths, Juergen}, - year = {2023}, - month = dec, - journal = {IEEE Transactions on Neural Networks and Learning Systems}, - volume = {34}, - number = {12}, - eprint = {2001.02319}, - primaryclass = {cs}, - pages = {9604--9624}, - issn = {2162-237X, 2162-2388}, - doi = {10.1109/TNNLS.2022.3167688}, - urldate = {2025-08-27}, - abstract = {Autonomous systems possess the features of inferring their own state, understanding their surroundings, and performing autonomous navigation. With the applications of learning systems, like deep learning and reinforcement learning, the visual-based self-state estimation, environment perception and navigation capabilities of autonomous systems have been efficiently addressed, and many new learning-based algorithms have surfaced with respect to autonomous visual perception and navigation. In this review, we focus on the applications of learning-based monocular approaches in ego-motion perception, environment perception and navigation in autonomous systems, which is different from previous reviews that discussed traditional methods. First, we delineate the shortcomings of existing classical visual simultaneous localization and mapping (vSLAM) solutions, which demonstrate the necessity to integrate deep learning techniques. Second, we review the visual-based environmental perception and understanding methods based on deep learning, including deep learning-based monocular depth estimation, monocular ego-motion prediction, image enhancement, object detection, semantic segmentation, and their combinations with traditional vSLAM frameworks. Then, we focus on the visual navigation based on learning systems, mainly including reinforcement learning and deep reinforcement learning. 
Finally, we examine several challenges and promising directions discussed and concluded in related research of learning systems in the era of computer science and robotics.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Computer Vision and Pattern Recognition}, - file = {/Users/fracapuano/Zotero/storage/D3YRY6XE/Tang et al. - 2023 - Perception and Navigation in Autonomous Systems in the Era of Learning A Survey.pdf;/Users/fracapuano/Zotero/storage/SAYN9GG9/2001.html} -} - -@misc{teamGemma2Improving2024, - title = {Gemma 2: {{Improving Open Language Models}} at a {{Practical Size}}}, - shorttitle = {Gemma 2}, - author = {Team, Gemma and Riviere, Morgane and Pathak, Shreya and Sessa, Pier Giuseppe and Hardin, Cassidy and Bhupatiraju, Surya and Hussenot, L{\'e}onard and Mesnard, Thomas and Shahriari, Bobak and Ram{\'e}, Alexandre and Ferret, Johan and Liu, Peter and Tafti, Pouya and Friesen, Abe and Casbon, Michelle and Ramos, Sabela and Kumar, Ravin and Lan, Charline Le and Jerome, Sammy and Tsitsulin, Anton and Vieillard, Nino and Stanczyk, Piotr and Girgin, Sertan and Momchev, Nikola and Hoffman, Matt and Thakoor, Shantanu and Grill, Jean-Bastien and Neyshabur, Behnam and Bachem, Olivier and Walton, Alanna and Severyn, Aliaksei and Parrish, Alicia and Ahmad, Aliya and Hutchison, Allen and Abdagic, Alvin and Carl, Amanda and Shen, Amy and Brock, Andy and Coenen, Andy and Laforge, Anthony and Paterson, Antonia and Bastian, Ben and Piot, Bilal and Wu, Bo and Royal, Brandon and Chen, Charlie and Kumar, Chintu and Perry, Chris and Welty, Chris and {Choquette-Choo}, Christopher A. and Sinopalnikov, Danila and Weinberger, David and Vijaykumar, Dimple and Rogozi{\'n}ska, Dominika and Herbison, Dustin and Bandy, Elisa and Wang, Emma and Noland, Eric and Moreira, Erica and Senter, Evan and Eltyshev, Evgenii and Visin, Francesco and Rasskin, Gabriel and Wei, Gary and Cameron, Glenn and Martins, Gus and Hashemi, Hadi and {Klimczak-Pluci{\'n}ska}, Hanna and Batra, Harleen and Dhand, Harsh and Nardini, Ivan and Mein, Jacinda and Zhou, Jack and Svensson, James and Stanway, Jeff and Chan, Jetha and Zhou, Jin Peng and Carrasqueira, Joana and Iljazi, Joana and Becker, Jocelyn and Fernandez, Joe and van Amersfoort, Joost and Gordon, Josh and Lipschultz, Josh and Newlan, Josh and Ji, Ju-yeong and Mohamed, Kareem and Badola, Kartikeya and Black, Kat and Millican, Katie and McDonell, Keelin and Nguyen, Kelvin and Sodhia, Kiranbir and Greene, Kish and Sjoesund, Lars Lowe and Usui, Lauren and Sifre, Laurent and Heuermann, Lena and Lago, Leticia and McNealus, Lilly and Soares, Livio Baldini and Kilpatrick, Logan and Dixon, Lucas and Martins, Luciano and Reid, Machel and Singh, Manvinder and Iverson, Mark and G{\"o}rner, Martin and Velloso, Mat and Wirth, Mateo and Davidow, Matt and Miller, Matt and Rahtz, Matthew and Watson, Matthew and Risdal, Meg and Kazemi, Mehran and Moynihan, Michael and Zhang, Ming and Kahng, Minsuk and Park, Minwoo and Rahman, Mofi and Khatwani, Mohit and Dao, Natalie and Bardoliwalla, Nenshad and Devanathan, Nesh and Dumai, Neta and Chauhan, Nilay and Wahltinez, Oscar and Botarda, Pankil and Barnes, Parker and Barham, Paul and Michel, Paul and Jin, Pengchong and Georgiev, Petko and Culliton, Phil and Kuppala, Pradeep and Comanescu, Ramona and Merhej, Ramona and Jana, Reena and Rokni, Reza Ardeshir and Agarwal, Rishabh and Mullins, Ryan and Saadat, Samaneh and Carthy, Sara Mc and Perrin, Sarah and Arnold, S{\'e}bastien M. R. 
and Krause, Sebastian and Dai, Shengyang and Garg, Shruti and Sheth, Shruti and Ronstrom, Sue and Chan, Susan and Jordan, Timothy and Yu, Ting and Eccles, Tom and Hennigan, Tom and Kocisky, Tomas and Doshi, Tulsee and Jain, Vihan and Yadav, Vikas and Meshram, Vilobh and Dharmadhikari, Vishal and Barkley, Warren and Wei, Wei and Ye, Wenming and Han, Woohyun and Kwon, Woosuk and Xu, Xiang and Shen, Zhe and Gong, Zhitao and Wei, Zichuan and Cotruta, Victor and Kirk, Phoebe and Rao, Anand and Giang, Minh and Peran, Ludovic and Warkentin, Tris and Collins, Eli and Barral, Joelle and Ghahramani, Zoubin and Hadsell, Raia and Sculley, D. and Banks, Jeanine and Dragan, Anca and Petrov, Slav and Vinyals, Oriol and Dean, Jeff and Hassabis, Demis and Kavukcuoglu, Koray and Farabet, Clement and Buchatskaya, Elena and Borgeaud, Sebastian and Fiedel, Noah and Joulin, Armand and Kenealy, Kathleen and Dadashi, Robert and Andreev, Alek}, - year = {2024}, - month = aug, - number = {arXiv:2408.00118}, - eprint = {2408.00118}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2408.00118}, - urldate = {2025-09-08}, - abstract = {In this work, we introduce Gemma 2, a new addition to the Gemma family of lightweight, state-of-the-art open models, ranging in scale from 2 billion to 27 billion parameters. In this new version, we apply several known technical modifications to the Transformer architecture, such as interleaving local-global attentions (Beltagy et al., 2020a) and group-query attention (Ainslie et al., 2023). We also train the 2B and 9B models with knowledge distillation (Hinton et al., 2015) instead of next token prediction. The resulting models deliver the best performance for their size, and even offer competitive alternatives to models that are 2-3 times bigger. We release all our models to the community.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Artificial Intelligence,Computer Science - Computation and Language}, - file = {/Users/fracapuano/Zotero/storage/NTLZNFPL/Team et al. - 2024 - Gemma 2 Improving Open Language Models at a Practical Size.pdf;/Users/fracapuano/Zotero/storage/GKX7JFK3/2408.html} -} - -@misc{tedrakeRoboticManipulationPerception, - title = {Robotic {{Manipulation}}. {{Perception}}, {{Planning}} and {{Control}}.}, - author = {Tedrake, Russ} -} - -@misc{tedrakeUnderactuatedRoboticsAlgorithms, - title = {Underactuated {{Robotics}}. {{Algorithms}} for {{Walking}}, {{Running}}, {{Swimming}}, {{Flying}}, and {{Manipulation}}}, - author = {Tedrake, Russ} -} - -@article{thrunPROBABILISTICROBOTICS, - title = {{{PROBABILISTIC ROBOTICS}}}, - author = {Thrun, Sebastian and Burgard, Wolfram and Fox, Dieter}, - langid = {english}, - file = {/Users/fracapuano/Zotero/storage/UKNC34V7/Thrun et al. - PROBABILISTIC ROBOTICS.pdf} -} - -@misc{tiboniDomainRandomizationEntropy2024, - title = {Domain {{Randomization}} via {{Entropy Maximization}}}, - author = {Tiboni, Gabriele and Klink, Pascal and Peters, Jan and Tommasi, Tatiana and D'Eramo, Carlo and Chalvatzaki, Georgia}, - year = {2024}, - month = mar, - number = {arXiv:2311.01885}, - eprint = {2311.01885}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2311.01885}, - urldate = {2025-08-30}, - abstract = {Varying dynamics parameters in simulation is a popular Domain Randomization (DR) approach for overcoming the reality gap in Reinforcement Learning (RL). 
Nevertheless, DR heavily hinges on the choice of the sampling distribution of the dynamics parameters, since high variability is crucial to regularize the agent's behavior but notoriously leads to overly conservative policies when randomizing excessively. In this paper, we propose a novel approach to address sim-to-real transfer, which automatically shapes dynamics distributions during training in simulation without requiring real-world data. We introduce DOmain RAndomization via Entropy MaximizatiON (DORAEMON), a constrained optimization problem that directly maximizes the entropy of the training distribution while retaining generalization capabilities. In achieving this, DORAEMON gradually increases the diversity of sampled dynamics parameters as long as the probability of success of the current policy is sufficiently high. We empirically validate the consistent benefits of DORAEMON in obtaining highly adaptive and generalizable policies, i.e. solving the task at hand across the widest range of dynamics parameters, as opposed to representative baselines from the DR literature. Notably, we also demonstrate the Sim2Real applicability of DORAEMON through its successful zero-shot transfer in a robotic manipulation setup under unknown real-world parameters.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Machine Learning,Computer Science - Robotics}, - file = {/Users/fracapuano/Zotero/storage/T5KH6GM9/Tiboni et al. - 2024 - Domain Randomization via Entropy Maximization.pdf;/Users/fracapuano/Zotero/storage/KRE436NC/2311.html} -} - -@misc{tiboniDROPOSimtoRealTransfer2023, - title = {{{DROPO}}: {{Sim-to-Real Transfer}} with {{Offline Domain Randomization}}}, - shorttitle = {{{DROPO}}}, - author = {Tiboni, Gabriele and Arndt, Karol and Kyrki, Ville}, - year = {2023}, - month = jan, - number = {arXiv:2201.08434}, - eprint = {2201.08434}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2201.08434}, - urldate = {2025-08-31}, - abstract = {In recent years, domain randomization over dynamics parameters has gained a lot of traction as a method for sim-to-real transfer of reinforcement learning policies in robotic manipulation; however, finding optimal randomization distributions can be difficult. In this paper, we introduce DROPO, a novel method for estimating domain randomization distributions for safe sim-to-real transfer. Unlike prior work, DROPO only requires a limited, precollected offline dataset of trajectories, and explicitly models parameter uncertainty to match real data using a likelihood-based approach. We demonstrate that DROPO is capable of recovering dynamic parameter distributions in simulation and finding a distribution capable of compensating for an unmodeled phenomenon. We also evaluate the method in two zero-shot sim-to-real transfer scenarios, showing successful domain transfer and improved performance over prior methods.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Machine Learning,Computer Science - Robotics}, - file = {/Users/fracapuano/Zotero/storage/Q875LPZF/Tiboni et al. 
- 2023 - DROPO Sim-to-Real Transfer with Offline Domain Randomization.pdf;/Users/fracapuano/Zotero/storage/2NQ4L37P/2201.html} -} - -@misc{tobinDomainRandomizationTransferring2017, - title = {Domain {{Randomization}} for {{Transferring Deep Neural Networks}} from {{Simulation}} to the {{Real World}}}, - author = {Tobin, Josh and Fong, Rachel and Ray, Alex and Schneider, Jonas and Zaremba, Wojciech and Abbeel, Pieter}, - year = {2017}, - month = mar, - number = {arXiv:1703.06907}, - eprint = {1703.06907}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.1703.06907}, - urldate = {2025-08-30}, - abstract = {Bridging the 'reality gap' that separates simulated robotics from experiments on hardware could accelerate robotic research through improved data availability. This paper explores domain randomization, a simple technique for training models on simulated images that transfer to real images by randomizing rendering in the simulator. With enough variability in the simulator, the real world may appear to the model as just another variation. We focus on the task of object localization, which is a stepping stone to general robotic manipulation skills. We find that it is possible to train a real-world object detector that is accurate to \$1.5\$cm and robust to distractors and partial occlusions using only data from a simulator with non-realistic random textures. To demonstrate the capabilities of our detectors, we show they can be used to perform grasping in a cluttered environment. To our knowledge, this is the first successful transfer of a deep neural network trained only on simulated RGB images (without pre-training on real images) to the real world for the purpose of robotic control.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Machine Learning,Computer Science - Robotics}, - file = {/Users/fracapuano/Zotero/storage/TYJZAD9R/Tobin et al. 
- 2017 - Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World.pdf;/Users/fracapuano/Zotero/storage/C9QS7DES/1703.html} -} - -@article{tong2024cambrian, - title = {Cambrian-1: {{A}} Fully Open, Vision-Centric Exploration of Multimodal Llms}, - author = {Tong, Peter and Brown, Ellis and Wu, Penghao and Woo, Sanghyun and IYER, Adithya Jairam Vedagiri and Akula, Sai Charitha and Yang, Shusheng and Yang, Jihan and Middepogu, Manoj and Wang, Ziteng and others}, - year = {2024}, - journal = {Advances in Neural Information Processing Systems}, - volume = {37}, - pages = {87310--87356} -} - -@misc{touvronLlama2Open2023, - title = {Llama 2: {{Open Foundation}} and {{Fine-Tuned Chat Models}}}, - shorttitle = {Llama 2}, - author = {Touvron, Hugo and Martin, Louis and Stone, Kevin and Albert, Peter and Almahairi, Amjad and Babaei, Yasmine and Bashlykov, Nikolay and Batra, Soumya and Bhargava, Prajjwal and Bhosale, Shruti and Bikel, Dan and Blecher, Lukas and Ferrer, Cristian Canton and Chen, Moya and Cucurull, Guillem and Esiobu, David and Fernandes, Jude and Fu, Jeremy and Fu, Wenyin and Fuller, Brian and Gao, Cynthia and Goswami, Vedanuj and Goyal, Naman and Hartshorn, Anthony and Hosseini, Saghar and Hou, Rui and Inan, Hakan and Kardas, Marcin and Kerkez, Viktor and Khabsa, Madian and Kloumann, Isabel and Korenev, Artem and Koura, Punit Singh and Lachaux, Marie-Anne and Lavril, Thibaut and Lee, Jenya and Liskovich, Diana and Lu, Yinghai and Mao, Yuning and Martinet, Xavier and Mihaylov, Todor and Mishra, Pushkar and Molybog, Igor and Nie, Yixin and Poulton, Andrew and Reizenstein, Jeremy and Rungta, Rashi and Saladi, Kalyan and Schelten, Alan and Silva, Ruan and Smith, Eric Michael and Subramanian, Ranjan and Tan, Xiaoqing Ellen and Tang, Binh and Taylor, Ross and Williams, Adina and Kuan, Jian Xiang and Xu, Puxin and Yan, Zheng and Zarov, Iliyan and Zhang, Yuchen and Fan, Angela and Kambadur, Melanie and Narang, Sharan and Rodriguez, Aurelien and Stojnic, Robert and Edunov, Sergey and Scialom, Thomas}, - year = {2023}, - month = jul, - number = {arXiv:2307.09288}, - eprint = {2307.09288}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2307.09288}, - urldate = {2025-09-08}, - abstract = {In this work, we develop and release Llama 2, a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters. Our fine-tuned LLMs, called Llama 2-Chat, are optimized for dialogue use cases. Our models outperform open-source chat models on most benchmarks we tested, and based on our human evaluations for helpfulness and safety, may be a suitable substitute for closed-source models. We provide a detailed description of our approach to fine-tuning and safety improvements of Llama 2-Chat in order to enable the community to build on our work and contribute to the responsible development of LLMs.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Artificial Intelligence,Computer Science - Computation and Language}, - file = {/Users/fracapuano/Zotero/storage/VKQFSEUF/Touvron et al. 
- 2023 - Llama 2 Open Foundation and Fine-Tuned Chat Models.pdf;/Users/fracapuano/Zotero/storage/N6MFUQCF/2307.html} -} - -@article{tsimpoukelli2021multimodalfrozen, - title = {Multimodal Few-Shot Learning with Frozen Language Models}, - author = {Tsimpoukelli, Maria and Menick, Jacob L and Cabi, Serkan and Eslami, {\relax SM} and Vinyals, Oriol and Hill, Felix}, - year = {2021}, - journal = {Advances in Neural Information Processing Systems}, - volume = {34}, - pages = {200--212} -} - -@article{vallaeys2024improveddepalm, - title = {Improved Baselines for Data-Efficient Perceptual Augmentation of Llms}, - author = {Vallaeys, Th{\'e}ophane and Shukor, Mustafa and Cord, Matthieu and Verbeek, Jakob}, - year = {2024}, - journal = {arXiv preprint arXiv:2403.13499}, - eprint = {2403.13499}, - archiveprefix = {arXiv} -} - -@article{wang2025internvideo2, - title = {{{InternVideo2}}. 5: {{Empowering}} Video Mllms with Long and Rich Context Modeling}, - author = {Wang, Yi and Li, Xinhao and Yan, Ziang and He, Yinan and Yu, Jiashuo and Zeng, Xiangyu and Wang, Chenting and Ma, Changlian and Huang, Haian and Gao, Jianfei and others}, - year = {2025}, - journal = {arXiv preprint arXiv:2501.12386}, - eprint = {2501.12386}, - archiveprefix = {arXiv} -} - -@misc{zhaiSigmoidLossLanguage2023, - title = {Sigmoid {{Loss}} for {{Language Image Pre-Training}}}, - author = {Zhai, Xiaohua and Mustafa, Basil and Kolesnikov, Alexander and Beyer, Lucas}, - year = {2023}, - month = sep, - number = {arXiv:2303.15343}, - eprint = {2303.15343}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2303.15343}, - urldate = {2025-09-09}, - abstract = {We propose a simple pairwise Sigmoid loss for Language-Image Pre-training (SigLIP). Unlike standard contrastive learning with softmax normalization, the sigmoid loss operates solely on image-text pairs and does not require a global view of the pairwise similarities for normalization. The sigmoid loss simultaneously allows further scaling up the batch size, while also performing better at smaller batch sizes. Combined with Locked-image Tuning, with only four TPUv4 chips, we train a SigLiT model that achieves 84.5\% ImageNet zero-shot accuracy in two days. The disentanglement of the batch size from the loss further allows us to study the impact of examples vs pairs and negative to positive ratio. Finally, we push the batch size to the extreme, up to one million, and find that the benefits of growing batch size quickly diminish, with a more reasonable batch size of 32k being sufficient. We release our models at https://github.com/google-research/big\_vision and hope our research motivates further explorations in improving the quality and efficiency of language-image pre-training.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Artificial Intelligence,Computer Science - Computer Vision and Pattern Recognition}, - file = {/Users/fracapuano/Zotero/storage/Z39H5W8R/Zhai et al. 
- 2023 - Sigmoid Loss for Language Image Pre-Training.pdf;/Users/fracapuano/Zotero/storage/IYX9QALK/2303.html} -} - -@article{zhang2025videollama, - title = {{{VideoLLaMA}} 3: {{Frontier}} Multimodal Foundation Models for Image and Video Understanding}, - author = {Zhang, Boqiang and Li, Kehan and Cheng, Zesen and Hu, Zhiqiang and Yuan, Yuqian and Chen, Guanzheng and Leng, Sicong and Jiang, Yuming and Zhang, Hang and Li, Xin and others}, - year = {2025}, - journal = {arXiv preprint arXiv:2501.13106}, - eprint = {2501.13106}, - archiveprefix = {arXiv} -} - -@misc{zhangWoCoCoLearningWholeBody2024, - title = {{{WoCoCo}}: {{Learning Whole-Body Humanoid Control}} with {{Sequential Contacts}}}, - shorttitle = {{{WoCoCo}}}, - author = {Zhang, Chong and Xiao, Wenli and He, Tairan and Shi, Guanya}, - year = {2024}, - month = nov, - number = {arXiv:2406.06005}, - eprint = {2406.06005}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2406.06005}, - urldate = {2025-08-26}, - abstract = {Humanoid activities involving sequential contacts are crucial for complex robotic interactions and operations in the real world and are traditionally solved by model-based motion planning, which is time-consuming and often relies on simplified dynamics models. Although model-free reinforcement learning (RL) has become a powerful tool for versatile and robust whole-body humanoid control, it still requires tedious task-specific tuning and state machine design and suffers from long-horizon exploration issues in tasks involving contact sequences. In this work, we propose WoCoCo (Whole-Body Control with Sequential Contacts), a unified framework to learn whole-body humanoid control with sequential contacts by naturally decomposing the tasks into separate contact stages. Such decomposition facilitates simple and general policy learning pipelines through task-agnostic reward and sim-to-real designs, requiring only one or two task-related terms to be specified for each task. We demonstrated that end-to-end RL-based controllers trained with WoCoCo enable four challenging whole-body humanoid tasks involving diverse contact sequences in the real world without any motion priors: 1) versatile parkour jumping, 2) box loco-manipulation, 3) dynamic clap-and-tap dancing, and 4) cliffside climbing. We further show that WoCoCo is a general framework beyond humanoid by applying it in 22-DoF dinosaur robot loco-manipulation tasks.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Graphics,Computer Science - Robotics,Computer Science - Systems and Control,Electrical Engineering and Systems Science - Systems and Control}, - file = {/Users/fracapuano/Zotero/storage/2SYII7A2/Zhang et al. - 2024 - WoCoCo Learning Whole-Body Humanoid Control with Sequential Contacts.pdf;/Users/fracapuano/Zotero/storage/C6ZJPZEV/2406.html} -} - -@misc{zhaoLearningFineGrainedBimanual2023, - title = {Learning {{Fine-Grained Bimanual Manipulation}} with {{Low-Cost Hardware}}}, - author = {Zhao, Tony Z. and Kumar, Vikash and Levine, Sergey and Finn, Chelsea}, - year = {2023}, - month = apr, - number = {arXiv:2304.13705}, - eprint = {2304.13705}, - primaryclass = {cs}, - publisher = {arXiv}, - doi = {10.48550/arXiv.2304.13705}, - urldate = {2025-08-26}, - abstract = {Fine manipulation tasks, such as threading cable ties or slotting a battery, are notoriously difficult for robots because they require precision, careful coordination of contact forces, and closed-loop visual feedback. 
Performing these tasks typically requires high-end robots, accurate sensors, or careful calibration, which can be expensive and difficult to set up. Can learning enable low-cost and imprecise hardware to perform these fine manipulation tasks? We present a low-cost system that performs end-to-end imitation learning directly from real demonstrations, collected with a custom teleoperation interface. Imitation learning, however, presents its own challenges, particularly in high-precision domains: errors in the policy can compound over time, and human demonstrations can be non-stationary. To address these challenges, we develop a simple yet novel algorithm, Action Chunking with Transformers (ACT), which learns a generative model over action sequences. ACT allows the robot to learn 6 difficult tasks in the real world, such as opening a translucent condiment cup and slotting a battery with 80-90\% success, with only 10 minutes worth of demonstrations. Project website: https://tonyzhaozh.github.io/aloha/}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Machine Learning,Computer Science - Robotics}, - file = {/Users/fracapuano/Zotero/storage/4P7GCF3I/Zhao et al. - 2023 - Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware.pdf;/Users/fracapuano/Zotero/storage/3BC9S3Z2/2304.html} -} - -@misc{zhongPracticalBlockwiseNeural2018, - title = {Practical {{Block-wise Neural Network Architecture Generation}}}, - author = {Zhong, Zhao and Yan, Junjie and Wu, Wei and Shao, Jing and Liu, Cheng-Lin}, - year = {2018}, - month = may, - number = {arXiv:1708.05552}, - eprint = {1708.05552}, - primaryclass = {cs}, - publisher = {arXiv}, - urldate = {2023-05-05}, - abstract = {Convolutional neural networks have gained a remarkable success in computer vision. However, most usable network architectures are hand-crafted and usually require expertise and elaborate design. In this paper, we provide a block-wise network generation pipeline called BlockQNN which automatically builds high-performance networks using the Q-Learning paradigm with epsilon-greedy exploration strategy. The optimal network block is constructed by the learning agent which is trained sequentially to choose component layers. We stack the block to construct the whole auto-generated network. To accelerate the generation process, we also propose a distributed asynchronous framework and an early stop strategy. The block-wise generation brings unique advantages: (1) it performs competitive results in comparison to the hand-crafted state-of-the-art networks on image classification, additionally, the best network generated by BlockQNN achieves 3.54\% top-1 error rate on CIFAR-10 which beats all existing auto-generate networks. (2) in the meanwhile, it offers tremendous reduction of the search space in designing networks which only spends 3 days with 32 GPUs, and (3) moreover, it has strong generalizability that the network built on CIFAR also performs well on a larger-scale ImageNet dataset.}, - archiveprefix = {arXiv}, - keywords = {Computer Science - Computer Vision and Pattern Recognition,Computer Science - Machine Learning}, - file = {/Users/fracapuano/Zotero/storage/7ZJWPCRW/Zhong et al. - 2018 - Practical Block-wise Neural Network Architecture G.pdf;/Users/fracapuano/Zotero/storage/ZI2R395F/Zhong et al. 
- 2018 - Practical Block-wise Neural Network Architecture G.html} -} - -@inproceedings{zhu2024minigpt, - title = {{{MiniGPT-4}}: {{Enhancing}} Vision-Language Understanding with Advanced Large Language Models}, - booktitle = {The Twelfth International Conference on Learning Representations}, - author = {Zhu, Deyao and Chen, Jun and Shen, Xiaoqian and Li, Xiang and Elhoseiny, Mohamed}, - year = {2024} -} - -@misc{zotero-item-169, - type = {Misc} -} diff --git a/app/scripts/latex-to-mdx/input/main.dvi b/app/scripts/latex-to-mdx/input/main.dvi deleted file mode 100644 index b3715804068d8cde9ebbc5da4fd8d28de3851a50..0000000000000000000000000000000000000000 Binary files a/app/scripts/latex-to-mdx/input/main.dvi and /dev/null differ diff --git a/app/scripts/latex-to-mdx/input/main.tex b/app/scripts/latex-to-mdx/input/main.tex deleted file mode 100644 index e5bf82a8a6ef91117d93e8a1ccb06d4ab1e8e40a..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/main.tex +++ /dev/null @@ -1,247 +0,0 @@ -\documentclass[table]{hfstyle/hf} - -% Basic packages -\usepackage[utf8]{inputenc} -\usepackage[T1]{fontenc} -\usepackage{graphicx} -\usepackage{booktabs} -\usepackage{url} -\usepackage{lineno} -\usepackage{enumitem} -\usepackage{listings} - -% Math and symbols -\usepackage{amsmath} -\usepackage{amsfonts} -\usepackage{amssymb} -\usepackage{nicefrac} -\usepackage{siunitx} - -% Tables and figures -\usepackage{multirow} -\usepackage{bigdelim} -\usepackage{longtable} -\usepackage{tabularray} -\usepackage{wrapfig} -\usepackage{caption} -\usepackage{subcaption} -\usepackage{makecell} -\usepackage{adjustbox} - -% Color and boxes -\usepackage[most]{tcolorbox} -\usepackage{xcolor} - -% Text and formatting -\usepackage{xspace} -\usepackage{soul} -\usepackage{csquotes} -\usepackage{arydshln} - -% Bibliography and references -\usepackage{natbib} - -% Special packages -\usepackage{todonotes} -\usepackage[absolute]{textpos} -\usepackage{pifont} -\usepackage{bold-extra} -\usepackage{pgf-pie} -\usepackage{epigraph} - -% Algorithms -\usepackage{algorithm} -\usepackage{algpseudocode} - -% Hyperref (load last) -\usepackage{hyperref} -\definecolor{linkcolor}{RGB}{0, 0, 128} -\hypersetup{ - colorlinks = true, - citecolor = linkcolor, - linkcolor = linkcolor, - urlcolor = linkcolor, -} - -% Custom commands -\newcommand{\cmark}{\ding{51}}% -\newcommand{\xmark}{\ding{55}}% - -\setlist[itemize]{leftmargin=*,itemsep=0em,parsep=0.3em,topsep=0.3em} - -\DeclareUnicodeCharacter{2212}{\ensuremath{-}} - -\addtolength{\extrarowheight}{\belowrulesep} -\aboverulesep=0pt -\belowrulesep=0pt - -\definecolor{maroon}{HTML}{F26035} -\definecolor{yellow}{HTML}{FDBC42} -\definecolor{lavender}{HTML}{734f96} -\definecolor{darkergrey}{HTML}{444444} -\definecolor{midgrey}{HTML}{e6eded} - -\definecolor{neutralEight}{HTML}{343434} -\definecolor{neutralFive}{HTML}{838383} -\definecolor{neutralThree}{HTML}{bebebe} -\definecolor{neutralOne}{HTML}{dedede} -\definecolor{lightgrey}{HTML}{fafcfc} - -\usepackage{tikz} -\newcommand{\cblock}[3]{ - \hspace{-1.5mm} - \begin{tikzpicture} - [ - node/.style={square, minimum size=10mm, thick, line width=0pt}, - ] - \node[fill={rgb,255:red,#1;green,#2;blue,#3}] () [] {}; - \end{tikzpicture}% -} - -\newcommand{\norm}[1]{\left\lVert#1\right\rVert} - -\definecolor{maroon}{HTML}{F26035} -\definecolor{yellow}{HTML}{FDBC42} -\definecolor{darkred}{RGB}{156, 39, 33} -\definecolor{darkblue}{RGB}{31, 90, 153} -\definecolor{forestgreen}{rgb}{0.13, 0.55, 0.13} -\definecolor{olmoDarkBlue}{HTML}{012e59} 
-\definecolor{olmoBlue}{HTML}{265ed4} -\definecolor{olmoLightBlue}{HTML}{012e59} -\definecolor{olmoTeal}{HTML}{00d5ff} -\definecolor{olmoYellow}{HTML}{ffbb00} -\definecolor{olmoOrange}{HTML}{ff9100} - -\newcommand{\nol}[1]{{\color{purple} [nol]: #1}} - -% Code snippets definitions -\definecolor{codegreen}{rgb}{0,0.6,0} -\definecolor{codegray}{rgb}{0.5,0.5,0.5} -\definecolor{codepurple}{rgb}{0.58,0,0.82} -\definecolor{backcolour}{rgb}{0.95,0.95,0.92} - -\lstdefinestyle{mycodestyle}{ - backgroundcolor=\color{backcolour}, - commentstyle=\color{codegreen}, - keywordstyle=\color{magenta}, - numberstyle=\tiny\color{codegray}, - stringstyle=\color{codepurple}, - basicstyle=\ttfamily\footnotesize, - breakatwhitespace=false, - breaklines=true, - captionpos=b, - keepspaces=true, - numbers=left, - numbersep=5pt, - showspaces=false, - showstringspaces=false, - showtabs=false, - tabsize=2 -} - -\lstset{style=mycodestyle} - - -\usepackage{setspace} - -\usepackage{nicematrix} -\newcolumntype{L}[1]{>{\raggedright\let\newline\\\arraybackslash\hspace{0pt}}m{#1}} -\newcolumntype{C}[1]{>{\centering\let\newline\\\arraybackslash\hspace{0pt}}m{#1}} -\newcolumntype{R}[1]{>{\raggedleft\let\newline\\\arraybackslash\hspace{0pt}}m{#1}} -\newcolumntype{P}[1]{>{\centering\let\newline\\\arraybackslash\columncolor{ai2lightpink}}m{#1}} -\addtolength{\extrarowheight}{\belowrulesep} -\aboverulesep=0pt -\belowrulesep=0pt - -\newcommand{\orr}[1]{\textcolor{red}{[OZ:#1]}} - -\tcbuselibrary{minted} -\usemintedstyle{colorful} - -\renewcommand{\theFancyVerbLine}{\color{olmoBlue}\footnotesize\arabic{FancyVerbLine}} - -\setminted[python]{ - linenos, - breaklines, - fontsize=\footnotesize, - xleftmargin=2em -} -\crefname{tcb@cnt@pbox}{code}{code} -\Crefname{tcb@cnt@pbox}{Code}{Code} -\crefname{assumption}{assumption}{assumption} -\Crefname{assumption}{Assumption}{Assumptions} - - - -\newtcolorbox[auto counter]{pbox}[2][]{ - colback=white, - title=\textbf{Code~\thetcbcounter: #2}, - #1,fonttitle=\sffamily, - fontupper=\sffamily, - arc=10pt, - colframe=hf4, - coltitle=hf3, - colbacktitle=hf4, - toptitle=0.25cm, - bottomtitle=0.125cm -} - -\input{preamble} -\input{math_commands} -\input{handles} - -\title{ -Robot Learning: A Tutorial -} - -\newcommand{\huggingface}{\raisebox{-1.5pt}{\includegraphics[height=1.05em]{logos/hf.pdf}}\xspace} -\newcommand{\coreContrib}{\raisebox{.33em}{\hspace{.05em}\includegraphics[height=.5em]{logos/core.png}}\xspace} - -\newcommand{\hf}{\raisebox{.28em}{\hspace{.05em}\includegraphics[height=.65em]{logos/hf.pdf}}\xspace} -\newcommand{\ensps}{\raisebox{.3em}{\hspace{.05em}\includegraphics[height=.65em]{logos/ensps_logo.pdf}}\xspace} - -\authorOne[]{Francesco Capuano \ensps \hf} -\authorOne[]{...} -\authorOne[]{Adil Zouitine\hf} -\authorOne[]{Pepijn Kooijmans\hf} -\authorOne[]{Thomas Wolf\hf} -\authorOne[]{Michel Aractingi\hf} - -\contribution[]{\ensps École Normale Supérieure Paris-Saclay, \hf Hugging Face} - -\newcommand{\fix}{\marginpar{FIX}} -\newcommand{\new}{\marginpar{NEW}} - -\abstract{ -\input{sections/00_abstract} -} - -\begin{document} - - -\maketitle - -\tableofcontents -\input{sections/A_foreword.tex} - -\newpage -\input{sections/01_introduction} - -\input{sections/02_classic_robotics} - -\newpage -\input{sections/03_reinforcement_learning.tex} - -\newpage -\input{sections/04_imitation_learning.tex} - -\newpage -\input{sections/05_foundation_models.tex} - -\newpage -\input{sections/07_conclusions.tex} - -\bibliographystyle{hfstyle/plainnat} -\bibliography{main} - -\end{document} diff --git 
a/app/scripts/latex-to-mdx/input/manropebold.tfm b/app/scripts/latex-to-mdx/input/manropebold.tfm deleted file mode 100644 index caed637bbfa723a84c0422d48d73ad97f4a28ee4..0000000000000000000000000000000000000000 Binary files a/app/scripts/latex-to-mdx/input/manropebold.tfm and /dev/null differ diff --git a/app/scripts/latex-to-mdx/input/manroperegular.tfm b/app/scripts/latex-to-mdx/input/manroperegular.tfm deleted file mode 100644 index ca50dabab3acec7c0f1d4d921e3c9734249abfe4..0000000000000000000000000000000000000000 Binary files a/app/scripts/latex-to-mdx/input/manroperegular.tfm and /dev/null differ diff --git a/app/scripts/latex-to-mdx/input/math_commands.tex b/app/scripts/latex-to-mdx/input/math_commands.tex deleted file mode 100644 index 3b1ddb5228ebfe03414d5b9aba8f016d44c8dc05..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/math_commands.tex +++ /dev/null @@ -1,574 +0,0 @@ -\newcommand*\diff{\mathrm{d}} -\newcommand*\Image{\mathrm{Im}} -\newcommand*\NN{\smash{\hat{\mathcal{F}}_{\scriptsize\textrm{NN}}}} - -\newcommand*\X{\mathcal{X}} -\newcommand*\Z{\mathcal{Z}} -\newcommand*\G{\mathcal{G}} -\newcommand*\D{\mathcal{D}} -\newcommand*\F{\mathcal{F}} -\newcommand*\R{\mathcal{R}} -\newcommand*\TR{\hat{R}} -\newcommand*\Deltab{\bar{\Delta}} -\newcommand*\h{h} -\newcommand*\biasb{\mathrm{bias}} -\newcommand*\varb{\mathrm{var}} -\newcommand*\covb{\mathrm{cov}} -\newcommand*\M{\mathcal{M}} -\newcommand*\B{\mathcal{B}} -\newcommand*\W{\mathcal{W}} -\newcommand*\Loss{\mathcal{L}} -\newcommand*{\Ftr}{\smash{\mathcal{F}_{\scriptsize\textrm{tr}}}} -\newcommand*{\Fts}{\smash{\mathcal{F}_{\scriptsize\textrm{ad}}}} -\newcommand*{\Dtr}{\smash{\mathcal{D}_{\scriptsize\textrm{tr}}}} -\newcommand*{\Dts}{\smash{\mathcal{D}_{\scriptsize\textrm{ad}}}} -\newcommand*{\Etr}{\smash{\mathcal{E}_{\scriptsize\textrm{tr}}}} -\newcommand*{\Ead}{\smash{\mathcal{E}_{\scriptsize\textrm{ad}}}} - -\newcommand*{\eg}{e.g.,\@\xspace} -\newcommand*{\versus}{vs.\@\xspace} -\newcommand*{\sut}{s.t.\@\xspace} -\newcommand*{\ie}{i.e.,\@\xspace} -\newcommand*{\iid}{ID\@\xspace} -\newcommand*{\sota}{SoTA\@\xspace} -\newcommand*{\ood}{OOD\@\xspace} -\newcommand*{\metric}{metric} -\newcommand*{\wrt}{w.r.t.\@\xspace} -\newcommand*{\iif}{i.i.f.\@\xspace} -\newcommand*{\aka}{a.k.a.\@\xspace} -\newcommand*{\rhs}{r.h.s.\@\xspace} -\newcommand*{\etc}{etc.\@\xspace} -\newcommand*{\cf}{cf.\@\xspace} -\newcommand*{\resp}{resp.\@\xspace} - -\newcommand*\er{\mathrm{er}} -\newcommand*\ess{\operatorname{ess}} - -\let\originalleft\left -\let\originalright\right -\renewcommand{\left}{\mathopen{}\mathclose\bgroup\originalleft} -\renewcommand{\right}{\aftergroup\egroup\originalright} - -\let\up\textsuperscript -\let\vec\boldsymbol - - -\newcommand{\defeq}{\mathrel{:\mkern-0.25mu=}} -\newcommand{\eqdef}{\mathrel{=\mkern-0.25mu:}} - -\newcommand{\figleft}{{\em (Left)}} -\newcommand{\figcenter}{{\em (Center)}} -\newcommand{\figright}{{\em (Right)}} -\newcommand{\figtop}{{\em (Top)}} -\newcommand{\figbottom}{{\em (Bottom)}} -\newcommand{\captiona}{{\em (a)}} -\newcommand{\captionb}{{\em (b)}} -\newcommand{\captionc}{{\em (c)}} -\newcommand{\captiond}{{\em (d)}} - -\newcommand{\newterm}[1]{{\bf #1}} -\def\figref#1{figure~\ref{#1}} -\def\Figref#1{Figure~\ref{#1}} -\def\twofigref#1#2{figures \ref{#1} and \ref{#2}} -\def\trifigref#1#2#3#4{figures \ref{#1}, \ref{#2}, and \ref{#3}} -\def\quadfigref#1#2#3#4{figures \ref{#1}, \ref{#2}, \ref{#3} and \ref{#4}} -\def\secref#1{section~\ref{#1}} -\def\Secref#1{Section~\ref{#1}} 
-\def\Termref#1{Term~\ref{#1}} -\def\twosecref#1#2{sections \ref{#1} and \ref{#2}} -\def\trisecref#1#2#3{sections \ref{#1}, \ref{#2} and \ref{#3}} -\def\appref#1{appendix~\ref{#1}} -\def\Appref#1{Appendix~\ref{#1}} -\def\suppref#1{supp.~\ref{#1}} -\def\Suppref#1{Supp.~\ref{#1}} -\def\eqref#1{eq.~\ref{#1}} -\def\Eqref#1{Eq.~\ref{#1}} -\def\plaineqref#1{\ref{#1}} -\def\chapref#1{chapter~\ref{#1}} -\def\Chapref#1{Chapter~\ref{#1}} -\def\rangechapref#1#2{chapters\ref{#1}--\ref{#2}} -\def\algref#1{algorithm~\ref{#1}} -\def\Algref#1{Algorithm~\ref{#1}} -\def\twoalgref#1#2{algorithms \ref{#1} and \ref{#2}} -\def\Twoalgref#1#2{Algorithms \ref{#1} and \ref{#2}} -\def\partref#1{part~\ref{#1}} -\def\Partref#1{Part~\ref{#1}} -\def\twopartref#1#2{parts \ref{#1} and \ref{#2}} - -\def\Tabref#1{Table~\ref{#1}} -\def\tabref#1{table~\ref{#1}} -\def\twotabref#1#2{tables \ref{#1} and \ref{#2}} - -\def\ceil#1{\lceil #1 \rceil} -\def\floor#1{\lfloor #1 \rfloor} - -\newcommand{\Lp}{\mathcal{L}^\text{prior}} -\newcommand{\Ll}{\mathcal{L}^\text{likeli}} -\newcommand{\Lal}{\Ls^{\text{l}\widehat{\text{ikel}}\text{i}}} - -\def\eps{{\varepsilon}} - - -\def\xopt{{x^{*}}} -\def\Gopt{{G^{*}}} - -\def\p{{\textnormal{p}}} -\def\P{{\textnormal{p}}} -\def\Q{{\textnormal{q}}} -\def\q{{\textnormal{q}}} - -\def\gTh{{\hat \gT}} -\def\gDh{{\hat \gD}} -\def\gPh{{\hat \gP}} -\newcommand{\tin}[1]{\mbox{\tiny $#1$}} - - -\def\reta{{\textnormal{$\eta$}}} -\def\ra{{\textnormal{a}}} -\def\rb{{\textnormal{b}}} -\def\rc{{\textnormal{c}}} -\def\rd{{\textnormal{d}}} -\def\re{{\textnormal{e}}} -\def\rf{{\textnormal{f}}} -\def\rg{{\textnormal{g}}} -\def\rh{{\textnormal{h}}} -\def\ri{{\textnormal{i}}} -\def\rj{{\textnormal{j}}} -\def\rk{{\textnormal{k}}} -\def\rl{{\textnormal{l}}} -\def\rn{{\textnormal{n}}} -\def\ro{{\textnormal{o}}} -\def\rp{{\textnormal{p}}} -\def\rq{{\textnormal{q}}} -\def\rr{{\textnormal{r}}} -\def\rs{{\textnormal{s}}} -\def\rt{{\textnormal{t}}} -\def\ru{{\textnormal{u}}} -\def\rv{{\textnormal{v}}} -\def\rw{{\textnormal{w}}} -\def\reps{{\mathcal{E}}} -\def\rtheta{{\Theta}} -\def\rx{{X}} -\def\ry{{Y}} -\def\rz{{Z}} - - -\def\S{\mathcal{S}} -\def\T{\mathcal{T}} -\def\X{\mathcal{X}} -\def\Y{\mathcal{Y}} -\def\U{\mathcal{U}} - -\def\rvepsilon{{\mathbf{\epsilon}}} -\def\rva{{\mathbf{a}}} -\def\rvb{{\mathbf{b}}} -\def\rvc{{\mathbf{c}}} -\def\rvd{{\mathbf{d}}} -\def\rve{{\mathbf{e}}} -\def\rvf{{\mathbf{f}}} -\def\rvg{{\mathbf{g}}} -\def\rvh{{\mathbf{h}}} -\def\rvu{{\mathbf{i}}} -\def\rvj{{\mathbf{j}}} -\def\rvk{{\mathbf{k}}} -\def\rvl{{\mathbf{l}}} -\def\rvm{{\mathbf{m}}} -\def\rvn{{\mathbf{n}}} -\def\rvo{{\mathbf{o}}} -\def\rvp{{\mathbf{p}}} -\def\rvq{{\mathbf{q}}} -\def\rvr{{\mathbf{r}}} -\def\rvs{{\mathbf{s}}} -\def\rvt{{\mathbf{t}}} -\def\rvu{{\mathbf{u}}} -\def\rvv{{\mathbf{v}}} -\def\rvw{{\mathbf{w}}} -\def\rvx{{\mathbf{x}}} -\def\rvy{{\mathbf{y}}} -\def\rvz{{\mathbf{z}}} -\def\rvtheta{{\bm{\theta}}} - -\def\erva{{\textnormal{a}}} -\def\ervb{{\textnormal{b}}} -\def\ervc{{\textnormal{c}}} -\def\ervd{{\textnormal{d}}} -\def\erve{{\textnormal{e}}} -\def\ervf{{\textnormal{f}}} -\def\ervg{{\textnormal{g}}} -\def\ervh{{\textnormal{h}}} -\def\ervi{{\textnormal{i}}} -\def\ervj{{\textnormal{j}}} -\def\ervk{{\textnormal{k}}} -\def\ervl{{\textnormal{l}}} -\def\ervm{{\textnormal{m}}} -\def\ervn{{\textnormal{n}}} -\def\ervo{{\textnormal{o}}} -\def\ervp{{\textnormal{p}}} -\def\ervq{{\textnormal{q}}} -\def\ervr{{\textnormal{r}}} -\def\ervs{{\textnormal{s}}} -\def\ervt{{\textnormal{t}}} -\def\ervu{{\textnormal{u}}} 
-\def\ervv{{\textnormal{v}}} -\def\ervw{{\textnormal{w}}} -\def\ervx{{\textnormal{x}}} -\def\ervy{{\textnormal{y}}} -\def\ervz{{\textnormal{z}}} - -\def\rmA{{\mathbf{A}}} -\def\rmB{{\mathbf{B}}} -\def\rmC{{\mathbf{C}}} -\def\rmD{{\mathbf{D}}} -\def\rmE{{\mathbf{E}}} -\def\rmF{{\mathbf{F}}} -\def\rmG{{\mathbf{G}}} -\def\rmH{{\mathbf{H}}} -\def\rmI{{\mathbf{I}}} -\def\rmJ{{\mathbf{J}}} -\def\rmK{{\mathbf{K}}} -\def\rmL{{\mathbf{L}}} -\def\rmM{{\mathbf{M}}} -\def\rmN{{\mathbf{N}}} -\def\rmO{{\mathbf{O}}} -\def\rmP{{\mathbf{P}}} -\def\rmQ{{\mathbf{Q}}} -\def\rmR{{\mathbf{R}}} -\def\rmS{{\mathbf{S}}} -\def\rmT{{\mathbf{T}}} -\def\rmU{{\mathbf{U}}} -\def\rmV{{\mathbf{V}}} -\def\rmW{{\mathbf{W}}} -\def\rmx{{\mathbf{x}}} -\def\rmy{{\mathbf{y}}} -\def\rmz{{\mathbf{Z}}} - -\def\ermA{{\textnormal{A}}} -\def\ermB{{\textnormal{B}}} -\def\ermC{{\textnormal{C}}} -\def\ermD{{\textnormal{D}}} -\def\ermE{{\textnormal{E}}} -\def\ermF{{\textnormal{F}}} -\def\ermG{{\textnormal{G}}} -\def\ermH{{\textnormal{H}}} -\def\ermI{{\textnormal{I}}} -\def\ermJ{{\textnormal{J}}} -\def\ermK{{\textnormal{K}}} -\def\ermL{{\textnormal{L}}} -\def\ermM{{\textnormal{M}}} -\def\ermN{{\textnormal{N}}} -\def\ermO{{\textnormal{O}}} -\def\ermP{{\textnormal{P}}} -\def\ermQ{{\textnormal{Q}}} -\def\ermR{{\textnormal{R}}} -\def\ermS{{\textnormal{S}}} -\def\ermT{{\textnormal{T}}} -\def\ermU{{\textnormal{U}}} -\def\ermV{{\textnormal{V}}} -\def\ermW{{\textnormal{W}}} -\def\ermX{{\textnormal{X}}} -\def\ermY{{\textnormal{Y}}} -\def\ermZ{{\textnormal{Z}}} - -\def\vzero{{\bm{0}}} -\def\vone{{\bm{1}}} -\def\va{{\bm{a}}} -\def\vb{{\bm{b}}} -\def\vc{{\bm{c}}} -\def\vd{{\bm{d}}} -\def\ve{{\bm{e}}} -\def\vf{{\bm{f}}} -\def\vg{{\bm{g}}} -\def\vh{{\bm{h}}} -\def\vi{{\bm{i}}} -\def\vj{{\bm{j}}} -\def\vk{{\bm{k}}} -\def\vl{{\bm{l}}} -\def\vm{{\bm{m}}} -\def\vn{{\bm{n}}} -\def\vo{{\bm{o}}} -\def\vp{{\bm{p}}} -\def\vq{{\bm{q}}} -\def\vr{{\bm{r}}} -\def\vs{{\bm{s}}} -\def\vt{{\bm{t}}} -\def\vu{{\bm{u}}} -\def\vv{{\bm{v}}} -\def\vw{{\bm{w}}} -\def\vx{{\bm{x}}} -\def\vy{{\bm{y}}} -\def\vz{{\bm{z}}} -\def\valpha{{\bm{\alpha}}} -\def\vtheta{{\bm{\theta}}} -\def\vdelta{{\bm{\delta}}} -\def\vDelta{{\bm{\Delta}}} -\def\vmu{{\bm{\mu}}} -\def\vphi{{\bm{\phi}}} -\def\vSigma{{\bm{\Sigma}}} -\def\evalpha{{\alpha}} -\def\evbeta{{\beta}} -\def\evepsilon{{\epsilon}} -\def\evlambda{{\lambda}} -\def\evomega{{\omega}} -\def\evmu{{\mu}} -\def\evpsi{{\psi}} -\def\evsigma{{\sigma}} -\def\evtheta{{\theta}} -\def\eva{{a}} -\def\evb{{b}} -\def\evc{{c}} -\def\evd{{d}} -\def\eve{{e}} -\def\evf{{f}} -\def\evg{{g}} -\def\evh{{h}} -\def\evi{{i}} -\def\evj{{j}} -\def\evk{{k}} -\def\evl{{l}} -\def\evm{{m}} -\def\evn{{n}} -\def\evo{{o}} -\def\evp{{p}} -\def\evq{{q}} -\def\evr{{r}} -\def\evs{{s}} -\def\evt{{t}} -\def\evu{{u}} -\def\evv{{v}} -\def\evw{{w}} -\def\evx{{x}} -\def\evy{{y}} -\def\evz{{z}} - -\def\mA{{\bm{A}}} -\def\mB{{\bm{B}}} -\def\mC{{\bm{C}}} -\def\mD{{\bm{D}}} -\def\mE{{\bm{E}}} -\def\mF{{\bm{F}}} -\def\mG{{\bm{G}}} -\def\mH{{\bm{H}}} -\def\mI{{\bm{I}}} -\def\mJ{{\bm{J}}} -\def\mK{{\bm{K}}} -\def\mL{{\bm{L}}} -\def\mM{{\bm{M}}} -\def\mN{{\bm{N}}} -\def\mO{{\bm{O}}} -\def\mP{{\bm{P}}} -\def\mQ{{\bm{Q}}} -\def\mR{{\bm{R}}} -\def\mS{{\bm{S}}} -\def\mT{{\bm{T}}} -\def\mU{{\bm{U}}} -\def\mV{{\bm{V}}} -\def\mW{{\bm{W}}} -\def\mX{{\bm{X}}} -\def\mY{{\bm{Y}}} -\def\mZ{{\bm{Z}}} -\def\E{{\mathcal{E}}} -\def\mBeta{{\bm{\beta}}} -\def\mTheta{{\bm{\theta}}} -\def\mPhi{{\bm{\Phi}}} -\def\mLambda{{\bm{\Lambda}}} -\def\mSigma{{\bm{\Sigma}}} - 
-\DeclareMathAlphabet{\mathsfit}{\encodingdefault}{\sfdefault}{m}{sl} -\SetMathAlphabet{\mathsfit}{bold}{\encodingdefault}{\sfdefault}{bx}{n} -\newcommand{\tens}[1]{\bm{\mathsfit{#1}}} -\def\tA{{\tens{A}}} -\def\tB{{\tens{B}}} -\def\tC{{\tens{C}}} -\def\tD{{\tens{D}}} -\def\tE{{\tens{E}}} -\def\tF{{\tens{F}}} -\def\tG{{\tens{G}}} -\def\tH{{\tens{H}}} -\def\tI{{\tens{I}}} -\def\tJ{{\tens{J}}} -\def\tK{{\tens{K}}} -\def\tL{{\tens{L}}} -\def\tM{{\tens{M}}} -\def\tN{{\tens{N}}} -\def\tO{{\tens{O}}} -\def\tP{{\tens{P}}} -\def\tQ{{\tens{Q}}} -\def\tR{{\tens{R}}} -\def\tS{{\tens{S}}} -\def\tT{{\tens{T}}} -\def\tU{{\tens{U}}} -\def\tV{{\tens{V}}} -\def\tW{{\tens{W}}} -\def\tX{{\tens{X}}} -\def\tY{{\tens{Y}}} -\def\tZ{{\tens{Z}}} - -\def\tx{{\tens{x}}} - - -\def\gA{{\mathcal{A}}} -\def\gB{{\mathcal{B}}} -\def\gC{{\mathcal{C}}} -\def\gD{{\mathcal{D}}} -\def\gE{{\mathcal{E}}} -\def\gF{{\mathcal{F}}} -\def\gG{{\mathcal{G}}} -\def\gGh{{\hat\gG}} -\def\gFh{{\hat\gF}} - - -\def\gH{{\mathcal{H}}} -\def\gI{{\mathcal{I}}} -\def\gJ{{\mathcal{J}}} -\def\gK{{\mathcal{K}}} -\def\gL{{\mathcal{L}}} -\def\gM{{\mathcal{M}}} -\def\gN{{\mathcal{N}}} -\def\gO{{\mathcal{O}}} -\def\gP{{\mathcal{P}}} -\def\gQ{{\mathcal{Q}}} -\def\gR{{\mathcal{R}}} -\def\gS{{\mathcal{S}}} -\def\gT{{\mathcal{T}}} -\def\gU{{\mathcal{U}}} -\def\gV{{\mathcal{V}}} -\def\gW{{\mathcal{W}}} -\def\gX{{\mathcal{X}}} -\def\gY{{\mathcal{Y}}} -\def\gZ{{\mathcal{Z}}} - -\def\sA{{\mathbb{A}}} -\def\sB{{\mathbb{B}}} -\def\sC{{\mathbb{C}}} -\def\sD{{\mathbb{D}}} -\def\sF{{\mathbb{F}}} -\def\sG{{\mathbb{G}}} -\def\sH{{\mathbb{H}}} -\def\sI{{\mathbb{I}}} -\def\sJ{{\mathbb{J}}} -\def\sK{{\mathbb{K}}} -\def\sL{{\mathbb{L}}} -\def\sM{{\mathbb{M}}} -\def\sN{{\mathbb{N}}} -\def\sO{{\mathbb{O}}} -\def\sP{{\mathbb{P}}} -\def\sQ{{\mathbb{Q}}} -\def\sR{{\mathbb{R}}} -\def\sS{{\mathbb{S}}} -\def\sU{{\mathbb{U}}} -\def\sV{{\mathbb{V}}} -\def\sW{{\mathbb{W}}} -\def\sX{{\mathcal{X}}} -\def\sY{{\mathcal{Y}}} -\def\sZ{{\mathcal{Z}}} -\def\sTheta{{\bm{\Theta}}} - -\def\emLambda{{\Lambda}} -\def\emA{{A}} -\def\emB{{B}} -\def\emC{{C}} -\def\emD{{D}} -\def\emE{{E}} -\def\emF{{F}} -\def\emG{{G}} -\def\emH{{H}} -\def\emI{{I}} -\def\emJ{{J}} -\def\emK{{K}} -\def\emL{{L}} -\def\emM{{M}} -\def\emN{{N}} -\def\emO{{O}} -\def\emP{{P}} -\def\emQ{{Q}} -\def\emR{{R}} -\def\emS{{S}} -\def\emT{{T}} -\def\emU{{U}} -\def\emV{{V}} -\def\emW{{W}} -\def\emX{{X}} -\def\emY{{Y}} -\def\emZ{{Z}} -\def\emSigma{{\Sigma}} - -\newcommand{\etens}[1]{\mathsfit{#1}} -\def\etLambda{{\etens{\Lambda}}} -\def\etA{{\etens{A}}} -\def\etB{{\etens{B}}} -\def\etC{{\etens{C}}} -\def\etD{{\etens{D}}} -\def\etE{{\etens{E}}} -\def\etF{{\etens{F}}} -\def\etG{{\etens{G}}} -\def\etH{{\etens{H}}} -\def\etI{{\etens{I}}} -\def\etJ{{\etens{J}}} -\def\etK{{\etens{K}}} -\def\etL{{\etens{L}}} -\def\etM{{\etens{M}}} -\def\etN{{\etens{N}}} -\def\etO{{\etens{O}}} -\def\etP{{\etens{P}}} -\def\etQ{{\etens{Q}}} -\def\etR{{\etens{R}}} -\def\etS{{\etens{S}}} -\def\etT{{\etens{T}}} -\def\etU{{\etens{U}}} -\def\etV{{\etens{V}}} -\def\etW{{\etens{W}}} -\def\etX{{\etens{X}}} -\def\etY{{\etens{Y}}} -\def\etZ{{\etens{Z}}} - -\newcommand{\pdata}{p_{\rm{data}}} -\newcommand{\ptrain}{\hat{p}_{\rm{data}}} -\newcommand{\Ptrain}{\hat{P}_{\rm{data}}} -\newcommand{\pmodel}{p_{\rm{model}}} -\newcommand{\Pmodel}{P_{\rm{model}}} -\newcommand{\ptildemodel}{\tilde{p}_{\rm{model}}} -\newcommand{\pencode}{p_{\rm{encoder}}} -\newcommand{\pdecode}{p_{\rm{decoder}}} -\newcommand{\precons}{p_{\rm{reconstruct}}} -\newcommand{\dd}{\mathrm{d}} - 
-\newcommand{\laplace}{\mathrm{Laplace}} % - -\newcommand{\KL}{$\mathrm{KL}$\@\xspace} -\newcommand{\Kl}{\mathrm{KL}} -\newcommand{\Esp}{\mathbb{E}} -\newcommand{\Ls}{\mathcal{L}} -\newcommand{\emp}{\tilde{p}} -\newcommand{\lr}{\alpha} -\newcommand{\reg}{\lambda} -\newcommand{\rect}{\mathrm{rectifier}} -\newcommand{\softmax}{\mathrm{softmax}} -\newcommand{\slerp}{\mathrm{slerp}} -\newcommand{\sigmoid}{\sigma} -\newcommand{\softplus}{\zeta} -\newcommand{\Var}{\mathrm{Var}} -\newcommand{\standarderror}{\mathrm{SE}} -\newcommand{\Cov}{\mathrm{Cov}} -\newcommand{\Span}{\mathrm{Span}} -\newcommand{\card}{\mathrm{card}} - - -\newcommand{\KLD}[2]{D_{\mathrm{KL}} \left( \left. \left. #1 \right|\right| #2 \right) } -\newcommand{\normlzero}{L^0} -\newcommand{\normlone}{L^1} -\newcommand{\normltwo}{L^2} -\newcommand{\normlp}{L^p} -\newcommand{\normmax}{L^\infty} - -\newcommand{\pihalf}{\frac{\pi}{2}} - - -\newcommand{\parents}{Pa} % - -\DeclareMathOperator*{\argmax}{argmax} -\DeclareMathOperator*{\argmin}{argmin} -\newcommand{\acc}{\mathrm{Acc}} -\newcommand{\1}{\mathds{1}} -\DeclareMathOperator{\sign}{sign} -\DeclareMathOperator{\Tr}{Tr} -\let\ab\allowbreak diff --git a/app/scripts/latex-to-mdx/input/natbib.sty b/app/scripts/latex-to-mdx/input/natbib.sty deleted file mode 100644 index ff0d0b91b6ef41468c593a0ca40a81f9a183b055..0000000000000000000000000000000000000000 --- a/app/scripts/latex-to-mdx/input/natbib.sty +++ /dev/null @@ -1,1246 +0,0 @@ -%% -%% This is file `natbib.sty', -%% generated with the docstrip utility. -%% -%% The original source files were: -%% -%% natbib.dtx (with options: `package,all') -%% ============================================= -%% IMPORTANT NOTICE: -%% -%% This program can be redistributed and/or modified under the terms -%% of the LaTeX Project Public License Distributed from CTAN -%% archives in directory macros/latex/base/lppl.txt; either -%% version 1 of the License, or any later version. -%% -%% This is a generated file. -%% It may not be distributed without the original source file natbib.dtx. -%% -%% Full documentation can be obtained by LaTeXing that original file. -%% Only a few abbreviated comments remain here to describe the usage. -%% ============================================= -%% Copyright 1993-2009 Patrick W Daly -%% Max-Planck-Institut f\"ur Sonnensystemforschung -%% Max-Planck-Str. 2 -%% D-37191 Katlenburg-Lindau -%% Germany -%% E-mail: daly@mps.mpg.de -\NeedsTeXFormat{LaTeX2e}[1995/06/01] -\ProvidesPackage{natbib} - [2009/07/16 8.31 (PWD, AO)] - - % This package reimplements the LaTeX \cite command to be used for various - % citation styles, both author-year and numerical. It accepts BibTeX - % output intended for many other packages, and therefore acts as a - % general, all-purpose citation-style interface. - % - % With standard numerical .bst files, only numerical citations are - % possible. With an author-year .bst file, both numerical and - % author-year citations are possible. - % - % If author-year citations are selected, \bibitem must have one of the - % following forms: - % \bibitem[Jones et al.(1990)]{key}... - % \bibitem[Jones et al.(1990)Jones, Baker, and Williams]{key}... - % \bibitem[Jones et al., 1990]{key}... - % \bibitem[\protect\citeauthoryear{Jones, Baker, and Williams}{Jones - % et al.}{1990}]{key}... - % \bibitem[\protect\citeauthoryear{Jones et al.}{1990}]{key}... - % \bibitem[\protect\astroncite{Jones et al.}{1990}]{key}... - % \bibitem[\protect\citename{Jones et al., }1990]{key}... 
- % \harvarditem[Jones et al.]{Jones, Baker, and Williams}{1990}{key}... - % - % This is either to be made up manually, or to be generated by an - % appropriate .bst file with BibTeX. - % Author-year mode || Numerical mode - % Then, \citet{key} ==>> Jones et al. (1990) || Jones et al. [21] - % \citep{key} ==>> (Jones et al., 1990) || [21] - % Multiple citations as normal: - % \citep{key1,key2} ==>> (Jones et al., 1990; Smith, 1989) || [21,24] - % or (Jones et al., 1990, 1991) || [21,24] - % or (Jones et al., 1990a,b) || [21,24] - % \cite{key} is the equivalent of \citet{key} in author-year mode - % and of \citep{key} in numerical mode - % Full author lists may be forced with \citet* or \citep*, e.g. - % \citep*{key} ==>> (Jones, Baker, and Williams, 1990) - % Optional notes as: - % \citep[chap. 2]{key} ==>> (Jones et al., 1990, chap. 2) - % \citep[e.g.,][]{key} ==>> (e.g., Jones et al., 1990) - % \citep[see][pg. 34]{key}==>> (see Jones et al., 1990, pg. 34) - % (Note: in standard LaTeX, only one note is allowed, after the ref. - % Here, one note is like the standard, two make pre- and post-notes.) - % \citealt{key} ==>> Jones et al. 1990 - % \citealt*{key} ==>> Jones, Baker, and Williams 1990 - % \citealp{key} ==>> Jones et al., 1990 - % \citealp*{key} ==>> Jones, Baker, and Williams, 1990 - % Additional citation possibilities (both author-year and numerical modes) - % \citeauthor{key} ==>> Jones et al. - % \citeauthor*{key} ==>> Jones, Baker, and Williams - % \citeyear{key} ==>> 1990 - % \citeyearpar{key} ==>> (1990) - % \citetext{priv. comm.} ==>> (priv. comm.) - % \citenum{key} ==>> 11 [non-superscripted] - % Note: full author lists depends on whether the bib style supports them; - % if not, the abbreviated list is printed even when full requested. - % - % For names like della Robbia at the start of a sentence, use - % \Citet{dRob98} ==>> Della Robbia (1998) - % \Citep{dRob98} ==>> (Della Robbia, 1998) - % \Citeauthor{dRob98} ==>> Della Robbia - % - % - % Citation aliasing is achieved with - % \defcitealias{key}{text} - % \citetalias{key} ==>> text - % \citepalias{key} ==>> (text) - % - % Defining the citation mode and punctual (citation style) - % \setcitestyle{} - % Example: \setcitestyle{square,semicolon} - % Alternatively: - % Use \bibpunct with 6 mandatory arguments: - % 1. opening bracket for citation - % 2. closing bracket - % 3. citation separator (for multiple citations in one \cite) - % 4. the letter n for numerical styles, s for superscripts - % else anything for author-year - % 5. punctuation between authors and date - % 6. punctuation between years (or numbers) when common authors missing - % One optional argument is the character coming before post-notes. It - % appears in square braces before all other arguments. May be left off. - % Example (and default) \bibpunct[, ]{(}{)}{;}{a}{,}{,} - % - % To make this automatic for a given bib style, named newbib, say, make - % a local configuration file, natbib.cfg, with the definition - % \newcommand{\bibstyle@newbib}{\bibpunct...} - % Then the \bibliographystyle{newbib} will cause \bibstyle@newbib to - % be called on THE NEXT LATEX RUN (via the aux file). - % - % Such preprogrammed definitions may be invoked anywhere in the text - % by calling \citestyle{newbib}. This is only useful if the style specified - % differs from that in \bibliographystyle. - % - % With \citeindextrue and \citeindexfalse, one can control whether the - % \cite commands make an automatic entry of the citation in the .idx - % indexing file. 
For this, \makeindex must also be given in the preamble. - % - % Package Options: (for selecting punctuation) - % round - round parentheses are used (default) - % square - square brackets are used [option] - % curly - curly braces are used {option} - % angle - angle brackets are used