nielsr (HF Staff) committed on
Commit f5e2ffe · verified · 1 Parent(s): 1440da1

Enhance model card with metadata, links, and overview


This PR significantly enhances the model card by:

* Adding `pipeline_tag: robotics`, `library_name: lerobot`, and `license: apache-2.0` to the metadata, improving discoverability and providing crucial context.
* Including relevant `tags` such as `robotics`, `vision`, `gaze`, and `foveated-vision`.
* Listing associated Hugging Face `datasets` for a complete overview of the project's ecosystem.
* Providing direct links to the paper ([https://huggingface.co/papers/2507.15833](https://huggingface.co/papers/2507.15833)), project page ([https://ian-chuang.github.io/gaze-av-aloha/](https://ian-chuang.github.io/gaze-av-aloha/)), and GitHub repository ([https://github.com/ian-chuang/gaze-av-aloha](https://github.com/ian-chuang/gaze-av-aloha)).
* Adding a comprehensive overview of the model's capabilities, key highlights, and a visual (`hero.gif`).
* Including a basic Python code snippet for loading the model, and pointing users to the GitHub repository for detailed usage and training instructions.
* Adding the academic citation for proper attribution.

This update provides users with a much richer and more informative resource on the Hugging Face Hub.

Files changed (1): README.md (+78 -4)
README.md CHANGED
@@ -1,10 +1,84 @@

Removed from the previous README.md:

- This model has been pushed to the Hub using the [PytorchModelHubMixin](https://huggingface.co/docs/huggingface_hub/package_reference/mixins#huggingface_hub.PyTorchModelHubMixin) integration:
- - Code: [More Information Needed]
- - Paper: [More Information Needed]
- - Docs: [More Information Needed]

The updated README.md in full:
---
license: apache-2.0
tags:
- model_hub_mixin
- pytorch_model_hub_mixin
- robotics
- vision
- gaze
- foveated-vision
pipeline_tag: robotics
library_name: lerobot
datasets:
- iantc104/av_aloha_sim_peg_insertion
- iantc104/av_aloha_sim_cube_transfer
- iantc104/av_aloha_sim_thread_needle
- iantc104/av_aloha_sim_pour_test_tube
- iantc104/av_aloha_sim_hook_package
- iantc104/av_aloha_sim_slot_insertion
---
# Look, Focus, Act: Efficient and Robust Robot Learning via Human Gaze and Foveated Vision Transformers

This model is part of the work presented in the paper **"Look, Focus, Act: Efficient and Robust Robot Learning via Human Gaze and Foveated Vision Transformers"**. The research explores how incorporating human-like active gaze into robotic policies can significantly enhance both efficiency and performance. It builds on recent advances in foveated image processing and applies them to an Active Vision robot system that emulates human head and eye tracking.

The framework integrates gaze information into Vision Transformers (ViTs) using a novel foveated patch tokenization scheme, which keeps full resolution near the gaze point and tokenizes the periphery coarsely. The paper reports a 94% reduction in ViT computation, a 7x training speedup, and a 3x inference speedup, without sacrificing visual fidelity near regions of interest, along with improved performance on high-precision tasks and greater robustness to unseen distractors.
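The actual foveated tokenizer is implemented in the project's codebase; the snippet below is only a minimal sketch of the idea under simplifying assumptions (a single gaze point and a two-level fovea/periphery split — the function name, crop size, and patch size are illustrative, not the paper's values). It shows where the token savings come from: only a small crop around the gaze is kept at full resolution.

```python
import torch
import torch.nn.functional as F

def foveated_tokens(image: torch.Tensor, gaze_xy: torch.Tensor,
                    fovea_size: int = 64, patch_size: int = 16) -> torch.Tensor:
    """Tokenize with full resolution near the gaze and a coarse periphery.

    image:   (C, H, W) tensor in [0, 1]
    gaze_xy: (2,) pixel coordinates (x, y) of the gaze point
    """
    c, h, w = image.shape
    # High-resolution crop centered on the gaze point (the "fovea").
    x = int(gaze_xy[0].clamp(fovea_size // 2, w - fovea_size // 2))
    y = int(gaze_xy[1].clamp(fovea_size // 2, h - fovea_size // 2))
    fovea = image[:, y - fovea_size // 2: y + fovea_size // 2,
                     x - fovea_size // 2: x + fovea_size // 2]
    # Low-resolution view of the whole image (the "periphery").
    periphery = F.interpolate(image[None], size=(fovea_size, fovea_size),
                              mode="bilinear", align_corners=False)[0]

    def patchify(img: torch.Tensor) -> torch.Tensor:
        # (C, H, W) -> (num_patches, C * patch_size * patch_size)
        p = patch_size
        patches = img.unfold(1, p, p).unfold(2, p, p)  # (C, H/p, W/p, p, p)
        patches = patches.reshape(c, -1, p, p).permute(1, 0, 2, 3)
        return patches.reshape(-1, c * p * p)

    return torch.cat([patchify(fovea), patchify(periphery)], dim=0)

# A 480x640 frame yields 2 * (64/16)**2 = 32 tokens instead of the
# (480/16) * (640/16) = 1200 tokens of a uniform ViT patch grid.
tokens = foveated_tokens(torch.rand(3, 480, 640), torch.tensor([320.0, 240.0]))
print(tokens.shape)  # torch.Size([32, 768])
```

Cutting the token count by this order of magnitude is consistent with the scale of the compute reduction the paper reports, since ViT attention cost grows with the square of the token count.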
<p align="center">
  <img src="https://github.com/ian-chuang/gaze-av-aloha/raw/main/media/hero.gif" alt="Hero GIF" width="700">
</p>

* 📚 **Paper:** [Look, Focus, Act: Efficient and Robust Robot Learning via Human Gaze and Foveated Vision Transformers](https://huggingface.co/papers/2507.15833)
* 🌐 **Project Page:** [https://ian-chuang.github.io/gaze-av-aloha/](https://ian-chuang.github.io/gaze-av-aloha/)
* 💻 **Code:** [https://github.com/ian-chuang/gaze-av-aloha](https://github.com/ian-chuang/gaze-av-aloha)
## About the Model

This repository contains a pretrained gaze model (e.g., `iantc104/gaze_model_av_aloha_sim_thread_needle`), a core component of the "Look, Focus, Act" framework. These models predict human gaze to guide foveation and action in robotic tasks. The project provides a comprehensive framework for simultaneously collecting eye-tracking data and robot demonstrations, along with a simulation benchmark and datasets for training robot policies that incorporate human gaze.
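The gaze model's actual architecture and input/output contract are defined in the GitHub repository. Purely to illustrate the data flow (camera frames in, a 2D gaze point out, which then drives the foveation step sketched above), here is a hypothetical stand-in; `ToyGazePredictor` is not the project's model.

```python
import torch
import torch.nn as nn

# Illustrative stand-in only: any module mapping images to a 2D gaze point
# can drive foveation. The real gaze network is defined in
# https://github.com/ian-chuang/gaze-av-aloha.
class ToyGazePredictor(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=7, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, 2), nn.Sigmoid(),  # normalized (x, y) in [0, 1]
        )

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        return self.net(images)

predictor = ToyGazePredictor().eval()
frame = torch.rand(1, 3, 480, 640)  # one camera frame
with torch.no_grad():
    xy = predictor(frame)[0] * torch.tensor([640.0, 480.0])  # to pixels
# `xy` then selects the high-resolution fovea for the policy's ViT.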
## Associated Datasets

The model leverages the following AV-ALOHA simulation datasets with human eye-tracking annotations, all available on Hugging Face (a loading sketch follows the list):

* [AV ALOHA Sim Peg Insertion](https://huggingface.co/datasets/iantc104/av_aloha_sim_peg_insertion)
* [AV ALOHA Sim Cube Transfer](https://huggingface.co/datasets/iantc104/av_aloha_sim_cube_transfer)
* [AV ALOHA Sim Thread Needle](https://huggingface.co/datasets/iantc104/av_aloha_sim_thread_needle)
* [AV ALOHA Sim Pour Test Tube](https://huggingface.co/datasets/iantc104/av_aloha_sim_pour_test_tube)
* [AV ALOHA Sim Hook Package](https://huggingface.co/datasets/iantc104/av_aloha_sim_hook_package)
* [AV ALOHA Sim Slot Insertion](https://huggingface.co/datasets/iantc104/av_aloha_sim_slot_insertion)
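Since the card declares `library_name: lerobot`, these datasets can presumably be loaded with the `lerobot` library. A minimal sketch follows; note that the `LeRobotDataset` import path has moved between lerobot releases, so check your installed version.

```python
# Minimal sketch: loading an AV-ALOHA dataset with lerobot. The import path
# below matches recent lerobot releases but has changed between versions.
from lerobot.common.datasets.lerobot_dataset import LeRobotDataset

dataset = LeRobotDataset("iantc104/av_aloha_sim_thread_needle")
print(len(dataset))  # total number of frames
frame = dataset[0]   # dict of tensors: camera views, robot state, actions, ...
```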
## Usage

This model has been pushed to the Hub using the [PytorchModelHubMixin](https://huggingface.co/docs/huggingface_hub/package_reference/mixins#huggingface_hub.PyTorchModelHubMixin) integration, so it can be loaded with `from_pretrained` called on the model class that inherits the mixin.
```python
# `from_pretrained` must be called on the concrete model class that inherits
# PyTorchModelHubMixin, not on the mixin itself. That class lives in the
# project's codebase; the import below is a placeholder -- see
# https://github.com/ian-chuang/gaze-av-aloha for the actual module and name.
from gaze_av_aloha.models import GazeModel  # hypothetical import path

# Load a specific pretrained gaze model, e.g., for the 'thread_needle' task.
model = GazeModel.from_pretrained("iantc104/gaze_model_av_aloha_sim_thread_needle")
model.eval()

# Further usage details -- integrating this model into a robotics pipeline for
# training, evaluation, or data collection -- are covered in the project's
# official GitHub repository: https://github.com/ian-chuang/gaze-av-aloha
```
For detailed installation instructions, dataset preparation, training scripts, and policy evaluation within the AV-ALOHA benchmark, please refer to the comprehensive documentation in the [project's GitHub repository](https://github.com/ian-chuang/gaze-av-aloha).
## Citation

If you find this work helpful or inspiring, please consider citing the paper:

```bibtex
@misc{chuang2025lookfocusactefficient,
  title={Look, Focus, Act: Efficient and Robust Robot Learning via Human Gaze and Foveated Vision Transformers},
  author={Ian Chuang and Andrew Lee and Dechen Gao and Jinyu Zou and Iman Soltani},
  year={2025},
  eprint={2507.15833},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2507.15833},
}
```