cagataydev committed on
Commit 4b4fc0a · verified · 1 Parent(s): 598d68e

Upload README.md with huggingface_hub

  1. README.md +193 -0
README.md ADDED
@@ -0,0 +1,193 @@
+ # GR00T Wave - Dual Camera Model
+
+ A state-of-the-art robotics foundation model trained for 300k steps with dual camera input for enhanced spatial understanding and manipulation tasks.
+
+ ## Model Overview
+
+ GR00T Wave is a specialized variant of the GR00T (Generalist Robot 00 Technology) model architecture, trained with dual camera configurations to improve visual understanding and robotic manipulation.
+
+ ### Key Features
+
+ - **Dual Camera Input**: Enhanced spatial awareness through dual camera streams
+ - **300k Training Steps**: Extensively trained for robust performance
+ - **Wave Architecture**: Optimized for dynamic motion and manipulation tasks
+ - **Multi-Modal Learning**: Integrates visual and proprioceptive information
+
+ ## Model Details
+
+ - **Model Type**: Robotics Foundation Model
+ - **Architecture**: GR00T Wave
+ - **Training Steps**: 300,000 (with intermediate checkpoint at 150,000)
+ - **Data Configuration**: SO101 Wave 300k Dual Camera
+ - **Model Size**: ~7.6 GB (SafeTensors format)
+ - **Input Modalities**: Dual Camera RGB, Proprioception
+ - **Output**: Robot Actions/Trajectories
+
+ ## Available Checkpoints
+
+ This repository contains two main checkpoints:
+
+ - **checkpoint-150000**: Mid-training checkpoint (150k steps)
+ - **checkpoint-300000**: Final trained model (300k steps)
+
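To fetch only one of the two checkpoints rather than the whole repository, something like the following sketch may work. It assumes the `huggingface_hub` package is installed (it is not in the requirements list) and that the repository layout matches the file structure given in this README. The helper names `checkpoint_patterns` and `download_checkpoint` are hypothetical.

```python
from typing import List

REPO_ID = "cagataydev/gr00t-wave"  # repository named in this README


def checkpoint_patterns(step: int) -> List[str]:
    """Glob patterns selecting one checkpoint folder plus the top-level config."""
    return [f"checkpoint-{step}/*", "config.json"]


def download_checkpoint(step: int) -> str:
    """Download only the files for one checkpoint and return the local folder."""
    # Imported lazily so checkpoint_patterns() works without huggingface_hub.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, allow_patterns=checkpoint_patterns(step))
```

Calling `download_checkpoint(300000)` would then pull the final checkpoint folder. Note that the pattern also matches `optimizer.pt` and `scheduler.pt`, which are only needed to resume training.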
+ ## Usage
+
+ ### Loading the Model
+
+ ```python
+ from transformers import AutoModel, AutoConfig
+
+ # Load the model and its configuration from the Hub.
+ # Note: recent transformers releases deprecate `use_auth_token` in favor of `token`.
+ model = AutoModel.from_pretrained("cagataydev/gr00t-wave", use_auth_token=True)
+ config = AutoConfig.from_pretrained("cagataydev/gr00t-wave", use_auth_token=True)
+
+ # The model is ready for inference
+ ```
+
+ ### Model Inference
+
+ ```python
+ import torch
+
+ # `model` is the object loaded in the previous example.
+
+ # Prepare dual camera inputs
+ camera_1_input = torch.randn(1, 3, 224, 224)  # RGB image from camera 1
+ camera_2_input = torch.randn(1, 3, 224, 224)  # RGB image from camera 2
+ proprioception = torch.randn(1, 64)  # Robot state information
+
+ # Forward pass (gradients disabled for inference)
+ with torch.no_grad():
+     outputs = model(
+         camera_1=camera_1_input,
+         camera_2=camera_2_input,
+         proprioception=proprioception,
+     )
+
+ # Extract predicted actions
+ predicted_actions = outputs.logits
+ ```
+
+ ## Training Details
+
+ ### Dataset
+ - **Training Data**: SO101 Wave dataset with dual camera configurations
+ - **Data Size**: 300k training episodes
+ - **Augmentations**: Standard vision augmentations for robotic data
+
+ ### Training Configuration
+ - **Steps**: 300,000 total training steps
+ - **Data Config**: `so100_dualcam`
+ - **Embodiment**: New embodiment configuration
+ - **Hardware**: Multi-GPU training setup
+
+ ### Performance
+ - **Training Duration**: ~35.7 hours for full training
+ - **Convergence**: Model successfully converged at 300k steps
+ - **Validation**: Comprehensive evaluation pending
+
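The step count and training duration above imply an average throughput, which can be worked out as follows. This is back-of-the-envelope arithmetic only. It assumes throughput was roughly constant over the run, which the README does not state, and it uses the approximate 35.7 hour figure.

```python
TOTAL_STEPS = 300_000   # from "Training Configuration"
TRAINING_HOURS = 35.7   # approximate figure from "Performance"


def steps_per_second(total_steps: int, hours: float) -> float:
    """Average optimizer steps per second over the whole run."""
    return total_steps / (hours * 3600)


def hours_to_reach(step: int, total_steps: int, hours: float) -> float:
    """Elapsed hours at a given step, assuming constant throughput."""
    return hours * step / total_steps


rate = steps_per_second(TOTAL_STEPS, TRAINING_HOURS)
print(f"{rate:.2f} steps/s, {1 / rate:.3f} s/step")  # 2.33 steps/s, 0.428 s/step
print(f"{hours_to_reach(150_000, TOTAL_STEPS, TRAINING_HOURS):.2f} h to the 150k checkpoint")  # 17.85 h
```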
+ ## File Structure
+
+ ```
+ cagataydev/gr00t-wave/
+ ├── config.json                          # Model configuration
+ ├── model.safetensors.index.json         # SafeTensors index
+ ├── model-00001-of-00002.safetensors     # Model weights (part 1)
+ ├── model-00002-of-00002.safetensors     # Model weights (part 2)
+ ├── trainer_state.json                   # Training state information
+ ├── training_args.bin                    # Training arguments
+ ├── checkpoint-150000/                   # 150k step checkpoint
+ │   ├── model-00001-of-00002.safetensors
+ │   ├── model-00002-of-00002.safetensors
+ │   ├── optimizer.pt
+ │   └── scheduler.pt
+ └── checkpoint-300000/                   # 300k step checkpoint (final)
+     ├── model-00001-of-00002.safetensors
+     ├── model-00002-of-00002.safetensors
+     ├── optimizer.pt
+     └── scheduler.pt
+ ```
+
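The `model.safetensors.index.json` file in a sharded checkpoint maps each tensor name to the shard that stores it. The sketch below groups tensor names by shard. The sample index is a hypothetical, heavily shortened stand-in: the tensor names and `total_size` are invented for illustration and are not taken from this repository.

```python
import json
from collections import defaultdict
from typing import Dict, List

# Hypothetical stand-in for model.safetensors.index.json (invented tensor names).
SAMPLE_INDEX = json.loads("""
{
  "metadata": {"total_size": 1024},
  "weight_map": {
    "backbone.layer0.weight": "model-00001-of-00002.safetensors",
    "backbone.layer1.weight": "model-00001-of-00002.safetensors",
    "action_head.proj.weight": "model-00002-of-00002.safetensors"
  }
}
""")


def tensors_by_shard(index: dict) -> Dict[str, List[str]]:
    """Invert the weight map: shard file name -> sorted list of tensor names."""
    shards = defaultdict(list)
    for tensor_name, shard_file in index["weight_map"].items():
        shards[shard_file].append(tensor_name)
    return {shard: sorted(names) for shard, names in sorted(shards.items())}


for shard, names in tensors_by_shard(SAMPLE_INDEX).items():
    print(shard, len(names), "tensors")
```

With a real index loaded via `json.load(open(path))`, the same function shows which of the two shard files must be present to load a given layer.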
+ ## Requirements
+
+ ```
+ torch>=1.9.0
+ transformers>=4.20.0
+ numpy>=1.21.0
+ pillow>=8.3.0
+ ```
+
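One way to confirm that an environment meets these minimums is to query the installed versions with the standard library. This is a simplified sketch. `parse_version` is a hypothetical helper that keeps only the leading numeric components, so it ignores pre-release ordering; `packaging.version` would be the more robust choice where available.

```python
from importlib import metadata
from typing import Dict, Tuple

# Minimum versions copied from the requirements list above.
MINIMUMS = {"torch": "1.9.0", "transformers": "4.20.0", "numpy": "1.21.0", "pillow": "8.3.0"}


def parse_version(text: str) -> Tuple[int, ...]:
    """Keep the leading numeric components, e.g. '2.1.0+cu118' -> (2, 1, 0)."""
    parts = []
    for piece in text.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def check(minimums: Dict[str, str]) -> Dict[str, str]:
    """Report 'ok', 'missing', or 'too old (...)' for each required package."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        ok = parse_version(installed) >= parse_version(minimum)
        report[name] = "ok" if ok else f"too old ({installed} < {minimum})"
    return report


for package, status in check(MINIMUMS).items():
    print(f"{package}: {status}")
```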
+ ## Installation
+
+ ```bash
+ pip install torch transformers numpy pillow
+ ```
+
+ ## Evaluation
+
+ The model supports evaluation using the standard GR00T evaluation pipeline:
+
+ ```python
+ # Example evaluation setup
+ from gr00t_eval import evaluate_model
+
+ results = evaluate_model(
+     model_path="cagataydev/gr00t-wave",
+     dataset_path="/path/to/eval/dataset",
+     data_config="so100_dualcam",
+     steps=150,
+     trajectories=5,
+ )
+ ```
+
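The README does not say which metric the evaluation reports. As an illustration only, the sketch below assumes a mean squared error between predicted and recorded actions over one trajectory, a common choice for open-loop evaluation of action-prediction models. `trajectory_mse` is a hypothetical helper and is not part of any GR00T package.

```python
from typing import Sequence


def trajectory_mse(
    predicted: Sequence[Sequence[float]],
    target: Sequence[Sequence[float]],
) -> float:
    """Mean squared error over all steps and action dimensions of one trajectory."""
    if len(predicted) != len(target):
        raise ValueError("trajectories must have the same number of steps")
    total, count = 0.0, 0
    for pred_step, target_step in zip(predicted, target):
        if len(pred_step) != len(target_step):
            raise ValueError("action dimensions differ between prediction and target")
        for p, t in zip(pred_step, target_step):
            total += (p - t) ** 2
            count += 1
    if count == 0:
        raise ValueError("empty trajectory")
    return total / count


# Two steps with two action dimensions each; only one value differs (by 2.0).
print(trajectory_mse([[0.0, 1.0], [2.0, 3.0]], [[0.0, 1.0], [2.0, 5.0]]))  # 1.0
```

Averaging this value over the evaluated trajectories would give a single summary number per checkpoint.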
+ ## Applications
+
+ This model is designed for:
+
+ - **Robotic Manipulation**: Pick-and-place, assembly tasks
+ - **Navigation**: Spatial understanding with dual camera input
+ - **Multi-Modal Learning**: Integration of visual and proprioceptive data
+ - **Real-time Control**: Low-latency robotic control applications
+
+ ## Model Card
+
+ ### Intended Use
+ - Research and development in robotics
+ - Robotic manipulation and navigation tasks
+ - Multi-modal learning experiments
+
+ ### Limitations
+ - Trained on specific embodiment configurations
+ - Requires dual camera setup for optimal performance
+ - Limited to tasks similar to training distribution
+
+ ### Ethical Considerations
+ - Model should be used responsibly in robotic applications
+ - Consider safety implications in real-world deployments
+ - Ensure proper testing before production use
+
+ ## Citation
+
+ If you use this model in your research, please cite:
+
+ ```bibtex
+ @misc{gr00t_wave_2024,
+   title={GR00T Wave: Dual Camera Robotics Foundation Model},
+   author={NVIDIA Research Team},
+   year={2024},
+   publisher={HuggingFace},
+   url={https://huggingface.co/cagataydev/gr00t-wave}
+ }
+ ```
+
+ ## License
+
+ This model is released under the NVIDIA Research License. Please see the license file for more details.
+
+ ## Contact
+
+ For questions and support, please contact the NVIDIA GR00T team.
+
+ ---
+
+ **Model Version**: v1.0
+ **Last Updated**: January 2025
+ **Status**: Production Ready