asenppopov committed · Commit cea91ba · verified · 1 Parent(s): ca0013a

Add model card

Files changed (1): README.md added (+90 -0)

---
license: mit
library_name: pytorch
tags:
- robotics
- libero
- vision-language-action
- imitation-learning
- manipulation
datasets:
- gate-institute/GATE-VLAP-datasets
---

# GATE-VLAP: Grounded Action Trajectory Embeddings with Vision-Language Action Planning

**Trained on the LIBERO-10 Benchmark**

This model is trained for robotic manipulation tasks using vision-language-action learning with semantic action chunking.

## Model Details

- **Architecture**: CLIP-RT (CLIP-based Robot Transformer)
- **Training Dataset**: [GATE-VLAP LIBERO-10](https://huggingface.co/datasets/gate-institute/GATE-VLAP-datasets)
- **Training Epochs**: 90
- **Task Type**: Long-horizon robotic manipulation
- **Input**: RGB images (128×128) + language instructions
- **Output**: 7-DOF actions (xyz, rpy, gripper); see the interface sketch below

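For orientation, here is a minimal sketch of the observation and action layout described above (batched 128×128 RGB frames, a free-form instruction, and a 7-DOF action with a gripper command). The variable names and the example instruction are illustrative assumptions, not part of the GATE-VLAP code.

```python
# Illustrative I/O sketch matching the shapes listed above; names are
# hypothetical and not taken from the GATE-VLAP repository.
import torch

# One observation: a 128x128 RGB frame plus a natural-language instruction.
image = torch.rand(1, 3, 128, 128)  # (batch, channels, height, width)
instruction = "put the black bowl in the bottom drawer and close it"  # example task

# One action: 7 degrees of freedom.
action = torch.zeros(1, 7)
# action[:, 0:3] -> end-effector translation (x, y, z)
# action[:, 3:6] -> end-effector rotation (roll, pitch, yaw)
# action[:, 6]   -> gripper open/close command
```
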
## Training Details

- **Dataset**: LIBERO-10 (29 subtasks, 1,354 demonstrations)
- **Segmentation**: Semantic action chunking using the Gemini Vision API (see the sketch below)
- **Framework**: PyTorch
- **Checkpoint**: Epoch 90

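The segmentation bullet above is only a one-liner, so here is a hedged sketch of what prompting a vision model for sub-action boundaries could look like. The model name, prompt wording, and JSON response handling are assumptions for illustration; they do not reproduce the actual GATE-VLAP chunking pipeline.

```python
# Hypothetical sketch of semantic action chunking with a Gemini vision model.
# Prompt, model name, and response parsing are illustrative assumptions only.
import json

import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")           # assumed credential setup
gemini = genai.GenerativeModel("gemini-1.5-pro")  # model choice is an assumption

def propose_chunk_boundaries(frame_paths, instruction):
    """Ask the vision model for frame indices where a new sub-action begins."""
    frames = [Image.open(p) for p in frame_paths]
    prompt = (
        f"Task: {instruction}\n"
        "These frames show one robot demonstration in order. Return a JSON "
        "list of frame indices where a new semantic sub-action (reach, grasp, "
        "move, place, release) begins."
    )
    response = gemini.generate_content([prompt, *frames])
    return json.loads(response.text)  # e.g. [0, 41, 97, 152]; may need cleanup
```
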
## Usage

```python
import torch

# Load the training checkpoint (adjust the path to wherever you stored it)
checkpoint = torch.load(
    "checkpoints/libero_10_fixed_training_v1/epoch_90.pt",
    map_location="cuda",
)

# Extract the model weights
model_state = checkpoint["model_state_dict"]

# TODO: Add inference code here
```

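The inference step is left as a TODO in the snippet above. The sketch below shows the shape such a rollout step could take once the weights are restored; `CLIPRTPolicy` and `predict_action` are placeholder names standing in for whatever the accompanying training code actually provides, not a documented API.

```python
# Hypothetical inference sketch. CLIPRTPolicy and predict_action are placeholder
# names, not the project's documented API; adapt them to the real model class.
import numpy as np
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
checkpoint = torch.load(
    "checkpoints/libero_10_fixed_training_v1/epoch_90.pt",
    map_location=device,
)

# policy = CLIPRTPolicy(**checkpoint.get("model_config", {}))  # placeholder class
# policy.load_state_dict(checkpoint["model_state_dict"])
# policy.to(device).eval()

# One control step: 128x128 RGB observation + instruction -> 7-DOF action.
obs = np.zeros((128, 128, 3), dtype=np.uint8)    # stand-in camera frame
instruction = "put both moka pots on the stove"  # example LIBERO-10 task

# with torch.no_grad():
#     action = policy.predict_action(obs, instruction)  # expected shape: (7,)
```
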
## Performance

Training run: `libero_10_fixed_training_v1`

*Add your metrics here after evaluation*

## Dataset

This model was trained on the [GATE-VLAP Datasets](https://huggingface.co/datasets/gate-institute/GATE-VLAP-datasets), which include:
- LIBERO-10: 103,650 frames across 29 subtasks
- Semantic action segmentation
- Vision-language annotations

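To fetch the raw files locally, a plain snapshot download of the dataset repository works regardless of its internal layout. Whether the repo also loads directly through `datasets.load_dataset` is not stated in this card, so treat that as something to verify.

```python
# Download the GATE-VLAP dataset repository with huggingface_hub; the local
# file layout (and datasets-library compatibility) is not documented here.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="gate-institute/GATE-VLAP-datasets",
    repo_type="dataset",
)
print(f"Dataset files downloaded to: {local_dir}")
```
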
## Citation

```bibtex
@article{gateVLAP2024,
  title={GATE-VLAP: Grounded Action Trajectory Embeddings with Vision-Language Action Planning},
  author={[Your Name]},
  journal={arXiv preprint arXiv:XXXX.XXXXX},
  year={2024}
}
```

## Maintainer

**GATE Institute** - Advanced AI Research Group, Sofia, Bulgaria

## Links

- 🤗 **Dataset**: [gate-institute/GATE-VLAP-datasets](https://huggingface.co/datasets/gate-institute/GATE-VLAP-datasets)
- 📄 **Paper**: *Coming soon*
- 💻 **Code**: *Add your GitHub repo here*

## License

MIT License