---
license: mit
pipeline_tag: robotics
---
# Octo Base

See https://github.com/octo-models/octo for instructions on using this model.

Octo Base is trained with a window size of 2, predicting 7-dimensional actions 4 steps into the future using a diffusion policy. The model is a Transformer with 93M parameters (equivalent to a ViT-B). Images are tokenized by preprocessing them with a lightweight convolutional encoder and grouping the result into 16x16 patches. Language is tokenized with the T5 tokenizer and then encoded with the T5-Base language encoder.
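For reference, the checkpoint can be loaded through the `octo` package. The snippet below is a minimal sketch along the lines of the repository's examples; the exact `hf://` checkpoint path is an assumption and may differ for this repo:

```python
# Minimal loading sketch using the octo package (see repo linked above).
# NOTE: the checkpoint path below is an assumption; substitute the path
# for this repository if it differs.
from octo.model.octo_model import OctoModel

model = OctoModel.load_pretrained("hf://rail-berkeley/octo-base-1.5")
print(model.get_pretty_spec())  # shows the observation/task spec listed below
```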
Observations and tasks conform to the following spec:

Observations:

```
{
    image_primary: ('batch', 'history_window', 256, 256, 3),
    image_wrist: ('batch', 'history_window', 128, 128, 3),
}
```

Tasks:

```
{
    image_primary: ('batch', 256, 256, 3),
    image_wrist: ('batch', 128, 128, 3),
    language_instruction: {
        attention_mask: ('batch', 16),
        input_ids: ('batch', 16),
    },
}
```

At inference, you may pass in any subset of these observation and task keys, with a history window of up to 2 timesteps.
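As a concrete illustration, here is a hedged sketch of a single inference call using the `model` loaded above, with a dummy observation matching the spec (batch of 1, history window of 1). The `timestep_pad_mask` key, the `create_tasks`/`sample_actions` calls, and the `bridge_dataset` statistics key follow the repository's examples but are assumptions that may differ between library versions:

```python
import jax
import numpy as np

# Dummy observation: any subset of the spec keys above is allowed.
# Image shapes are (batch, history_window, H, W, 3); here batch=1, window=1.
observation = {
    "image_primary": np.zeros((1, 1, 256, 256, 3), dtype=np.uint8),
    # Marks which history timesteps are real (assumed key name).
    "timestep_pad_mask": np.array([[True]]),
}

# Language-conditioned task; per the spec, goal images are also supported.
task = model.create_tasks(texts=["pick up the spoon"])

actions = model.sample_actions(
    observation,
    task,
    # Un-normalize actions with per-dataset statistics (assumed dataset key).
    unnormalization_statistics=model.dataset_statistics["bridge_dataset"]["action"],
    rng=jax.random.PRNGKey(0),
)
print(actions.shape)  # (1, 4, 7): batch, 4-step action horizon, 7-D actions
```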

This model was trained on a mix of datasets from the Open X-Embodiment dataset.

| Dataset                                                  | Proportion of batch |
|----------------------------------------------------------|---------------------|
| Fractal (Brohan et al., 2022)                            | 17.0%               |
| Kuka (Kalashnikov et al., 2018)                          | 17.0%               |
| Bridge (Walke et al., 2023)                              | 17.0%               |
| BC-Z (Jang et al., 2022)                                 | 9.1%                |
| Stanford Hydra Dataset (Belkhale et al., 2023)           | 6.0%                |
| Language Table (Lynch et al., 2023)                      | 5.9%                |
| Taco Play (Rosete-Beas et al., 2022; Mees et al., 2023)  | 3.6%                |
| Furniture Bench Dataset (Heo et al., 2023)               | 3.3%                |
| UTAustin Mutex (Shah et al., 2023)                       | 3.0%                |
| Austin Sailor Dataset (Nasiriany et al., 2022)           | 2.9%                |
| Roboturk (Mandlekar et al., 2018)                        | 2.8%                |
| Toto (Zhou et al., 2023)                                 | 2.4%                |
| Austin Sirius Dataset (Liu et al., 2023)                 | 2.3%                |
| Berkeley Autolab UR5 (Chen et al.)                       | 1.5%                |
| IAMLab CMU Pickup Insert (Saxena et al., 2023)           | 1.2%                |
| Viola (Zhu et al., 2023)                                 | 1.2%                |
| Berkeley Fanuc Manipulation (Zhu et al., 2023)           | 1.0%                |
| NYU Franka Play Dataset (Cui et al., 2022)               | 0.9%                |
| Jaco Play (Dass et al., 2023)                            | 0.6%                |
| Berkeley Cable Routing (Luo et al., 2023)                | 0.3%                |
| Austin Buds Dataset (Zhu et al., 2022)                   | 0.3%                |
| CMU Stretch (Mendonca et al., 2023)                      | 0.2%                |
| NYU Door Opening (Pari et al., 2021)                     | 0.1%                |
| DLR EDAN Shared Control (Quere et al., 2020)             | 0.1%                |
| UCSD Kitchen Dataset (Ge Yan and Wang, 2023)             | <0.1%               |

# Updates for Version 1.5
- Language task tokens are now repeated at every timestep in the context window.
- Augmented the language instructions in the data with rephrasings from GPT-3.5.
- Bug fixes:
  - Turned off dropout in the diffusion head due to incompatibility with layer norm.
  - Fixed an off-by-one error with the attention mask.
  - Fixed an issue where different image augmentations did not get fresh random seeds.