twke committed
Commit f7c5dfc
Parent(s): 528fb15

Update README.md

Files changed (1):
  1. README.md +47 -7
README.md CHANGED
@@ -19,9 +19,9 @@ The models released are the following:

| Benchmark | Embedding dimension | Diffusion timestep |
|------|------|------|
- | [RLBench (PerAct)]() | 120 | 100 |
- | [RLBench (GNFactor)]() | 120 | 100 |
- | [CALVIN]() | 192 | 25 |
+ | [RLBench (PerAct)](https://huggingface.co/katefgroup/3d_diffuser_actor/blob/main/diffuser_actor_peract.pth) | 120 | 100 |
+ | [RLBench (GNFactor)](https://huggingface.co/katefgroup/3d_diffuser_actor/blob/main/diffuser_actor_gnfactor.pth) | 120 | 100 |
+ | [CALVIN](https://huggingface.co/katefgroup/3d_diffuser_actor/blob/main/diffuser_actor_calvin.pth) | 192 | 25 |

### Model Description
 
@@ -46,13 +46,53 @@

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

- TODO
-
- ### Direct Use
-
- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-
- TODO
+ ### Input format
+ 3D Diffuser Actor takes the following inputs (a shape sketch follows the list):
+ 
+ 1. `RGB observations`: a tensor of shape (batch_size, num_cameras, 3, H, W), with pixel values in the range [0, 1].
+ 2. `Point cloud observations`: a tensor of shape (batch_size, num_cameras, 3, H, W).
+ 3. `Instruction encodings`: a tensor of shape (batch_size, max_instruction_length, C). In this code base, the embedding dimension `C` is set to 512.
+ 4. `curr_gripper`: a tensor of shape (batch_size, history_length, 7), where the last dimension consists of the xyz coordinates (3D) and a rotation quaternion (4D).
+ 5. `trajectory_mask`: a tensor of shape (batch_size, trajectory_length), used only to indicate the length of each trajectory. To predict keyposes, set its shape to (batch_size, 1).
+ 6. `gt_trajectory`: a tensor of shape (batch_size, trajectory_length, 7), where the last dimension consists of the xyz coordinates (3D) and a rotation quaternion (4D). This input is only used during training.
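
For concreteness, here is a minimal sketch (not part of the repository) that builds dummy inputs with the shapes listed above; the batch size, camera count, resolution, instruction length, and history length are illustrative assumptions:

```
import torch

# Illustrative sizes; assumptions, not values fixed by this model card.
batch_size, num_cameras, H, W = 1, 4, 256, 256
max_instruction_length, C = 53, 512   # C = 512 per item 3; the length 53 is an assumption
history_length, trajectory_length = 3, 1

rgb_obs = torch.rand(batch_size, num_cameras, 3, H, W)           # RGB in [0, 1]
pcd_obs = torch.rand(batch_size, num_cameras, 3, H, W)           # per-pixel xyz
instruction = torch.rand(batch_size, max_instruction_length, C)  # language encodings
curr_gripper = torch.rand(batch_size, history_length, 7)         # xyz + quaternion
trajectory_mask = torch.full((batch_size, trajectory_length), False)  # keypose: length 1
gt_trajectory = torch.rand(batch_size, trajectory_length, 7)     # training only
```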
 
+ ### Output format
+ The model returns the diffusion loss when `run_inference=False`; otherwise it returns a pose trajectory of shape (batch_size, trajectory_length, 8).
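
The card does not spell out the 8-D layout. A hypothetical unpacking, assuming the output mirrors the 7-D inputs plus one extra channel (commonly a gripper open/close value):

```
import torch

# `trajectory` stands in for the model output of shape (batch_size, trajectory_length, 8).
trajectory = torch.zeros(1, 1, 8)  # placeholder for illustration
# The 3 + 4 + 1 split (xyz, quaternion, gripper state) is an assumption,
# not stated in this model card.
xyz = trajectory[..., 0:3]
quat = trajectory[..., 3:7]
gripper = trajectory[..., 7:8]
```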
 
+ ### Usage
+ For training, forward 3D Diffuser Actor with `run_inference=False`:
+ ```
+ loss = model.forward(gt_trajectory,
+                      trajectory_mask,
+                      rgb_obs,
+                      pcd_obs,
+                      instruction,
+                      curr_gripper,
+                      run_inference=False)
+ ```
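
Putting the pieces together, a hypothetical training step could look as follows; the optimizer and learning rate are assumptions, and `model` plus the input tensors are as sketched under "Input format":

```
import torch

# Hypothetical training step; the optimizer choice and learning rate are assumptions.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

loss = model.forward(gt_trajectory, trajectory_mask,
                     rgb_obs, pcd_obs, instruction, curr_gripper,
                     run_inference=False)
optimizer.zero_grad()
loss.backward()   # backpropagate the diffusion loss
optimizer.step()
```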
+
+ For evaluation, forward 3D Diffuser Actor with `run_inference=True`. Since `gt_trajectory` is only used during training, a zero-filled placeholder is passed in its place:
+ ```
+ fake_gt_trajectory = torch.full((1, trajectory_length, 7), 0.0).to(device)
+ trajectory_mask = torch.full((1, trajectory_length), False).to(device)
+ trajectory = model.forward(fake_gt_trajectory,
+                            trajectory_mask,
+                            rgb_obs,
+                            pcd_obs,
+                            instruction,
+                            curr_gripper,
+                            run_inference=True)
+ ```
+
+ Alternatively, you can forward the model through its `compute_trajectory` function, which skips the placeholder trajectory:
+ ```
+ trajectory_mask = torch.full((1, trajectory_length), False).to(device)
+ trajectory = model.compute_trajectory(trajectory_mask,
+                                       rgb_obs,
+                                       pcd_obs,
+                                       instruction,
+                                       curr_gripper)
+ ```
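
As noted under "Input format", keypose prediction corresponds to `trajectory_length = 1`; under the same assumptions as in the sketches above, the call collapses to a single pose:

```
# Keypose prediction: a single target pose (trajectory_length = 1).
trajectory_mask = torch.full((1, 1), False).to(device)
keypose = model.compute_trajectory(trajectory_mask,
                                   rgb_obs, pcd_obs,
                                   instruction, curr_gripper)
print(keypose.shape)  # expected: (1, 1, 8)
```
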
  ## Evaluation