---
title: Code As Policies
emoji: 🗣🦾
colorFrom: purple
colorTo: indigo
sdk: gradio
sdk_version: 3.23.0
app_file: app.py
pinned: true
license: apache-2.0
duplicated_from: jackyliang42/code-as-policies
---

# Code as Policies Tabletop Manipulation Interactive Demo

This demo is based on the [original demo](https://huggingface.co/spaces/jackyliang42/code-as-policies) from the paper:

[Code as Policies: Language Model Programs for Embodied Control](https://code-as-policies.github.io/)

## Preparations
1. Obtain an [OpenAI API Key](https://openai.com/blog/openai-api/)
2. Enter your API key in the form below. The supplied key is used to call OpenAI APIs (which may incur a cost) solely for your demo interactions; it is not retained after your session (see the sketch below).
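
For reference, here is a minimal sketch, not the app's actual code, of how a user-supplied key can be used for a single completion call with the (pre-1.0) `openai` Python package. The model name and helper function are placeholders:

```python
import openai

def complete(api_key: str, prompt: str) -> str:
    # The key supplied in the form is used for this call only and is never persisted.
    openai.api_key = api_key
    response = openai.Completion.create(
        engine="text-davinci-003",  # placeholder; the demo lets you pick the LM
        prompt=prompt,
        max_tokens=256,
        temperature=0,
    )
    return response["choices"][0]["text"]
```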

## Usage
1. You can choose which LM to use and how many blocks and bowls are spawned in the environment.
2. Click `Setup/Reset Simulation`. Too many objects can cause the setup to hang.
3. Based on the new randomly sampled object names, input an instruction or ask a question and click `Run`.
4. You can use the provided buttons to conveniently prefix your instructions.

You can run instructions in sequence and refer back to previous instructions (e.g. do the same with other blocks, move the same block to the other bowl, etc.). To reset, click `Setup/Reset Simulation`; this also clears the current instruction history.
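
Conceptually, the instruction history acts as a growing prompt. Here is a hypothetical sketch of that idea; the function names and prompt format are assumptions, not the demo's actual code:

```python
# Each command and the code the LM generated for it are appended to a running
# prompt, so later instructions like "do the same with other blocks" can refer
# back to earlier ones.
history = []  # list of (instruction, generated_code) pairs

def build_prompt(base_prompt: str, instruction: str) -> str:
    context = "".join(f"# {past_instr}\n{past_code}\n"
                      for past_instr, past_code in history)
    return f"{base_prompt}\n{context}# {instruction}\n"

def reset_simulation() -> None:
    # Resetting the simulation clears the history, so earlier commands
    # can no longer be referenced.
    history.clear()
```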

## Supported Instructions
* Spatial reasoning (e.g. to the left of the red block, the closest corner, the farthest bowl, the second block from the right); see the illustrative sketch after this list
* Sequential actions (e.g. put blocks in matching bowls, stack blocks on the bottom right corner)
* Contextual instructions (e.g. do the same with the blue block, undo that)
* Language-based reasoning (e.g. put the forest-colored block on the ocean-colored bowl)
* Simple Q&A (e.g. how many blocks are to the left of the blue bowl?)
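
To make these categories concrete, here is an illustrative sketch of the kind of program the LM might generate for a spatial-reasoning instruction. The perception/action helper names are assumptions (stubbed below so the sketch runs), not necessarily the demo's real API:

```python
import numpy as np

# Illustrative only: the kind of program the LM might emit for
# "put the sun-colored block on the bowl closest to it".
# The three helpers below stand in for the demo's perception/action API;
# their names and signatures are assumptions, stubbed so the example runs.

def get_obj_names():
    return ["yellow block", "blue bowl", "green bowl"]

def get_obj_pos(name):
    positions = {"yellow block": (0.1, 0.2), "blue bowl": (0.5, 0.2), "green bowl": (0.1, 0.6)}
    return positions[name]

def put_first_on_second(obj, target):
    print(f"pick {obj} -> place on {target}")

block_name = "yellow block"  # "sun-colored" resolved by the LM to the yellow block
bowl_names = [n for n in get_obj_names() if "bowl" in n]
block_pos = np.array(get_obj_pos(block_name))
closest_bowl = min(bowl_names,
                   key=lambda b: np.linalg.norm(np.array(get_obj_pos(b)) - block_pos))
put_first_on_second(block_name, closest_bowl)
```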

## Example Instructions
Note: object names may need to be changed depending on the sampled object names.
* put the sun-colored block on the bowl closest to it
* stack the blocks on the bottom most bowl
* arrange the blocks as a square in the middle
* move the square 5cm to the right
* how many blocks are to the right of the orange bowl?
* pick up the block closest to the top left corner and place it on the bottom right corner

## Known Limitations
* In simulation we use ground-truth object poses instead of vision models. This means that instructions that require knowledge of visual appearance (e.g. darkest bowl, largest object, empty bowls) are not supported.
* Currently, the low-level pick-and-place primitive does not perform collision checking, so if there are many objects on the table, placing actions may cause collisions.
* The pick-and-place primitive is also unable to pick up bowls.
* Prompt saturation - if too many instructions (10+) are executed in a row, then the LLM may start to ignore examples in the early parts of the prompt.
* Maximum token length - you may hit the maximum token length if running multiple commands in sequence. Please reset the simulation when this happens.

### Author
Falcon Dai