Sai Vemprala committed
Commit 4f9a3a4 (root commit)

Initial commit: Add ChatGPT interface, UI elements, sample prompts

.gitattributes ADDED
@@ -0,0 +1,34 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,42 @@
+ ---
+ title: ChatGPT Robotics
+ emoji: 🤖
+ colorFrom: blue
+ colorTo: red
+ sdk: gradio
+ sdk_version: 3.19.1
+ app_file: app.py
+ pinned: false
+ license: mit
+ ---
+
+ **ChatGPT + Robotics Gradio demo.**
+
+ This is an accompaniment to our work on using ChatGPT for robotics applications. This demonstration space lets you
+ interact with ChatGPT using prompts and high-level APIs specific to robotics scenarios, and have the model generate code
+ that solves robotics problems.
+
+ For details, please check out our blog post: https://aka.ms/ChatGPT-Robotics
+ and our paper: https://www.microsoft.com/en-us/research/uploads/prod/2023/02/ChatGPT___Robotics.pdf
+
+ We encourage you to check out our main repo for a full robot simulator with ChatGPT integration:
+ https://github.com/microsoft/PromptCraft-Robotics
+
+ **Setup**
+
+ 1. Make an OpenAI ChatGPT account at chat.openai.com.
+ 2. Navigate to https://chat.openai.com/api/auth/session and copy the `accessToken` value.
+
+ **Usage**
+
+ 1. Paste the access token into the ChatGPT Login textbox and click Login.
+ 2. Choose a sample prompt to start with. We provide the following prompts:
+
+ airsim: Drone navigation scenario in the AirSim simulator.
+ embodied_agent: Embodied agent scenario with discrete actions (left, right, forward) and a perception API for object detection.
+ embodied_agent_closed_loop: Embodied agent scenario where, instead of interacting with ChatGPT normally, the user provides text observations of the scene at every step, such as the locations of surrounding objects.
+ manipulation: Robot arm scenario, equipped with a suction pump that can pick up/release objects.
+ real_drone_sample: Drone navigation scenario for the DJI Tello drone, as seen in the paper.
+
+ 3. Optionally, check "Display prompt" if you wish to see the full prompt text.
+ 4. Click Initialize, then continue the conversation in the chat box that follows.
app.py ADDED
@@ -0,0 +1,122 @@
+ '''
+ ChatGPT + Robotics Gradio demo.
+ Author: Sai Vemprala
+
+ For details, please check out our blog post: https://aka.ms/ChatGPT-Robotics, and our paper:
+ https://www.microsoft.com/en-us/research/uploads/prod/2023/02/ChatGPT___Robotics.pdf
+
+ In this demo, we provide a quick way to interact with ChatGPT in robotics settings using some custom prompts.
+ As seen in our paper, we provide prompts for several scenarios: robot manipulation, drone navigation
+ (in a simulated setting (AirSim) as well as real life), and embodied AI. embodied_agent_closed_loop is an
+ experimental setting where observations from a scene can be described to ChatGPT as text.
+
+ Parts of the code were inspired by https://huggingface.co/spaces/VladislavMotkov/chatgpt_webui/
+ '''
+
+ import os
+
+ import gradio as gr
+ from revChatGPT.V1 import Chatbot
+
+ access_token = None
+ chatgpt = None
+
+
+ def parse_text(text):
+     """Convert ChatGPT's markdown reply into HTML, mapping ``` fences to <pre><code> blocks."""
+     lines = text.split("\n")
+     for i, line in enumerate(lines):
+         if "```" in line:
+             items = line.split('`')
+             if items[-1]:
+                 # Opening fence carrying a language tag, e.g. ```python
+                 lines[i] = f'<pre><code class="{items[-1]}">'
+             else:
+                 # Closing fence
+                 lines[i] = '</code></pre>'
+         elif i > 0:
+             lines[i] = '<br/>' + line.replace(" ", "&nbsp;")
+     return "".join(lines)
+
+
+ def configure_chatgpt(info):
+     """Create the global Chatbot instance from the access token pasted by the user."""
+     global access_token, chatgpt
+     access_token = info
+     chatgpt = Chatbot(config={"access_token": access_token})
+
+
+ def ask(prompt):
+     """Send a prompt to ChatGPT, consume the streamed reply, and return the final message as HTML."""
+     message = ""
+     for data in chatgpt.ask(prompt):
+         message = data["message"]
+     return parse_text(message)
+
+
+ def query_chatgpt(inputs, history):
+     history = history or []
+     output = ask(inputs)
+     history.append((inputs, output))
+     return history, history, ''
+
+
+ def initialize_prompt(prompt_type, history):
+     """Kick off the conversation by sending the selected scenario prompt to ChatGPT."""
+     history = history or []
+     if prompt_type:
+         prompt_file = './prompts/' + str(prompt_type) + '.txt'
+         with open(prompt_file, "r") as f:
+             prompt = f.read()
+         output = ask(prompt)
+         history.append(("<ORIGINAL PROMPT>", output))
+     return history, history
+
+
+ def display_prompt(show, prompt_type):
+     """Return the raw text of the selected prompt when the checkbox is ticked."""
+     if not prompt_type:
+         return 'Error - prompt not selected'
+     if show:
+         prompt_file = './prompts/' + str(prompt_type) + '.txt'
+         with open(prompt_file, "r") as f:
+             prompt = f.read()
+         return prompt
+     return ''
+
+
+ with gr.Blocks() as demo:
+     gr.Markdown("""<h3><center>ChatGPT + Robotics</center></h3>""")
+     gr.Markdown(
+         "This is a companion app to the work [ChatGPT for Robotics: Design Principles and Model Abilities](https://aka.ms/ChatGPT-Robotics)")
+
+     if not access_token:
+         gr.Markdown("""<h4>Login to ChatGPT</h4>""")
+         with gr.Row():
+             with gr.Group():
+                 info = gr.Textbox(placeholder="Enter access token here", label="ChatGPT Login")
+                 with gr.Row():
+                     login = gr.Button("Login")
+                     login.click(configure_chatgpt, inputs=[info])
+
+     # Build the dropdown choices from the prompt files shipped with the app.
+     prompt_names = [os.path.splitext(f)[0] for f in os.listdir('./prompts')]
+
+     gr.Markdown("""<h4>Initial Prompt (based on scenario)</h4>""")
+     prompt_type = gr.components.Dropdown(prompt_names, label="Select sample prompt", value=None)
+
+     show_prompt = gr.Checkbox(label="Display prompt")
+     prompt_display = gr.Textbox(interactive=False, label="Prompt")
+     show_prompt.change(fn=display_prompt, inputs=[show_prompt, prompt_type], outputs=prompt_display)
+
+     initialize = gr.Button(value="Initialize")
+
+     gr.Markdown("""<h4>Conversation</h4>""")
+     chatgpt_robot = gr.Chatbot()
+     message = gr.Textbox(placeholder="Enter query", label="")
+
+     state = gr.State()
+
+     initialize.click(fn=initialize_prompt, inputs=[prompt_type, state], outputs=[chatgpt_robot, state])
+
+     message.submit(query_chatgpt, inputs=[message, state], outputs=[chatgpt_robot, state, message])
+
+ demo.launch()
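As a quick usage note, parse_text above converts ChatGPT's fenced markdown into inline HTML for the Gradio chat box. A traced example (the input string is invented; the output follows directly from the code):

# parse_text("Here is code:\n```python\nprint(1)\n```\ndone")
# returns:
# 'Here is code:<pre><code class="python"><br/>print(1)</code></pre><br/>done'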
prompts/airsim.txt ADDED
@@ -0,0 +1,32 @@
+ I would like you to help me work with the AirSim simulator for drones. When I ask you to do something, please give me Python code that is needed to achieve that task using AirSim and then an explanation of what that code does.
+ Do not use any of the normal AirSim functions; you should only use the functions that I have defined for you, and you are not to use any hypothetical functions that you think might exist.
+ You can use simple Python functions from libraries such as math and numpy.
+
+ aw.takeoff() - takes off the drone.
+ aw.land() - lands the drone.
+ aw.get_drone_position() - returns the current position of the drone as a list of 3 floats corresponding to XYZ coordinates.
+ aw.fly_to([x, y, z]) - flies the drone to the position specified as a list of three arguments corresponding to X, Y, Z coordinates.
+ aw.fly_path(points) - flies the drone along the path specified by the list of points. Each point is again a list of 3 floats corresponding to X, Y, Z coordinates.
+ aw.set_yaw(yaw) - sets the yaw of the drone to the specified value in degrees.
+ aw.get_yaw() - returns the current yaw of the drone in degrees.
+ aw.get_position(object_name) - takes a string as input indicating the name of an object of interest, and returns a list of 3 floats indicating its X, Y, Z coordinates.
+
+ A few useful things:
+ Instead of moveToPositionAsync() or moveToZAsync(), you should use the function fly_to() that I have defined for you.
+ If you are uncertain about something, you can ask me a clarification question, as long as you specifically identify it saying "Question".
+ Here is an example scenario that illustrates how you can ask clarification questions. Let us assume a scene contains two spheres.
+
+ Me: Fly to the sphere.
+ You: Question - there are two spheres. Which one do you want me to fly to?
+ Me: Sphere 1, please.
+
+ The following objects are in the scene, and you are to refer to them using these exact names:
+
+ turbine1, turbine2, solarpanels, car, crowd, tower1, tower2, tower3.
+
+ None of the objects except for the drone itself are movable. Remember that there are two turbines and three towers. When there are multiple objects of the same type,
+ and I don't specify explicitly which object I am referring to, you should always ask me for clarification. Never make assumptions.
+
+ In terms of axis conventions, forward means positive X axis. Right means positive Y axis. Up means positive Z axis.
+
+ Are you ready?
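To give a flavor of what this prompt elicits, here is a minimal sketch of the kind of code ChatGPT might return for a request like "take off and fly to a point 10 meters in front of turbine1". The 10-meter offset and approach direction are hypothetical, not from the paper; only the aw.* functions defined above are used:

aw.takeoff()

# Get the turbine's position and compute a viewpoint 10 m in front of it
# (forward is the positive X axis, per the conventions above).
turbine_position = aw.get_position('turbine1')
target = [turbine_position[0] - 10.0, turbine_position[1], turbine_position[2]]

aw.fly_to(target)
aw.set_yaw(0)  # face the +X direction, i.e. toward the turbine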
prompts/embodied_agent.txt ADDED
@@ -0,0 +1,18 @@
+ Imagine you are helping me interact with the AirSim simulator. We are controlling an embodied agent. At any given point of time, you have the following abilities,
+ and you are also required to output code for some of the requests.
+ Question - Ask me a clarification question.
+ Reason - Explain why you did something the way you did it.
+ Code - Output a code command that achieves the desired goal.
+
+ The scene consists of several objects. We have access to the following functions; please use only these functions as much as possible:
+
+ Perception:
+ get_image(): Renders an image from the front-facing camera of the agent.
+ detect_objects(img): Runs an object detection model on an image img, and returns two variables: obj_list, a list of the names of objects detected in the scene, and obj_locs, a list of bounding box coordinates in the image for each object.
+
+ Action:
+ forward(): Move forward by 0.1 meters.
+ turn_left(): Turn left by 90 degrees.
+ turn_right(): Turn right by 90 degrees.
+
+ You are not to use any other hypothetical functions. You can use functions from Python libraries such as math, numpy, etc. Are you ready?
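As an illustration, here is a sketch of code this prompt might elicit for a request like "look around and approach the table if you see it". The object name 'table' and the step count are hypothetical; only the perception and action functions defined above are used:

# Scan in all four directions for the target object.
found = False
for _ in range(4):
    img = get_image()
    obj_list, obj_locs = detect_objects(img)
    if 'table' in obj_list:
        found = True
        break
    turn_left()

# Each forward() call covers 0.1 m, so 10 calls move 1 meter toward the object.
if found:
    for _ in range(10):
        forward()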
prompts/embodied_agent_closed_loop.txt ADDED
@@ -0,0 +1,10 @@
+ Imagine I am a robot equipped with a camera and a depth sensor.
+ I am trying to perform a task, and you should help me by sending me commands. You are only allowed to give me the following commands:
+
+ turn(angle): turns the robot by a given number of degrees.
+ move(distance): moves the robot straight forward by a given distance in meters.
+
+ On each step, I will provide you with the objects in the scene as a list of <object name, distance, angle in degrees>.
+ You should reply with only one command at a time. The distance is in meters, and the direction is an angle in degrees with respect to the robot's orientation.
+ Negative angles are to the left and positive angles are to the right. If a command is not valid, I will ignore it and ask you for another command.
+ If there is no relevant information in the scene, use the available commands to explore the environment.
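To make the closed-loop protocol concrete, a hypothetical exchange (the scene contents are invented for illustration) might look like:

Me: <chair, 2.5, 30>, <table, 4.0, -15>. Go to the chair.
You: turn(30)
Me: <chair, 2.5, 0>, <table, 4.0, -45>.
You: move(2.5)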
prompts/manipulation.txt ADDED
@@ -0,0 +1,21 @@
+ Imagine we are working with a manipulator robot. This is a robotic arm with 6 degrees of freedom that has a suction pump attached to its end effector. I would like you to assist me in sending commands to this robot given a scene and a task.
+
+ At any point, you have access to the following functions:
+
+ grab(): Turns on the suction pump to grab an object.
+ release(): Turns off the suction pump to release an object.
+ get_position(object): Given a string of an object name, returns the coordinates and orientation of the vacuum pump needed to touch the top of the object, as [X, Y, Z, Yaw, Pitch, Roll].
+ move_to(position): Moves the suction pump to a given position [X, Y, Z, Yaw, Pitch, Roll].
+
+ You are allowed to create new functions using these, but you are not allowed to use any other hypothetical functions.
+
+ Keep the solutions simple and clear. The positions are given in mm and the angles in degrees. You can also ask clarification questions using the tag "Question - ". Here is an example scenario that illustrates how you can ask clarification questions.
+
+ Let's assume a scene contains two spheres.
+ Me: Pick up the sphere.
+ You: Question - there are two spheres. Which one do you want me to pick up?
+ Me: Sphere 1, please.
+
+ Use Python code to express your solution.
+
+ Are you ready?
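For illustration, here is a sketch of the pick-and-place solution this prompt is designed to elicit. The object names 'block' and 'tray' and the 100 mm clearance are hypothetical; units are mm as specified above, and only the four functions defined in the prompt are used:

block = get_position('block')
tray = get_position('tray')

move_to(block)   # position the pump on top of the block
grab()           # turn on suction

# Lift 100 mm before traversing, then lower onto the tray and release.
move_to([block[0], block[1], block[2] + 100, block[3], block[4], block[5]])
move_to([tray[0], tray[1], tray[2] + 100, tray[3], tray[4], tray[5]])
move_to(tray)
release()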
prompts/real_drone_sample.txt ADDED
@@ -0,0 +1,34 @@
+ Imagine you are helping me interact with a real drone flying in an indoor space. At any given point of time, you have the following abilities, each identified by a unique tag. You are also required to output code for some of the requests.
+
+ Question: You can ask me a clarification question, as long as you specifically identify it saying "Question".
+ Code: Output a code command that achieves the desired goal.
+ Reason: After you output code, you should provide an explanation of why you did what you did.
+
+ The space contains a drone, along with several objects. Apart from the drone, none of the objects are movable. Within the code, we have the following commands available to us. You are not to use any other hypothetical functions.
+
+ get_position(object_name): Takes a string as input indicating the name of an object of interest, and returns a vector of 4 floats indicating its X, Y, Z, Angle coordinates.
+ self.tello.fly_to(position): Takes a vector of 4 floats as input indicating X, Y, Z, Angle coordinates, and commands the drone to fly there and look at that angle.
+ self.tello.fly_path(positions): Takes a list of X, Y, Z, Angle positions indicating waypoints along a path, and flies the drone along that path.
+ self.tello.look_at(angle): Takes an angle as input indicating the yaw angle the drone should look at, and rotates the drone towards that angle.
+
+ Here is an example scenario that illustrates how you can ask clarification questions. Let us assume a scene contains two spheres.
+
+ Me: Fly to the sphere.
+ You: Question - there are two spheres. Which one do you want me to fly to?
+ Me: Sphere 1, please.
+
+ You also have access to a Python dictionary whose keys are object names, and values are the X, Y, Z, Angle coordinates for each object:
+
+ self.dict_of_objects = {'origin': [0.0, 0.0, 0.0, 0],
+                         'mirror': [1.25, -0.15, 1.2, 0],
+                         'chair 1': [0.9, 1.15, 1.1, np.pi/2],
+                         'orchid': [0.9, 1.65, 1.1, np.pi/2],
+                         'lamp': [1.6, 0.9, 1.2, np.pi/2],
+                         'baby ducks': [0.1, 0.8, 0.8, np.pi/2],
+                         'sanitizer wipes': [-0.3, 1.75, 0.9, 0],
+                         'coconut water': [-0.6, 0.0, 0.8, -np.pi],
+                         'shelf': [0.95, -0.9, 1.2, np.pi/2],
+                         'diet coke can': [1.0, -0.9, 1.55, np.pi/2],
+                         'regular coke can': [1.3, -0.9, 1.55, np.pi/2]}
+
+ Are you ready?
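As an illustration, here is a sketch of the kind of response this prompt might elicit for a request like "fly past the shelf and over to the diet coke can". The task itself is invented; the code uses only the commands and dictionary defined above:

# Fly a two-waypoint path using the stored X, Y, Z, Angle coordinates.
self.tello.fly_path([self.dict_of_objects['shelf'],
                     self.dict_of_objects['diet coke can']])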
requirements.txt ADDED
@@ -0,0 +1 @@
+ revChatGPT