gabehubner committed · Commit c5a78cf · 1 Parent(s): 1b935a7
add md
app.py
CHANGED
@@ -52,7 +52,19 @@ def get_frame_and_attribution(slider_value):
 
 with gr.Blocks() as demo:
     gr.Markdown("# Introspection in Deep Reinforcement Learning")
-
+    gr.Markdown(r"""
+    ## How this space works:
+    This space was created to try applying [Integrated Gradients](https://captum.ai/docs/extension/integrated_gradients#:~:text=Integrated%20gradients%20is%20a%20simple,and%20feature%20or%20rule%20extraction.) \
+    to deep reinforcement learning scenarios. It uses PyTorch's Captum library for interpretability and Gymnasium as the emulator for the continuous lunar lander.
+    ### Training algorithm: [DDPG](https://arxiv.org/abs/1509.02971)
+    The agent was trained with Deep Deterministic Policy Gradients and achieves an average reward of 260.8 per (successful) episode.
+    ### Using this space:
+    - First, select the environment (more environments will be available in the future).
+    - Then, choose whether the baseline (see the IG paper for more detail) should be \
+    a torch `tensor` of zeros or a running average of the initial frames of a few episodes (selected on the right). \
+    - Click Attribute and wait a few seconds (usually 20-25s) for the attributions to be computed with the trained agent over 10 episodes.
+    - Finally, use the slider to pick a key frame showing the agent's attributions. They are passed through a Softmax to meet the component's requirement of a probability distribution.
+    """)
     with gr.Tab(label="Attribute"):
         env_spec = gr.Dropdown(choices=["LunarLander-v2"], type="value", multiselect=False, label="Environment Specification (e.g.: LunarLander-v2)")
         env = gr.Interface(title="Create the Environment", allow_flagging="never", inputs=env_spec, fn=create_training_loop, outputs=gr.JSON())

@@ -64,5 +76,20 @@ with gr.Blocks() as demo:
         slider = gr.Slider(label="Key Frame", minimum=0, maximum=1000, step=1, value=0)
 
         gr.Interface(fn=get_frame_and_attribution, inputs=slider, live=True, outputs=[gr.Image(label="Timestep"), gr.Label(label="Attributions")])
+        gr.Markdown(r"""## Local Usage and Packages \
+        `pip install torch gymnasium 'gymnasium[box2d]'` \
+        You might need to install Box2D separately; it requires the SWIG package to build the Python bindings for the C/C++ code that Box2D is written in: \
+        `brew install swig` \
+        `pip install box2d`
+        ## Average Score: 164.38 (a significant improvement over discrete action spaces) \
+        For each step, the reward: \
+        - is increased/decreased the closer/further the lander is to the landing pad. \
+        - is increased/decreased the slower/faster the lander is moving. \
+        - is decreased the more the lander is tilted (angle not horizontal). \
+        - is increased by 10 points for each leg that is in contact with the ground. \
+        - is decreased by 0.03 points each frame a side engine is firing. \
+        - is decreased by 0.3 points each frame the main engine is firing. \
+        The episode receives an additional reward of -100 or +100 points for crashing or landing safely, respectively. An episode is considered solved if it scores at least 200 points. \
+        ## `train()` and `load_trained()` \
+        The `load_trained()` function loads a pre-trained model that went through 1000 episodes of training, while `train()` trains from scratch. You can choose which of the two functions runs at the bottom of main.py. If you set render_mode=False, the program trains much faster.""")
 
 demo.launch()
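For readers who want to reproduce the attribution step described in the added markdown: the sketch below is not the Space's actual code, just a minimal illustration of applying Captum's IntegratedGradients to a policy network with either a zero baseline or an averaged-frame baseline, followed by a Softmax so the values can be displayed as a probability distribution. The toy `policy` network, the feature names, and the random stand-in observations are assumptions.

```python
import torch
import torch.nn as nn
from captum.attr import IntegratedGradients

obs_dim, act_dim = 8, 2  # continuous LunarLander: 8 observation features, 2 engine actions

# Placeholder actor network standing in for the trained DDPG policy.
policy = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, act_dim))

ig = IntegratedGradients(policy)

state = torch.rand(1, obs_dim)           # stand-in for one key frame's observation
zero_baseline = torch.zeros(1, obs_dim)  # baseline option 1: tensor of zeros
avg_baseline = torch.rand(1, obs_dim)    # baseline option 2: running average of initial frames (placeholder values)

# Attribute the first action dimension (e.g. the main engine) to each observation feature.
attributions = ig.attribute(state, baselines=zero_baseline, target=0)

# Softmax so the values form a probability distribution, as gr.Label expects.
probs = torch.softmax(attributions.squeeze(), dim=0)
feature_names = ["x", "y", "vx", "vy", "angle", "angular velocity", "left leg", "right leg"]
print(dict(zip(feature_names, probs.tolist())))
```

Swapping `zero_baseline` for `avg_baseline` in the `attribute` call corresponds to the baseline choice exposed in the UI.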
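The reward breakdown and the render_mode remark can be checked with a short Gymnasium rollout. The snippet below is a hedged sketch, not part of this commit: a random action stands in for the trained DDPG agent, and `LunarLander-v2` with `continuous=True` matches the continuous action space mentioned above.

```python
import gymnasium as gym

# render_mode=None runs headless and is much faster; "human" opens a window.
env = gym.make("LunarLander-v2", continuous=True, render_mode=None)

returns = []
for episode in range(10):
    obs, info = env.reset(seed=episode)
    done, total = False, 0.0
    while not done:
        # A trained agent would pick the action here; random actions are a stand-in.
        action = env.action_space.sample()
        obs, reward, terminated, truncated, info = env.step(action)
        total += reward  # per-step shaping plus the final -100/+100 for crashing/landing
        done = terminated or truncated
    returns.append(total)

env.close()
print(f"average return over 10 episodes: {sum(returns) / len(returns):.1f}")  # >= 200 counts as solved
```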