Upload folder using huggingface_hub

- README.md +11 -0
- __marimo__/session/a.py.json +0 -0
- a.html +0 -0
- a.py +166 -0
README.md
CHANGED
@@ -13,6 +13,17 @@ tags:
 - RL
 ---
 
+<p align="center">
+  <a href="https://kgdrathan-explainer-env-dashboard.hf.space/">
+    <strong>Open the live dashboard</strong>
+  </a>
+</p>
+
+<p align="center">
+  <a href="https://kgdrathan-explainer-env-dashboard.hf.space/">
+    https://kgdrathan-explainer-env-dashboard.hf.space/
+  </a>
+</p>
 
 <p align="center">
   <span style="font-size:2.2em; font-weight:bold;">See. Interact. Understand.</span>
__marimo__/session/a.py.json
CHANGED
The diff for this file is too large to render.
a.html
ADDED
The diff for this file is too large to render.
a.py
CHANGED
@@ -0,0 +1,166 @@
import marimo

app = marimo.App()


@app.cell
def _():
    # Shared imports: downstream cells receive these through their
    # function parameters, which is how marimo tracks dependencies.
    import marimo as mo
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    from matplotlib.patches import Rectangle
    return Rectangle, mo, np, pd, plt


@app.cell
def _(mo):
    mo.md("""
    # Reinforcement Learning Basics

    Reinforcement Learning (RL) is a type of machine learning in which an **agent** learns to make decisions by interacting with an **environment**. The goal is to learn a **policy** that maximizes cumulative reward over time.

    ## Core Concepts

    1. **Agent**: The learner and decision-maker
    2. **Environment**: Everything outside the agent
    3. **State (s)**: The current situation
    4. **Action (a)**: What the agent can do
    5. **Reward (r)**: Feedback from the environment
    6. **Policy (π)**: The strategy the agent uses to choose actions
    7. **Value Function (V)**: How good it is to be in a state
    8. **Q-Function (Q)**: How good it is to take an action in a state
    9. **Bellman Equation**: The relationship between value functions at successive time steps

    A minimal environment interface matching these concepts is sketched in the next cell.
    """)
    return
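

# Illustrative cell (an addition, not part of the original commit): a minimal
# sketch of the agent/environment interface behind the concepts above. The
# class name and the reward scheme (-1 per step, +10 at the goal) are
# assumptions chosen for this sketch.
@app.cell
def _():
    class GridWorldSketch:
        """Tiny deterministic grid world; states are (x, y) cells."""

        def __init__(self, size=4, goal=(3, 3), obstacles=((1, 1), (1, 2))):
            self.size, self.goal = size, goal
            self.obstacles = set(obstacles)
            self.state = (0, 0)

        def reset(self):
            self.state = (0, 0)
            return self.state

        def step(self, action):
            # Actions: 0=up, 1=right, 2=down, 3=left
            dx, dy = [(0, -1), (1, 0), (0, 1), (-1, 0)][action]
            x, y = self.state
            nx, ny = x + dx, y + dy
            # Ignore moves that leave the grid or hit an obstacle
            if (0 <= nx < self.size and 0 <= ny < self.size
                    and (nx, ny) not in self.obstacles):
                self.state = (nx, ny)
            done = self.state == self.goal
            reward = 10.0 if done else -1.0
            return self.state, reward, done

    return (GridWorldSketch,)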


@app.cell
def _(Rectangle, plt):
    # Simple grid world example
    grid_size = 4
    start = (0, 0)
    goal = (3, 3)
    obstacles = [(1, 1), (1, 2)]

    # Create a simple visualization
    _fig, _ax = plt.subplots(figsize=(6, 6))
    _ax.set_xlim(0, grid_size)
    _ax.set_ylim(0, grid_size)
    _ax.set_xticks(range(grid_size))
    _ax.set_yticks(range(grid_size))
    _ax.grid(True)

    # Draw obstacles
    for obs in obstacles:
        rect = Rectangle((obs[0], obs[1]), 1, 1, facecolor="black", alpha=0.7)
        _ax.add_patch(rect)

    # Draw start and goal
    start_rect = Rectangle(start, 1, 1, facecolor="green", alpha=0.7)
    goal_rect = Rectangle(goal, 1, 1, facecolor="red", alpha=0.7)
    _ax.add_patch(start_rect)
    _ax.add_patch(goal_rect)

    _ax.text(start[0] + 0.5, start[1] + 0.5, "Start", ha="center", va="center")
    _ax.text(goal[0] + 0.5, goal[1] + 0.5, "Goal", ha="center", va="center")

    _ax.set_title("Simple Grid World Example")
    _ax.invert_yaxis()  # Match standard grid coordinates (origin at top-left)

    # The last expression in a marimo cell is its output; mo.ui has no
    # matplotlib element, so display the axes directly.
    _ax
    return


@app.cell
def _(mo):
    mo.md(r"""
    ## How It Works

    The agent interacts with the environment in episodes:

    1. **Observe state (s)**: The agent senses its current situation
    2. **Choose action (a)**: Sampled from the policy π(a|s)
    3. **Environment transitions**: The environment moves to a new state s'
    4. **Receive reward (r)**: Immediate feedback
    5. **Update knowledge**: Learn from experience

    The goal is to maximize the expected cumulative discounted reward

    $G_t = \sum_{k=0}^{\infty} \gamma^k r_{t+k+1}$

    where γ ∈ [0, 1] is the discount factor. The next cell computes this sum for a short episode.
    """)
    return
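

# Illustrative cell (an addition, not part of the original commit): computing
# the discounted return G_t for a short episode. The reward sequence and the
# value of gamma are arbitrary numbers chosen for this sketch.
@app.cell
def _(mo):
    _gamma = 0.9
    _rewards = [-1.0, -1.0, -1.0, 10.0]  # three steps, then the goal

    # G_t = sum_k gamma^k * r_{t+k+1}, truncated at the end of the episode
    _G = sum(_gamma**_k * _r for _k, _r in enumerate(_rewards))

    mo.md(f"With γ = {_gamma}, the return of this episode is G = {_G:.3f}.")
    return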


@app.cell
def _(mo):
    # Value function explanation
    mo.md(r"""
    ### Value Functions

    The **value function V(s)** measures how good it is to be in a state:

    $V(s) = \mathbb{E}[G_t \mid S_t = s]$

    The **Q-function Q(s,a)** measures how good it is to take an action in a state:

    $Q(s,a) = \mathbb{E}[G_t \mid S_t = s, A_t = a]$

    These functions let the agent evaluate the long-term value of states and actions. A simple Monte Carlo estimate of V(s) follows in the next cell.
    """)
    return
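

# Illustrative cell (an addition, not part of the original commit): a Monte
# Carlo estimate of V(s), the empirical mean of sampled returns standing in
# for the expectation above. The sampled returns are made-up numbers.
@app.cell
def _(mo):
    _sampled_returns = [4.58, 3.71, 5.23, 4.10]  # returns observed from state s
    _v_estimate = sum(_sampled_returns) / len(_sampled_returns)
    mo.md(
        f"Monte Carlo estimate from {len(_sampled_returns)} episodes: "
        f"V(s) ≈ {_v_estimate:.3f}"
    )
    return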


@app.cell
def _(mo):
    # Bellman equation explanation
    mo.md(r"""
    ### Bellman Equation

    The Bellman optimality equation relates the value of a state to the values of its successor states:

    $V(s) = \max_a \sum_{s'} P(s'|s,a)\,[r(s,a,s') + \gamma V(s')]$

    and, for Q-values:

    $Q(s,a) = \sum_{s'} P(s'|s,a)\,[r(s,a,s') + \gamma \max_{a'} Q(s',a')]$

    These equations are the basis of iterative solution methods such as value iteration, sketched in the next cell.
    """)
    return
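

# Illustrative cell (an addition, not part of the original commit): a few
# sweeps of value iteration on the 4x4 grid world, repeatedly applying the
# Bellman optimality update above. Deterministic transitions, a -1 step
# reward, and gamma = 0.9 are assumptions chosen for this sketch.
@app.cell
def _(mo):
    _size, _goal, _gamma = 4, (3, 3), 0.9
    _obstacles = {(1, 1), (1, 2)}
    _cells = [(x, y) for x in range(_size) for y in range(_size)
              if (x, y) not in _obstacles]
    _moves = [(0, -1), (1, 0), (0, 1), (-1, 0)]

    def _step_to(s, a):
        # Deterministic transition; invalid moves leave the state unchanged
        nx, ny = s[0] + a[0], s[1] + a[1]
        valid = 0 <= nx < _size and 0 <= ny < _size and (nx, ny) not in _obstacles
        return (nx, ny) if valid else s

    _V = {s: 0.0 for s in _cells}
    for _ in range(50):  # enough sweeps to converge on a 4x4 grid
        for _s in _cells:
            if _s == _goal:
                continue  # terminal state keeps value 0
            _V[_s] = max(-1.0 + _gamma * _V[_step_to(_s, _a)] for _a in _moves)

    mo.md(f"Value iteration gives V(start) = {_V[(0, 0)]:.3f} at the start state.")
    return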


@app.cell
def _(mo):
    # Policy definition
    mo.md(r"""
    ### Policy π(a|s)

    A policy defines the agent's behavior: it maps each state to a probability distribution over the available actions.

    For a stochastic policy,

    $\pi(a|s) = \text{probability of taking action } a \text{ in state } s$

    The goal is to find an optimal policy π* that maximizes the expected cumulative reward. One simple way to turn value estimates into a policy is sketched in the next cell.
    """)
    return
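

# Illustrative cell (an addition, not part of the original commit): an
# epsilon-greedy policy, one common way to turn Q-values into a distribution
# pi(a|s). The Q-values below are made-up numbers for this sketch.
@app.cell
def _(mo):
    import random

    def _epsilon_greedy(q_values, epsilon=0.1):
        # Explore uniformly with probability epsilon, otherwise exploit
        if random.random() < epsilon:
            return random.randrange(len(q_values))
        return max(range(len(q_values)), key=lambda a: q_values[a])

    _q_at_s = [0.2, 1.5, -0.3, 0.9]  # hypothetical Q(s, ·) over four actions
    mo.md(f"Sampled action under ε-greedy: {_epsilon_greedy(_q_at_s)} (greedy action is 1).")
    return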


@app.cell
def _(mo):
    # Wrap-up
    mo.md("""
    ## Try It Yourself!

    The grid world above shows how an agent might navigate from start to goal while avoiding obstacles.

    ### Next Steps

    - Understand how rewards shape agent behavior
    - Explore how policies change as the agent learns
    - Study how value functions converge over time
    """)
    return


if __name__ == "__main__":
    app.run()