Grid World Environment


This directory contains the implementation of a simple 5x5 Grid World environment, designed to serve two primary purposes within the OpenEnv ecosystem:

  1. A basic Reinforcement Learning (RL) testbed: Providing a straightforward, deterministic environment for quick prototyping and testing of RL agents.
  2. A detailed “How-To” guide for building new OpenEnv environments: Demonstrating the architectural patterns, best practices, and core components required to integrate a custom environment into the OpenEnv framework.

🚀 Environment Overview

The Grid World environment features:

  • Grid Size: A 5x5 square grid.
  • Agent: Starts at position (0,0) (top-left).
  • Goal: Fixed at (4,4) (bottom-right).
  • Actions: UP, DOWN, LEFT, RIGHT.
  • Dynamics: Deterministic. An action always moves the agent one step in the chosen direction, unless it would move off the grid, in which case the agent stays in its current cell.
  • Reward Function (Sparse):
    • -0.1 for every step taken (a “living cost” or “step penalty”).
    • +1.0 for reaching the goal at (4,4). This also terminates the episode.
  • Episode Termination: The episode ends when the agent reaches the goal.
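The rules above can be sketched as a pure transition function. This is a minimal, dependency-free illustration, not the environment's actual code: the names `Move`, `DELTAS`, and `transition` are hypothetical, and the coordinate convention (DOWN increases x, RIGHT increases y) is an assumption taken from the Example Gameplay trace.

```python
from enum import Enum

GRID_SIZE = 5
GOAL = (4, 4)


class Move(str, Enum):
    UP = "UP"
    DOWN = "DOWN"
    LEFT = "LEFT"
    RIGHT = "RIGHT"


# Assumed convention: x is the row (DOWN increases x), y is the column
# (RIGHT increases y). This mirrors the Example Gameplay trace.
DELTAS = {
    Move.UP: (-1, 0),
    Move.DOWN: (1, 0),
    Move.LEFT: (0, -1),
    Move.RIGHT: (0, 1),
}


def transition(x, y, move):
    """Return (new_x, new_y, reward, done) for one deterministic step."""
    dx, dy = DELTAS[move]
    # Clamp so that a move off the grid leaves the agent in its current cell.
    new_x = max(0, min(x + dx, GRID_SIZE - 1))
    new_y = max(0, min(y + dy, GRID_SIZE - 1))
    done = (new_x, new_y) == GOAL
    reward = 1.0 if done else -0.1
    return new_x, new_y, reward, done


print(transition(0, 0, Move.UP))     # blocked by the top wall
print(transition(4, 3, Move.RIGHT))  # reaches the goal
```

Note that a blocked move still costs -0.1, because the step penalty applies to every step taken.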

Example Gameplay

Imagine the agent trying to find the goal:

  1. Reset: Agent at (0,0) → Obs(x=0, y=0, reward=0.0, done=False)
  2. Step DOWN: Agent moves to (1,0) → Obs(x=1, y=0, reward=-0.1, done=False)
  3. Step RIGHT: Agent moves to (1,1) → Obs(x=1, y=1, reward=-0.1, done=False)
  4. …
  5. Step RIGHT (from 4,3): Agent moves to (4,4) → Obs(x=4, y=4, reward=1.0, done=True)
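As a worked example of the reward arithmetic, the sketch below computes the total undiscounted return of an episode that reaches the goal on its final step. The helper `episode_return` is hypothetical and assumes the final step earns +1.0 in place of the step penalty, as in the trace above.

```python
def episode_return(num_steps, step_penalty=-0.1, goal_reward=1.0):
    """Total reward when the goal is reached on the last of num_steps steps."""
    if num_steps < 1:
        raise ValueError("an episode that reaches the goal takes at least one step")
    # Every step except the last pays the penalty; the last pays the goal reward.
    return (num_steps - 1) * step_penalty + goal_reward


start, goal = (0, 0), (4, 4)
# With only axis-aligned moves, the shortest path is the Manhattan distance.
min_steps = abs(goal[0] - start[0]) + abs(goal[1] - start[1])

print(min_steps)                             # 8
print(round(episode_return(min_steps), 2))   # 7 * (-0.1) + 1.0 = 0.3
```

Under these assumptions the best achievable return is 0.3, and each wasted step lowers it by 0.1.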

🛠️ How to Build an OpenEnv Environment: A Detailed Guide

This section explains the structure and key design choices of the Grid World environment.

1. Scaffolding and Configuration

This environment supports multi-mode deployment. It uses pyproject.toml for modern local development (via uv) and a Dockerfile for containerized deployment.

Directory Structure

envs/grid_world_env
├── server/
│   ├── __init__.py           # Package initializer for the server side
│   ├── app.py                # The FastAPI application entry point
│   ├── Dockerfile            # Container definition (uses requirements.txt)
│   ├── grid_world_environment.py # The core environment logic
│   └── requirements.txt      # Dependencies for the Docker build
├── __init__.py               # Package initializer for the client side
├── client.py                 # Python client for interacting with the env server
├── models.py                 # Pydantic data structures (Action, Observation)
├── openenv.yaml              # OpenEnv metadata
├── pyproject.toml            # Project configuration for local dev (uv)
├── uv.lock                   # Exact dependency versions (Generated by uv)
├── README.md
└── test_grid_world.sh        # Integration test script (Docker based)

Core Components Explained

This section dives into the specific code files that power the Grid World, explaining how the OpenEnv framework connects the data, logic, and server layers.


1. models.py: The Data Contract

This file defines the strict “language” used for communication between the Client (RL Agent) and the Server. It relies on Pydantic to enforce type safety.

Key Components

  • MoveAction(str, Enum)
    Defines the allowed vocabulary for movement: UP, DOWN, LEFT, RIGHT.
    Using an Enum prevents magic string errors (e.g., sending "up" instead of "UP").

  • GridWorldAction(Action)
    Wraps the movement enum in a standardized OpenEnv action structure.
    When the server receives a request, FastAPI automatically validates that the incoming JSON payload matches this schema.

  • GridWorldObservation(Observation)
    Defines exactly what the agent observes from the environment:

    • x, y: Integer coordinates representing the agent’s position
    • reward: Floating-point value (e.g., -0.1, 1.0)
    • done: Boolean flag indicating episode termination

Note:
By inheriting from pydantic.BaseModel (via Observation), these classes automatically handle JSON serialization and deserialization.
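The shape of this data contract can be illustrated with a dependency-free stand-in. The real classes subclass OpenEnv's Pydantic-based Action and Observation; the sketch below uses standard-library dataclasses instead, and the field name `action` on `GridWorldAction` and the helper `parse_action` are assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum


class MoveAction(str, Enum):
    UP = "UP"
    DOWN = "DOWN"
    LEFT = "LEFT"
    RIGHT = "RIGHT"


@dataclass
class GridWorldAction:
    # Hypothetical field name; the real class subclasses OpenEnv's Action.
    action: MoveAction


@dataclass
class GridWorldObservation:
    # Stand-in for the real class, which subclasses OpenEnv's Observation.
    x: int
    y: int
    reward: float = 0.0
    done: bool = False


def parse_action(payload: dict) -> GridWorldAction:
    """Build an action from a JSON-like dict.

    Looking up an Enum member by value raises ValueError for anything
    outside the vocabulary, so "up" is rejected while "UP" is accepted.
    """
    return GridWorldAction(action=MoveAction(payload["action"]))


obs = GridWorldObservation(x=1, y=0, reward=-0.1, done=False)
wire = json.dumps(asdict(obs))  # what would travel over HTTP
print(wire)
```

With Pydantic, the validation and the JSON round trip are handled by the model classes themselves, so helpers like `parse_action` are not needed.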


2. server/grid_world_environment.py: The Logic

This file contains the “physics engine” and rules of the environment. It translates abstract actions into concrete state transitions.

Core Responsibilities

  • Inheritance
    GridWorldEnvironment inherits from openenv.core.env_server.Environment, providing the standardized interface required by the OpenEnv server.

  • __init__ Method

    • Sets static configuration:
      • Grid size: 5 × 5
      • Goal location: [4, 4]
    • Initializes the persistent state container.
  • State Persistence (self._state)

    • HTTP requests are stateless, so the environment instance must remember the agent’s position between calls.
    • self._state (an instance of openenv...State) tracks:
      • step_count
      • episode_id
      • agent_x, agent_y
  • step() Logic

    • Input: Receives a validated GridWorldAction
    • Dynamics: Applies movement rules and clamps coordinates using
      max(0, min(..., grid_size - 1)) to prevent the agent from leaving the grid
    • Feedback: Computes a sparse reward:
      • 1.0 if (x, y) == goal
      • -0.1 otherwise
    • Returns a GridWorldObservation
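The responsibilities listed above can be sketched as a small stateful class. This is an illustration under stated assumptions, not the file's actual contents: `SketchState` stands in for OpenEnv's State class, `SketchGridWorld` does not inherit from the real Environment base class, observations are plain dicts, and DOWN is assumed to increase x as in the Example Gameplay trace.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class SketchState:
    """Stand-in for the OpenEnv State object described above."""
    episode_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    step_count: int = 0
    agent_x: int = 0
    agent_y: int = 0


class SketchGridWorld:
    # Assumed convention: DOWN increases x, RIGHT increases y.
    DELTAS = {"UP": (-1, 0), "DOWN": (1, 0), "LEFT": (0, -1), "RIGHT": (0, 1)}

    def __init__(self, grid_size=5, goal=(4, 4)):
        self.grid_size = grid_size
        self.goal = goal
        self._state = SketchState()

    def reset(self):
        # A fresh state object gives a new episode_id and zeroed counters.
        self._state = SketchState()
        return {"x": 0, "y": 0, "reward": 0.0, "done": False}

    def step(self, move):
        s = self._state
        dx, dy = self.DELTAS[move]
        # Clamp both coordinates so the agent cannot leave the grid.
        s.agent_x = max(0, min(s.agent_x + dx, self.grid_size - 1))
        s.agent_y = max(0, min(s.agent_y + dy, self.grid_size - 1))
        s.step_count += 1
        done = (s.agent_x, s.agent_y) == self.goal
        reward = 1.0 if done else -0.1
        return {"x": s.agent_x, "y": s.agent_y, "reward": reward, "done": done}

    @property
    def state(self):
        return self._state


env = SketchGridWorld()
env.reset()
env.step("DOWN")
print(env.step("RIGHT"), env.state.step_count)
```

Because the position lives on the instance rather than in the request, two consecutive calls to `step` see each other's effects, which is the point of the state container.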

3. server/app.py: The API

This file is the “glue” that turns the environment logic into a running web service.

Key Elements

  • create_app Utility
    Instead of manually defining FastAPI routes, this file uses
    openenv.core.env_server.create_app.

    It:

    • Binds the environment logic (GridWorldEnvironment)
    • Connects the data models (GridWorldAction, GridWorldObservation)
    • Automatically generates standard endpoints:
      • /reset
      • /step
      • /state
      • /health
  • main() Entry Point
    Defines a main() function that calls uvicorn.run.
    This is what enables the server = "..." script in pyproject.toml to start the server.
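Conceptually, the binding step maps each standard endpoint to a method on the environment. The toy below illustrates that idea with a plain dictionary of handlers; it is not how create_app is implemented and does not use FastAPI. `CounterEnv`, `bind_routes`, and the `/health` response shape are all hypothetical.

```python
from typing import Callable, Dict


class CounterEnv:
    """Hypothetical stand-in environment; it only counts steps."""

    def __init__(self) -> None:
        self.steps = 0

    def reset(self) -> dict:
        self.steps = 0
        return {"steps": self.steps, "done": False}

    def step(self, action: dict) -> dict:
        self.steps += 1
        return {"steps": self.steps, "last_action": action, "done": False}

    def state(self) -> dict:
        return {"steps": self.steps}


def bind_routes(env: CounterEnv) -> Dict[str, Callable[[dict], dict]]:
    """Map each standard endpoint path to a handler that delegates to env."""
    return {
        "/reset": lambda body: env.reset(),
        "/step": lambda body: env.step(body["action"]),
        "/state": lambda body: env.state(),
        "/health": lambda body: {"status": "ok"},  # assumed response shape
    }


routes = bind_routes(CounterEnv())
routes["/reset"]({})
print(routes["/step"]({"action": {"action": "UP"}}))
```

The benefit of generating routes from the environment is that every OpenEnv environment exposes the same endpoints, so one client pattern works across all of them.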


4. server/Dockerfile: The Container

This file defines how the environment is packaged for production or remote deployment.

Container Setup

  • Base Image
    Builds on envtorch-base, ensuring compatible system libraries.

  • Dependencies
    Copies and installs server/requirements.txt.
    This keeps the Docker image lightweight and focused only on server-side requirements.

  • Execution

    • Exposes port 8000
    • Defines the CMD to launch uvicorn
      The container is ready to accept HTTP requests immediately upon startup.

5. pyproject.toml: Local Development

This file enables a modern local development workflow using uv.

Key Sections

  • Project Metadata

    • Package name: grid_world_env
    • Version information
  • Dependencies
    Lists libraries required for local execution:

    • fastapi
    • uvicorn
    • gymnasium
    • numpy
  • [project.scripts]
    Defines a shortcut command:

    server = "grid_world_env.server.app:main"
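The script value follows the standard `module:attribute` entry point format. As a small illustration, the standard library can split it into the module to import and the function to call; this assumes Python 3.9 or newer, where `EntryPoint` exposes `module` and `attr`.

```python
from importlib.metadata import EntryPoint

# Same value as the [project.scripts] entry shown above.
ep = EntryPoint(
    name="server",
    value="grid_world_env.server.app:main",
    group="console_scripts",
)

print(ep.module)  # grid_world_env.server.app
print(ep.attr)    # main
```

Running the `server` command therefore amounts to importing `grid_world_env.server.app` and calling its `main()` function.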

🚀 Getting Started

You can run the environment using uv (fastest for development) or Docker (best for deployment).


Option 1: Local Development with uv (Recommended)

Since this project is configured with pyproject.toml, you can run the server instantly.

Steps

  1. Navigate to the environment folder and start the server

    cd envs/grid_world_env
    uv run server
  2. Visit the live Swagger UI in your browser

    http://localhost:8000/docs
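Once the server is running, the endpoints can also be exercised from Python. The sketch below uses only the standard library. The request bodies are assumptions (in particular the nested `{"action": {"action": "DOWN"}}` payload and the empty body for `/reset`); the Swagger UI at /docs shows the schema the server actually expects.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"


def build_request(path, payload=None, base_url=BASE_URL):
    """Create a GET request, or a JSON POST request when a payload is given."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(base_url + path, data=data, headers=headers)


def call(path, payload=None):
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(build_request(path, payload), timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))


def smoke_test():
    """Hit the standard endpoints in order; requires a running server."""
    print(call("/health"))
    print(call("/reset", {}))
    # Assumed payload shape; adjust to match the schema shown at /docs.
    print(call("/step", {"action": {"action": "DOWN"}}))
```

Call `smoke_test()` from a Python shell after `uv run server` has started; nothing is sent over the network until then.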

Option 2: Docker Integration Test

To build the full container and run the integration test suite (simulating a production deployment):


Steps

  1. Navigate to the root OpenEnv directory

  2. Run the test script

    ./envs/grid_world_env/test_grid_world.sh

The script:

  • Builds the Docker image
  • Starts the container
  • Runs a series of curl requests to verify functionality
  • Cleans up containers and images after completion

Conclusion

This Grid World environment serves as the reference implementation for building environments in OpenEnv. By following this pattern, custom environments remain:

  • Portable across local and containerized setups
  • Strictly typed through Pydantic models
  • Deployment-ready for development, testing, and production workflows
