Spaces:
Running
title: GPUClusterEnv
emoji: π
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
GPU Cluster Resource Management Environment (GPUClusterEnv)
A real-world cloud infrastructure environment where agents manage GPU provisioning to handle ML training workloads under strict budget constraints.
Managing compute resources for incoming ML training jobs requires balancing strict budgets against Service Level Agreement (SLA) penalties for long queue times. This environment challenges agents to dynamically scale GPU resources to match fluctuating job arrival rates while maximizing overall reward.
π Getting Started
Installation
Clone the repository:
git clone https://github.com/yourusername/GPUClusterEnv cd GPUClusterEnvInstall dependencies:
pip install -r requirements.txtRun the FastAPI server:
uvicorn src.main:app --host 0.0.0.0 --port 7860
π§ Environment Design
Observation Space
The observation space is represented as a structured dictionary containing the current state of the GPU cluster:
| Feature | Description | Type |
|---|---|---|
time_step |
Current step in the episode. | int |
active_gpus |
Number of currently provisioned GPUs. | int |
queue_size |
Number of jobs waiting to be processed. | int |
current_budget |
Remaining budget for the episode. | float |
incoming_jobs |
Number of new jobs that arrived in the last step. | int |
Action Space
The agent controls the scaling of the infrastructure by specifying how many GPUs to provision or de-provision:
| Feature | Description | Type | Notes |
|---|---|---|---|
gpus_to_provision |
Number of GPUs to spin up (positive) or spin down (negative). | int |
Infrastructure scaling |
Reward Function
Instead of a sparse reward, the environment uses a shaped reward function that continuously evaluates the agent's performance based on processing jobs while minimizing costs and SLA penalties:
- CostPerGPU: $2.50 per step per active GPU.
- Penalty: $1.00 SLA penalty per step for each waiting job in the queue.
Terminal Conditions
An episode ends when:
- The maximum number of
time_stepsfor the task is reached. - The
current_budgetdrops to $0 or below.
π Tasks
The environment provides 3 graded tasks with escalating difficulty:
- Easy (
task_id: "easy"): Low job arrival rate, generous budget. (Max Steps: 50) - Medium (
task_id: "medium"): Moderate job arrival rate, standard budget. (Max Steps: 100) - Hard (
task_id: "hard"): High, erratic job arrival rate, tight budget. (Max Steps: 200)
π€ Baseline Agent
To evaluate the baseline agent performance:
python src/baseline.py