FinRL Environment
A wrapper around FinRL stock trading environments that conforms to the OpenEnv specification.
Overview
This environment enables reinforcement learning for stock trading tasks using FinRL's powerful StockTradingEnv, exposed through OpenEnv's simple HTTP API. It supports:
- Stock Trading: Buy/sell actions across multiple stocks
- Portfolio Management: Track balance, holdings, and portfolio value
- Technical Indicators: MACD, RSI, CCI, DX, and more
- Flexible Configuration: Custom data sources and trading parameters
Quick Start
1. Build the Docker Image
First, build the base image (from OpenEnv root):
cd OpenEnv
docker build -t envtorch-base:latest -f src/openenv/core/containers/images/Dockerfile .
Then build the FinRL environment image:
docker build -t finrl-env:latest -f envs/finrl_env/server/Dockerfile .
2. Run the Server
Option A: With Default Sample Data
docker run -p 8000:8000 finrl-env:latest
This starts the server with synthetic sample data for testing.
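A script that launches the container may want to wait until the port accepts connections before creating a client. The following is a minimal sketch using only the standard library; the helper name `wait_for_port` and the timeout values are illustrative and not part of OpenEnv.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper: it only checks that something is listening,
    not that the FinRL server has finished loading its data.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.2)  # not ready yet; retry shortly
    return False


# Typical use after `docker run -p 8000:8000 ...`:
#     if not wait_for_port("localhost", 8000):
#         raise RuntimeError("server did not come up on port 8000")
```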
Option B: With Custom Configuration
Create a configuration file config.json:
{
  "data_path": "/data/stock_data.csv",
  "stock_dim": 3,
  "hmax": 100,
  "initial_amount": 100000,
  "num_stock_shares": [0, 0, 0],
  "buy_cost_pct": [0.001, 0.001, 0.001],
  "sell_cost_pct": [0.001, 0.001, 0.001],
  "reward_scaling": 0.0001,
  "state_space": 25,
  "action_space": 3,
  "tech_indicator_list": ["macd", "rsi_30", "cci_30", "dx_30"]
}
Run with configuration:
docker run -p 8000:8000 \
  -v "$(pwd)/config.json:/config/config.json" \
  -v "$(pwd)/data:/data" \
  -e FINRL_CONFIG_PATH=/config/config.json \
  finrl-env:latest
3. Use the Client
from envs.finrl_env import FinRLEnv, FinRLAction
import numpy as np

# Connect to server
client = FinRLEnv(base_url="http://localhost:8000")

# Get configuration
config = client.get_config()
print(f"Trading {config['stock_dim']} stocks")
print(f"Initial capital: ${config['initial_amount']:,.0f}")

# Reset environment
result = client.reset()
print(f"Initial portfolio value: ${result.observation.portfolio_value:,.2f}")

# Trading loop
for step in range(100):
    # Get current state
    state = result.observation.state

    # Your RL policy here (example: random actions)
    num_stocks = config['stock_dim']
    actions = np.random.uniform(-1, 1, size=num_stocks).tolist()

    # Execute action
    result = client.step(FinRLAction(actions=actions))
    print(f"Step {step}: Portfolio=${result.observation.portfolio_value:,.2f}, "
          f"Reward={result.reward:.2f}")

    if result.done:
        print("Episode finished!")
        break

client.close()

Architecture
┌─────────────────────────────────────────────────────────────┐
│                    RL Training Framework                    │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐       │
│  │  Policy Net  │  │  Value Net   │  │    Replay    │       │
│  │  (PyTorch)   │  │  (PyTorch)   │  │    Buffer    │       │
│  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘       │
│         └─────────────────┴─────────────────┘               │
│                           │                                 │
│                  ┌────────▼────────┐                        │
│                  │    FinRLEnv     │ ← HTTP Client          │
│                  │ (HTTPEnvClient) │                        │
│                  └────────┬────────┘                        │
└───────────────────────────┼─────────────────────────────────┘
                            │ HTTP (JSON)
                   ┌────────▼────────┐
                   │ Docker Container│
                   │ Port: 8000      │
                   │                 │
                   │ ┌─────────────┐ │
                   │ │FastAPI      │ │
                   │ │Server       │ │
                   │ └──────┬──────┘ │
                   │        │        │
                   │ ┌──────▼──────┐ │
                   │ │ FinRL       │ │
                   │ │ Environment │ │
                   │ └──────┬──────┘ │
                   │        │        │
                   │ ┌──────▼──────┐ │
                   │ │ FinRL       │ │
                   │ │ StockTrading│ │
                   │ │ Env         │ │
                   │ └─────────────┘ │
                   └─────────────────┘
API Reference
FinRLAction
Trading action for the environment.
Attributes:
- actions: list[float] - Array of normalized action values (-1 to 1), one for each stock
  - Positive values: Buy
  - Negative values: Sell
  - Magnitude: Relative trade size
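How a normalized action becomes a share count depends on the server. As a rough sketch, assuming the wrapped environment follows FinRL's usual convention of multiplying each action by hmax and truncating to a whole number of shares (an assumption that this page does not confirm), the mapping looks like the following. The function name `to_share_counts` is illustrative.

```python
def to_share_counts(actions: list[float], hmax: int) -> list[int]:
    """Map normalized actions in [-1, 1] to signed share counts.

    Assumption: shares = int(action * hmax), truncated toward zero.
    Positive results are buys, negative results are sells, 0 is hold.
    """
    shares = []
    for a in actions:
        clipped = max(-1.0, min(1.0, a))  # keep the action inside [-1, 1]
        shares.append(int(clipped * hmax))
    return shares


# With hmax = 100: buy 50 shares of stock 0, sell 25 of stock 1, hold stock 2
print(to_share_counts([0.5, -0.25, 0.0], hmax=100))  # prints [50, -25, 0]
```

The environment may still reduce a requested trade, for example when the cash balance cannot cover a buy or the holdings cannot cover a sell.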
Example:
# Buy stock 0, sell stock 1, hold stock 2
action = FinRLAction(actions=[0.5, -0.3, 0.0])
FinRLObservation
Observation returned by the environment.
Attributes:
- state: list[float] - Flattened state vector
  - Structure: [balance, prices..., holdings..., indicators...]
- portfolio_value: float - Total portfolio value (cash + holdings)
- date: str - Current trading date
- done: bool - Whether the episode has ended
- reward: float - Reward for the last action
- metadata: dict - Additional information
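The flat state vector can be split back into its parts when the number of stocks is known. The sketch below assumes the layout given above, with stock_dim prices followed by stock_dim holdings and everything after that treated as indicator values. `ParsedState`, `split_state`, and `portfolio_value` are hypothetical helpers, and the numbers are made up.

```python
from typing import NamedTuple


class ParsedState(NamedTuple):
    balance: float
    prices: list[float]
    holdings: list[float]
    indicators: list[float]


def split_state(state: list[float], stock_dim: int) -> ParsedState:
    """Split a flat state vector, assuming [balance, prices..., holdings..., indicators...]."""
    if len(state) < 1 + 2 * stock_dim:
        raise ValueError("state is too short for the given stock_dim")
    balance = state[0]
    prices = state[1:1 + stock_dim]
    holdings = state[1 + stock_dim:1 + 2 * stock_dim]
    indicators = state[1 + 2 * stock_dim:]  # left flat; its ordering is not specified here
    return ParsedState(balance, prices, holdings, indicators)


def portfolio_value(parsed: ParsedState) -> float:
    """Cash plus the market value of all holdings."""
    return parsed.balance + sum(p * h for p, h in zip(parsed.prices, parsed.holdings))


# Two stocks, one indicator value per stock
example = [1000.0, 10.0, 20.0, 5.0, 2.0, 0.5, -0.3]
parsed = split_state(example, stock_dim=2)
print(parsed.holdings, portfolio_value(parsed))  # prints [5.0, 2.0] 1090.0
```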
Example:
obs = result.observation
print(f"Portfolio: ${obs.portfolio_value:,.2f}")
print(f"Date: {obs.date}")
print(f"State dimension: {len(obs.state)}")
Client Methods
reset() -> StepResult[FinRLObservation]
Reset the environment to start a new episode.
result = client.reset()
step(action: FinRLAction) -> StepResult[FinRLObservation]
Execute a trading action.
action = FinRLAction(actions=[0.5, -0.3])
result = client.step(action)
state() -> State
Get episode metadata (episode_id, step_count).
state = client.state()
print(f"Episode: {state.episode_id}, Step: {state.step_count}")
get_config() -> dict
Get environment configuration.
config = client.get_config()
print(config['stock_dim'])
print(config['initial_amount'])
Data Format
The environment expects stock data in the following CSV format:
| date | tic | close | high | low | open | volume | macd | rsi_30 | cci_30 | dx_30 |
|---|---|---|---|---|---|---|---|---|---|---|
| 2020-01-01 | AAPL | 100.0 | 102.0 | 98.0 | 99.0 | 1000000 | 0.5 | 55.0 | 10.0 | 15.0 |
| 2020-01-01 | GOOGL | 1500.0 | 1520.0 | 1480.0 | 1490.0 | 500000 | -0.3 | 48.0 | -5.0 | 20.0 |
Required columns:
- date: Trading date
- tic: Stock ticker symbol
- close, high, low, open: Price data
- volume: Trading volume
- Technical indicators (as specified in tech_indicator_list)
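The required columns can be checked before the server is started. The following is a small sketch using only the standard library; `missing_columns` and `BASE_COLUMNS` are illustrative names, and the sample data is invented to show a missing indicator column.

```python
import csv
import io

BASE_COLUMNS = ["date", "tic", "close", "high", "low", "open", "volume"]


def missing_columns(csv_file, tech_indicator_list: list[str]) -> list[str]:
    """Return required column names that are absent from the CSV header."""
    header = next(csv.reader(csv_file), [])
    present = {name.strip() for name in header}
    required = BASE_COLUMNS + list(tech_indicator_list)
    return [name for name in required if name not in present]


# In-memory sample that lacks the dx_30 indicator column
sample = io.StringIO(
    "date,tic,close,high,low,open,volume,macd,rsi_30,cci_30\n"
    "2020-01-01,AAPL,100.0,102.0,98.0,99.0,1000000,0.5,55.0,10.0\n"
)
print(missing_columns(sample, ["macd", "rsi_30", "cci_30", "dx_30"]))  # prints ['dx_30']
```

For a file on disk, open it with `open(path, newline="")` and pass the file object together with the tech_indicator_list from the configuration.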
Configuration Parameters
| Parameter | Type | Description |
|---|---|---|
| data_path | str | Path to CSV file with stock data |
| stock_dim | int | Number of stocks to trade |
| hmax | int | Maximum shares per trade |
| initial_amount | int | Starting cash balance |
| num_stock_shares | list[int] | Initial holdings for each stock |
| buy_cost_pct | list[float] | Transaction cost for buying (per stock) |
| sell_cost_pct | list[float] | Transaction cost for selling (per stock) |
| reward_scaling | float | Scaling factor for rewards |
| state_space | int | Dimension of state vector |
| action_space | int | Dimension of action space |
| tech_indicator_list | list[str] | Technical indicators to include |
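Several of these parameters have to agree with each other: the per-stock lists need one entry per stock, and the state dimension follows from the number of stocks and indicators. The check below is a sketch under two assumptions: the state layout is [balance, prices..., holdings..., indicators...] with one value per indicator per stock, and the action dimension equals stock_dim. `check_config` is a hypothetical helper, not part of the package.

```python
def check_config(config: dict) -> list[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    n = config["stock_dim"]
    for key in ("num_stock_shares", "buy_cost_pct", "sell_cost_pct"):
        if len(config[key]) != n:
            problems.append(f"{key} has {len(config[key])} entries, expected {n}")
    if config["action_space"] != n:
        problems.append(f"action_space is {config['action_space']}, expected {n}")
    # Assumed layout: 1 balance + n prices + n holdings + one value per indicator per stock
    expected_state = 1 + 2 * n + len(config["tech_indicator_list"]) * n
    if config["state_space"] != expected_state:
        problems.append(
            f"state_space is {config['state_space']}, layout suggests {expected_state}"
        )
    return problems


# Two stocks with two indicators: 1 + 2*2 + 2*2 = 9 state entries
example_config = {
    "stock_dim": 2,
    "num_stock_shares": [0, 0],
    "buy_cost_pct": [0.001, 0.001],
    "sell_cost_pct": [0.001, 0.001],
    "state_space": 9,
    "action_space": 2,
    "tech_indicator_list": ["macd", "rsi_30"],
}
print(check_config(example_config))  # prints []
```

Under the same assumption, three stocks with four indicators give 1 + 6 + 12 = 19 state entries, so a configuration that lists state_space as 25 with those settings is worth verifying against the running server.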
Integration with RL Frameworks
Stable Baselines 3
import gymnasium as gym
import numpy as np
from gymnasium import spaces
from stable_baselines3 import PPO

from envs.finrl_env import FinRLEnv, FinRLAction


# Gymnasium-style wrapper; assumes Stable Baselines 3 2.x, which uses the Gymnasium API
class SB3FinRLWrapper(gym.Env):
    def __init__(self, base_url):
        super().__init__()
        self.env = FinRLEnv(base_url=base_url)
        config = self.env.get_config()
        self.action_space = spaces.Box(
            low=-1, high=1,
            shape=(config['action_space'],),
            dtype=np.float32
        )
        self.observation_space = spaces.Box(
            low=-np.inf, high=np.inf,
            shape=(config['state_space'],),
            dtype=np.float32
        )

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        result = self.env.reset()
        # Gymnasium expects (observation, info)
        return np.array(result.observation.state, dtype=np.float32), {}

    def step(self, action):
        result = self.env.step(FinRLAction(actions=action.tolist()))
        # Gymnasium expects (observation, reward, terminated, truncated, info)
        return (
            np.array(result.observation.state, dtype=np.float32),
            float(result.reward or 0.0),
            bool(result.done),
            False,
            result.observation.metadata or {}
        )

    def close(self):
        self.env.close()


# Train
env = SB3FinRLWrapper("http://localhost:8000")
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=10000)

Troubleshooting
Server won't start
Check if base image exists:
docker images | grep envtorch-base
Build base image if missing:
docker build -t envtorch-base:latest -f src/openenv/core/containers/images/Dockerfile .
Import errors
Make sure you're in the src directory:
cd OpenEnv/src
python -c "from envs.finrl_env import FinRLEnv"
Configuration errors
Verify your data file has all required columns:
import pandas as pd
df = pd.read_csv('your_data.csv')
print(df.columns.tolist())
Examples
See the examples/ directory for complete examples:
- examples/finrl_simple.py - Basic usage
- examples/finrl_training.py - Full training loop with PPO
- examples/finrl_backtesting.py - Backtesting a trained agent
License
BSD 3-Clause License (see LICENSE file in repository root)