Link of the environment: https://huggingface.co/spaces/hard007ik/ShopManagerEng
Link of Blog.md: https://huggingface.co/spaces/hard007ik/ShopManagerEng/tree/main/Blog.md
Jewelry Shop Manager: RL Environment
A reinforcement learning environment simulating a jewelry shop management pipeline. An AI agent navigates three sequential phases (buying raw materials, selecting products to craft based on demand, and negotiating sales) to maximize profit.
Environment Overview
Phase 1: Market (Buy / Wait)
- Gold prices fluctuate ±10% each round (up to 3 rounds).
- The agent analyzes price trends and decides to buy gold or wait for a better price.
- Goal: Buy gold at the lowest possible price while reserving cash for crafting labor.
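The market dynamics above can be sketched in a few lines of Python. This is only an illustration, not the environment's actual code: the uniform draw within ±10%, the starting price, and the helper names `simulate_gold_prices` and `max_affordable_gold` are all assumptions made for the example.

```python
import random


def simulate_gold_prices(start_price, rounds=3, max_move=0.10, seed=None):
    """Return a price path: the starting price plus one price per round."""
    rng = random.Random(seed)
    prices = [start_price]
    for _ in range(rounds):
        # Assumption: each move is drawn uniformly from [-10%, +10%].
        change = rng.uniform(-max_move, max_move)
        prices.append(round(prices[-1] * (1 + change), 2))
    return prices


def max_affordable_gold(cash, price, labor_reserve):
    """Ounces of gold purchasable while keeping labor_reserve in cash."""
    if price <= 0:
        raise ValueError("price must be positive")
    return max(0.0, (cash - labor_reserve) / price)


if __name__ == "__main__":
    # Hypothetical starting price; the real value comes from env.reset().
    path = simulate_gold_prices(2000.0, seed=0)
    print("Price path:", path)
    print("Lowest price seen:", min(path))
```

Reserving the labor cost before sizing the purchase reflects the Phase 1 goal: gold bought with cash that was needed for crafting cannot be turned into a product in Phase 2.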
Phase 2: Warehouse (Product Selection)
- The agent sees demand levels for each product type:
| Product | Gold (oz) | Labor ($) | Demand Range |
|---|---|---|---|
| Ring | 1.0 | $200 | 40-100% |
| Necklace | 2.0 | $300 | 20-80% |
| Bracelet | 0.5 | $100 | 10-60% |
- The agent picks the highest-demand product it can afford to craft.
- Goal: Match production to market demand.
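As a rough sketch of the selection rule, the function below filters the catalog down to products the agent can afford (enough gold and enough cash for labor) and then takes the one with the highest demand. The `CATALOG` values come from the table above; the dictionary layout and the name `pick_product` are assumptions for illustration and may not match the environment's `product_catalog` format.

```python
# Gold and labor requirements copied from the product table.
CATALOG = {
    "ring": {"gold_oz": 1.0, "labor": 200.0},
    "necklace": {"gold_oz": 2.0, "labor": 300.0},
    "bracelet": {"gold_oz": 0.5, "labor": 100.0},
}


def pick_product(demand, gold_oz, cash, catalog=CATALOG):
    """Return the highest-demand product that can be crafted, or None."""
    affordable = [
        name
        for name, spec in catalog.items()
        if gold_oz >= spec["gold_oz"] and cash >= spec["labor"]
    ]
    if not affordable:
        return None
    return max(affordable, key=lambda name: demand.get(name, 0.0))


if __name__ == "__main__":
    demand = {"ring": 0.6, "necklace": 0.8, "bracelet": 0.3}
    # Only 1 oz of gold in stock, so the necklace is ruled out despite
    # having the highest demand.
    print(pick_product(demand, gold_oz=1.0, cash=500.0))
```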
Phase 3: Showroom (Negotiation)
- A customer makes an initial offer based on cost basis and product demand.
- The agent can accept, counter-offer, or reject.
- Each counter raises the customer's offer by 5% (up to 5 rounds).
- Goal: Sell at maximum profit through smart negotiation.
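To see how much a counter-offer is worth, the sketch below lists the customer's offer after each counter. It assumes the 5% raise compounds on the current offer; the text does not say whether the raise is applied to the current or the initial offer, so treat that as a guess. `offer_schedule` and `profit_margin` are hypothetical helper names, and the margin shown is the raw margin over cost, not the normalized R3 value.

```python
def offer_schedule(initial_offer, raise_pct=0.05, max_rounds=5):
    """Customer offers: the initial offer, then one per counter-offer."""
    offers = [initial_offer]
    for _ in range(max_rounds):
        # Assumption: each counter raises the *current* offer by 5%.
        offers.append(round(offers[-1] * (1 + raise_pct), 2))
    return offers


def profit_margin(offer, cost_basis):
    """Raw profit margin of a sale relative to manufacturing cost."""
    if cost_basis <= 0:
        raise ValueError("cost_basis must be positive")
    return (offer - cost_basis) / cost_basis


if __name__ == "__main__":
    # Hypothetical numbers: a $500 opening offer on a $450 cost basis.
    for round_no, offer in enumerate(offer_schedule(500.0)):
        margin = profit_margin(offer, 450.0)
        print(f"after {round_no} counters: ${offer:.2f} (margin {margin:.1%})")
```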
Reward Structure
| Component | Weight | Description |
|---|---|---|
| R1 (Market) | 20% | How close to the lowest price did the agent buy? |
| R2 (Warehouse) | 20% | Did the agent pick the highest-demand product? |
| R3 (Showroom) | 60% | Normalized profit margin on the sale |
Final Score = 0.2 × R1 + 0.2 × R2 + 0.6 × R3 (range [0, 1])
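The weighted sum is easy to reproduce when checking logged episodes. The sketch below assumes each component is already scaled to [0, 1], which is what the stated final range implies; how R1, R2, and R3 are computed inside the environment is not shown here, and `final_score` is a hypothetical helper name.

```python
# Weights taken from the reward table.
WEIGHTS = {"r1": 0.2, "r2": 0.2, "r3": 0.6}


def final_score(r1, r2, r3):
    """Combine the three phase rewards into the final episode score."""
    components = {"r1": r1, "r2": r2, "r3": r3}
    for name, value in components.items():
        # Assumption: every component is normalized to [0, 1].
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return sum(WEIGHTS[name] * value for name, value in components.items())


if __name__ == "__main__":
    # A perfect market buy and product choice with a mediocre sale.
    print(final_score(1.0, 1.0, 0.25))
```

Because the showroom carries 60% of the weight, a weak negotiation caps the score well below 1 even when the first two phases are played perfectly.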
Quick Start
import asyncio

from ShopManagerEng import JewelryAction, JewelryShopEnv


async def run():
    env = JewelryShopEnv(base_url="http://localhost:8000")
    result = await env.reset()
    print(f"Gold price: ${result.observation.gold_price}/oz")

    # Phase 1 (Market): wait for a better price
    result = await env.step(JewelryAction(market_action="wait"))
    # Phase 1 (Market): buy gold
    result = await env.step(JewelryAction(market_action="buy", gold_qty=2.0))

    # Phase 2 (Warehouse): choose product
    result = await env.step(JewelryAction(product_choice="ring"))

    # Phase 3 (Showroom): negotiate
    result = await env.step(JewelryAction(message="How about $600?"))
    result = await env.step(JewelryAction(message="I accept"))
    print(f"Final reward: {result.reward}, Cash: {result.observation.cash}")

    await env.close()


asyncio.run(run())
Action Space
class JewelryAction:
    market_action: str   # "buy" or "wait" (Phase 1)
    gold_qty: float      # Ounces to buy (Phase 1)
    product_choice: str  # "ring", "necklace", or "bracelet" (Phase 2)
    message: str         # Negotiation text (Phase 3)
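Since only some fields are meaningful in each phase, it can help to check an action before sending it. The sketch below uses a stand-in dataclass, `ActionSketch`, rather than the real `JewelryAction`; the default values and the `validate_action` helper are assumptions for illustration, and the server may apply different or additional checks.

```python
from dataclasses import dataclass
from typing import List, Optional

PRODUCTS = ("ring", "necklace", "bracelet")


@dataclass
class ActionSketch:
    """Stand-in for JewelryAction; the defaults here are assumed."""
    market_action: Optional[str] = None
    gold_qty: float = 0.0
    product_choice: Optional[str] = None
    message: Optional[str] = None


def validate_action(action: ActionSketch, phase: str) -> List[str]:
    """Return a list of problems; an empty list means the action fits."""
    problems = []
    if phase == "market":
        if action.market_action not in ("buy", "wait"):
            problems.append("market_action must be 'buy' or 'wait'")
        elif action.market_action == "buy" and action.gold_qty <= 0:
            problems.append("gold_qty must be positive when buying")
    elif phase == "warehouse":
        if action.product_choice not in PRODUCTS:
            problems.append("product_choice must be one of " + ", ".join(PRODUCTS))
    elif phase == "showroom":
        if not action.message:
            problems.append("message must be non-empty")
    else:
        problems.append(f"unknown phase: {phase}")
    return problems


if __name__ == "__main__":
    print(validate_action(ActionSketch(market_action="buy"), "market"))
```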
Observation Space
class JewelryObservation:
    phase: str                        # "market" | "warehouse" | "showroom"
    cash: float                       # Current cash balance
    gold_oz: float                    # Raw gold in inventory
    gold_price: float                 # Current gold price ($/oz)
    gold_price_history: List[float]   # Price trend for analysis
    market_round: int                 # Current market round
    demand: Dict[str, float]          # Demand per product (0-1)
    product_catalog: Dict[str, dict]  # Specs per product
    inventory: Dict[str, int]         # Crafted products in stock
    product_for_sale: str             # Product being sold (showroom)
    cost_basis: float                 # Total manufacturing cost
    current_offer: float              # Customer's current offer
    negotiation_round: int            # Counter-offer round
    message: str                      # Environment feedback
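One way to use these fields is a small rule-based baseline that dispatches on `phase`. The sketch below works on a plain dict with the same keys as the observation and returns keyword arguments for an action. The thresholds, the fixed purchase size, the counter-offer wording, and the way `market_round` and `negotiation_round` are compared against the round limits are all assumptions; the warehouse branch also ignores affordability for brevity.

```python
def choose_action(obs, target_margin=0.2, buy_qty=1.0,
                  max_market_rounds=3, max_negotiation_rounds=5):
    """Map an observation dict to action keyword arguments."""
    phase = obs["phase"]

    if phase == "market":
        price = obs["gold_price"]
        lowest_seen = min(obs["gold_price_history"], default=price)
        last_round = obs["market_round"] >= max_market_rounds
        # Buy at a new low, or when there are no rounds left to wait.
        if last_round or price <= lowest_seen:
            return {"market_action": "buy", "gold_qty": buy_qty}
        return {"market_action": "wait"}

    if phase == "warehouse":
        demand = obs["demand"]
        return {"product_choice": max(demand, key=demand.get)}

    if phase == "showroom":
        cost = obs["cost_basis"]
        margin = (obs["current_offer"] - cost) / cost if cost > 0 else 0.0
        out_of_rounds = obs["negotiation_round"] >= max_negotiation_rounds
        if margin >= target_margin or out_of_rounds:
            return {"message": "I accept"}
        # Hypothetical counter-offer wording.
        return {"message": "Could you go a little higher?"}

    raise ValueError(f"unknown phase: {phase}")


if __name__ == "__main__":
    obs = {"phase": "showroom", "cost_basis": 500.0,
           "current_offer": 550.0, "negotiation_round": 1}
    print(choose_action(obs))
```

A baseline like this gives a reference score to compare an LLM-based agent against.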
Running the Inference Script
# Terminal 1: Start the server
cd ShopManagerEng
uv run server
# Terminal 2: Run inference (from parent directory or inside ShopManagerEng)
python inference.py
Required environment variables (set in .env):
- HF_TOKEN: Hugging Face API token
- MODEL_NAME: LLM model (default: meta-llama/Llama-3.3-70B-Instruct)
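A script might read these settings along the following lines. This is a sketch, not the contents of inference.py: it reads the process environment only (loading the .env file itself would need a separate step or library), and `load_settings` is a hypothetical helper name.

```python
import os

# Default model name as stated above.
DEFAULT_MODEL = "meta-llama/Llama-3.3-70B-Instruct"


def load_settings(env=os.environ):
    """Read HF_TOKEN and MODEL_NAME from a mapping of environment variables."""
    token = env.get("HF_TOKEN")
    if not token:
        raise RuntimeError("HF_TOKEN is not set")
    return {
        "hf_token": token,
        "model_name": env.get("MODEL_NAME", DEFAULT_MODEL),
    }


if __name__ == "__main__":
    try:
        settings = load_settings()
        print("Using model:", settings["model_name"])
    except RuntimeError as err:
        print("Configuration error:", err)
```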
Deploying to Hugging Face Spaces
openenv push
Project Structure
ShopManagerEng/
├── __init__.py # Module exports
├── README.md # This file
├── openenv.yaml # OpenEnv manifest
├── pyproject.toml # Dependencies
├── models.py # Action, Observation, State definitions
├── client.py # JewelryShopEnv client
├── inference.py # LLM-based agent inference script
└── server/
    ├── __init__.py
    ├── ShopManagerEng_environment.py # Core environment logic
    ├── app.py # FastAPI application
    └── Dockerfile # Container image