Environment: https://huggingface.co/spaces/hard007ik/ShopManagerEng
Blog post (Blog.md): https://huggingface.co/spaces/hard007ik/ShopManagerEng/tree/main/Blog.md

Jewelry Shop Manager: RL Environment

A reinforcement learning environment simulating a jewelry shop management pipeline. An AI agent navigates three sequential phases (buying raw materials, selecting products to craft based on demand, and negotiating sales) to maximize profit.

Environment Overview

Phase 1: Market (Buy / Wait)

  • Gold prices fluctuate ±10% each round (up to 3 rounds).
  • The agent analyzes price trends and decides to buy gold or wait for a better price.
  • Goal: Buy gold at the lowest possible price while reserving cash for crafting labor.
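A minimal sketch of the Phase 1 trade-off between buying now and waiting. It assumes each round multiplies the price by a uniform random factor in [0.9, 1.1] and that "up to 3 rounds" means at most three observed prices. The helpers `simulate_gold_prices`, `should_buy`, and `max_affordable_oz`, and the buy-below-average rule, are hypothetical illustrations rather than the environment's actual logic.

```python
import random
from typing import List


def simulate_gold_prices(start_price: float, rounds: int = 3, seed: int = 0) -> List[float]:
    """Hypothetical price path: each new round moves the price by a uniform factor in [-10%, +10%]."""
    rng = random.Random(seed)
    prices = [start_price]
    for _ in range(rounds - 1):
        prices.append(prices[-1] * (1 + rng.uniform(-0.10, 0.10)))
    return prices


def should_buy(price_history: List[float], rounds_left: int) -> bool:
    """Hypothetical rule: buy on the last round, or when the latest price is below the running average."""
    if rounds_left <= 0:
        return True  # no further chance to wait
    average = sum(price_history) / len(price_history)
    return price_history[-1] < average


def max_affordable_oz(cash: float, price: float, labor_reserve: float) -> float:
    """Ounces purchasable while keeping `labor_reserve` dollars aside for crafting labor."""
    return max(0.0, (cash - labor_reserve) / price)
```

For example, with $5,000 in cash, gold at $2,000/oz, and $1,000 held back for labor, `max_affordable_oz` allows 2.0 oz.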

Phase 2: Warehouse (Product Selection)

  • The agent sees demand levels for each product type:

      Product    Gold (oz)   Labor ($)   Demand range
      Ring       1.0         200         40-100%
      Necklace   2.0         300         20-80%
      Bracelet   0.5         100         10-60%

  • The agent picks the highest-demand product it can afford to craft.
  • Goal: Match production to market demand.
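The selection rule can be sketched as follows, assuming "can afford" means the product's gold requirement fits in the gold already bought and its labor cost fits in the remaining cash. `CATALOG` copies the table values; `pick_product` is a hypothetical helper, not part of the `ShopManagerEng` package.

```python
from typing import Dict, Optional

# Per-unit requirements copied from the product table: gold in ounces, labor in dollars.
CATALOG: Dict[str, Dict[str, float]] = {
    "ring": {"gold_oz": 1.0, "labor": 200.0},
    "necklace": {"gold_oz": 2.0, "labor": 300.0},
    "bracelet": {"gold_oz": 0.5, "labor": 100.0},
}


def pick_product(demand: Dict[str, float], gold_oz: float, cash: float) -> Optional[str]:
    """Return the highest-demand product whose gold and labor needs are covered, or None."""
    affordable = [
        name
        for name, spec in CATALOG.items()
        if spec["gold_oz"] <= gold_oz and spec["labor"] <= cash
    ]
    if not affordable:
        return None
    # Demand values are fractions in [0, 1], as in the observation's `demand` field.
    return max(affordable, key=lambda name: demand.get(name, 0.0))
```

Note that the best product is not always the highest-demand one overall: with only 1.5 oz of gold, a necklace is out of reach no matter how strong its demand.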

Phase 3: Showroom (Negotiation)

  • A customer makes an initial offer based on cost basis and product demand.
  • The agent can accept, counter-offer, or reject.
  • Each counter raises the customer's offer by 5% (up to 5 rounds).
  • Goal: Sell at maximum profit through smart negotiation.
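The offer ladder can be sketched as below. Whether the 5% raise compounds on the latest offer or is 5% of the initial offer is not stated, so this sketch assumes compounding. `offer_schedule`, `decide`, and the 25% target margin are hypothetical.

```python
from typing import List


def offer_schedule(initial_offer: float, max_rounds: int = 5, step: float = 0.05) -> List[float]:
    """Offers the customer would make if the agent countered every round (compounding assumed)."""
    offers = [initial_offer]
    for _ in range(max_rounds):
        offers.append(offers[-1] * (1 + step))
    return offers


def decide(current_offer: float, cost_basis: float, negotiation_round: int,
           max_rounds: int = 5, target_margin: float = 0.25) -> str:
    """Hypothetical policy: accept once the margin target is met or no counters remain."""
    if current_offer >= cost_basis * (1 + target_margin):
        return "accept"
    if negotiation_round >= max_rounds:
        # Out of counters: take any profitable offer, otherwise walk away.
        return "accept" if current_offer > cost_basis else "reject"
    return "counter"
```

Under the compounding assumption, countering in all five rounds lifts the offer by about 27.6% (1.05^5 ≈ 1.276); with a flat 5% of the initial offer per round the ceiling would be 25%. How the real environment handles a rejection or an exhausted negotiation is not described here.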

Reward Structure

Component        Weight   Description
R1 (Market)      20%      How close to the lowest price did the agent buy?
R2 (Warehouse)   20%      Did the agent pick the highest-demand product?
R3 (Showroom)    60%      Normalized profit margin on the sale

Final Score = 0.2 × R1 + 0.2 × R2 + 0.6 × R3 (range [0, 1])
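A sketch of the score computation. The weights come from the formula above; clamping each component to [0, 1] is an assumption that keeps the total in [0, 1]. `market_component` (lowest price divided by price paid) is a hypothetical way to express "how close to the lowest price", not the environment's documented R1.

```python
def clamp01(x: float) -> float:
    """Limit a reward component to [0, 1] (assumed, not documented)."""
    return min(1.0, max(0.0, x))


def final_score(r1: float, r2: float, r3: float) -> float:
    """0.2 * R1 + 0.2 * R2 + 0.6 * R3, using the weights from the reward table."""
    return 0.2 * clamp01(r1) + 0.2 * clamp01(r2) + 0.6 * clamp01(r3)


def market_component(price_paid: float, lowest_price: float) -> float:
    """Hypothetical R1: 1.0 when buying at the lowest observed price, lower when paying more."""
    return clamp01(lowest_price / price_paid)
```

For instance, a perfect buy (R1 = 1.0), the right product (R2 = 1.0), and half the maximum normalized margin (R3 = 0.5) give 0.2 + 0.2 + 0.3 = 0.7. Because R3 carries 60% of the weight, negotiation dominates the final score.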

Quick Start

import asyncio

from ShopManagerEng import JewelryAction, JewelryShopEnv


async def run():
    env = JewelryShopEnv(base_url="http://localhost:8000")
    try:
        result = await env.reset()
        print(f"Gold price: ${result.observation.gold_price}/oz")

        # Phase 1 (Market): wait for a better price
        result = await env.step(JewelryAction(market_action="wait"))

        # Phase 1 (Market): buy gold
        result = await env.step(JewelryAction(market_action="buy", gold_qty=2.0))

        # Phase 2 (Warehouse): choose a product to craft
        result = await env.step(JewelryAction(product_choice="ring"))

        # Phase 3 (Showroom): make a counter-offer, then accept
        result = await env.step(JewelryAction(message="How about $600?"))
        result = await env.step(JewelryAction(message="I accept"))

        print(f"Final reward: {result.reward}, Cash: {result.observation.cash}")
    finally:
        # Close the client connection even if a step raises
        await env.close()


if __name__ == "__main__":
    asyncio.run(run())

Action Space

class JewelryAction:
    market_action:  str   # "buy" or "wait" (Phase 1)
    gold_qty:       float # Ounces to buy (Phase 1)
    product_choice: str   # "ring", "necklace", or "bracelet" (Phase 2)
    message:        str   # Negotiation text (Phase 3)
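Because each phase reads only some of the `JewelryAction` fields, a client-side check can catch a mismatched action before it is sent. This is a hypothetical helper operating on plain dicts; it does not reflect how the server actually validates actions.

```python
from typing import Any, Dict, Set

# Fields each phase reads, following the phase notes in JewelryAction above.
PHASE_FIELDS: Dict[str, Set[str]] = {
    "market": {"market_action", "gold_qty"},
    "warehouse": {"product_choice"},
    "showroom": {"message"},
}
MARKET_ACTIONS = {"buy", "wait"}
PRODUCTS = {"ring", "necklace", "bracelet"}


def check_action(phase: str, action: Dict[str, Any]) -> None:
    """Hypothetical pre-send check; raises ValueError if the action does not fit the phase."""
    if phase not in PHASE_FIELDS:
        raise ValueError(f"unknown phase {phase!r}")
    extra = set(action) - PHASE_FIELDS[phase]
    if extra:
        raise ValueError(f"fields {sorted(extra)} are not used in phase {phase!r}")
    if phase == "market":
        if action.get("market_action") not in MARKET_ACTIONS:
            raise ValueError("market_action must be 'buy' or 'wait'")
        if action["market_action"] == "buy" and action.get("gold_qty", 0.0) <= 0:
            raise ValueError("a buy needs a positive gold_qty")
    elif phase == "warehouse":
        if action.get("product_choice") not in PRODUCTS:
            raise ValueError("product_choice must be 'ring', 'necklace', or 'bracelet'")
    elif not action.get("message"):
        # Showroom: the negotiation text is the only input
        raise ValueError("showroom actions need a non-empty message")
```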

Observation Space

from typing import Dict, List

class JewelryObservation:
    phase:              str          # "market" | "warehouse" | "showroom"
    cash:               float        # Current cash balance
    gold_oz:            float        # Raw gold in inventory
    gold_price:         float        # Current gold price ($/oz)
    gold_price_history: List[float]  # Price trend for analysis
    market_round:       int          # Current market round
    demand:             Dict[str, float]  # Demand per product (0-1)
    product_catalog:    Dict[str, dict]   # Specs per product
    inventory:          Dict[str, int]    # Crafted products in stock
    product_for_sale:   str          # Product being sold (showroom)
    cost_basis:         float        # Total manufacturing cost
    current_offer:      float        # Customer's current offer
    negotiation_round:  int          # Counter-offer round
    message:            str          # Environment feedback
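An LLM-driven agent needs these fields rendered as text. The sketch below assumes the observation is available as a dict keyed by the field names listed above; `describe_observation` and its prompt wording are hypothetical and are not taken from `inference.py`.

```python
from typing import Any, Dict


def describe_observation(obs: Dict[str, Any]) -> str:
    """Hypothetical helper: render the fields relevant to the current phase as a short prompt."""
    lines = [f"Phase: {obs['phase']} | Cash: ${obs['cash']:.2f}"]
    if obs["phase"] == "market":
        history = ", ".join(f"{p:.2f}" for p in obs["gold_price_history"])
        lines.append(f"Round {obs['market_round']}: gold at ${obs['gold_price']:.2f}/oz (history: {history})")
        lines.append("Reply 'buy' with an ounce quantity, or 'wait'.")
    elif obs["phase"] == "warehouse":
        # List products from highest to lowest demand so the best option comes first.
        ranked = sorted(obs["demand"].items(), key=lambda kv: kv[1], reverse=True)
        lines.append(f"Gold in stock: {obs['gold_oz']} oz")
        lines.append("Demand: " + ", ".join(f"{name} {level:.0%}" for name, level in ranked))
        lines.append("Reply with ring, necklace, or bracelet.")
    else:  # showroom
        lines.append(f"Selling {obs['product_for_sale']} (cost ${obs['cost_basis']:.2f}); "
                     f"offer ${obs['current_offer']:.2f}, round {obs['negotiation_round']}")
        lines.append("Reply with a counter-offer, 'I accept', or a rejection.")
    return "\n".join(lines)
```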

Running the Inference Script

# Terminal 1: Start the server
cd ShopManagerEng
uv run server

# Terminal 2: Run inference (from parent directory or inside ShopManagerEng)
python inference.py

Environment variables (set in .env):

  • HF_TOKEN: Hugging Face API token (required)
  • MODEL_NAME: LLM model (optional; default: meta-llama/Llama-3.3-70B-Instruct)
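One way to read these settings, sketched under the assumption that the .env file has already been loaded into the process environment (for example with python-dotenv's `load_dotenv()`); whether `inference.py` works this way is not stated here. `InferenceConfig` and `load_config` are hypothetical names.

```python
import os
from dataclasses import dataclass

DEFAULT_MODEL = "meta-llama/Llama-3.3-70B-Instruct"


@dataclass(frozen=True)
class InferenceConfig:
    hf_token: str
    model_name: str


def load_config() -> InferenceConfig:
    """Read HF_TOKEN (required) and MODEL_NAME (optional) from the process environment."""
    token = os.environ.get("HF_TOKEN")
    if not token:
        raise RuntimeError("HF_TOKEN is not set; add it to .env or export it")
    # Fall back to the documented default model when MODEL_NAME is absent.
    return InferenceConfig(hf_token=token, model_name=os.environ.get("MODEL_NAME", DEFAULT_MODEL))
```

Failing fast on a missing token gives a clear error at startup instead of an authentication failure on the first model call.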

Deploying to Hugging Face Spaces

openenv push

Project Structure

ShopManagerEng/
├── __init__.py            # Module exports
├── README.md              # This file
├── openenv.yaml           # OpenEnv manifest
├── pyproject.toml         # Dependencies
├── models.py              # Action, Observation, State definitions
├── client.py              # JewelryShopEnv client
├── inference.py           # LLM-based agent inference script
└── server/
    ├── __init__.py
    ├── ShopManagerEng_environment.py  # Core environment logic
    ├── app.py             # FastAPI application
    └── Dockerfile         # Container image