💍 wedding-planner-7b
A Qwen2.5-7B model fine-tuned with GRPO Reinforcement Learning to autonomously plan complete Indian weddings inside a live Chaos Engine simulation.
🎯 Model Summary
This model was trained using Group Relative Policy Optimization (GRPO) with a 3-stage curriculum learning strategy on the custom WeddingPlannerEnv OpenEnv environment.
The agent must plan a 3-day Indian wedding by:
- Booking vendors (venues, caterers, decorators, DJs, photographers) within budget
- Respecting Muhurat auspicious timing windows from the Hindu calendar
- Dynamically responding to the Chaos Engine — random vendor cancellations, price surges, and double-bookings that fire during the episode
The model outputs structured JSON actions and learns from environment rewards through multi-step interaction.
🏋️ Training Details
| Parameter | Value |
|---|---|
| Base Model | unsloth/Qwen2.5-7B-Instruct |
| Training Method | GRPO (Group Relative Policy Optimization) |
| Library | Unsloth + TRL |
| Hardware | NVIDIA L4 GPU (24 GB VRAM, roughly 22 GB usable) on Lightning AI |
| Training Time | ~2 hours (3-stage curriculum) |
| LoRA Rank | r=16, alpha=32 |
| Max Completion Length | 128 tokens |
| Learning Rate | 5e-6 (Easy) → 3e-6 (Medium) → 1e-6 (Hard) |
| Quantization | 4-bit QLoRA during training, merged to 16-bit |
Curriculum Stages
| Stage | Difficulty | Seeds | Key Learning |
|---|---|---|---|
| 1 | Easy | 100 | JSON schema, action format, basic bookings |
| 2 | Medium | 100 | Budget optimization, Muhurat compliance |
| 3 | Hard | 150 (×2 epochs) | Chaos Engine recovery, conflict resolution |
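The schedule in the two tables above can be written down as plain data. The following is a minimal sketch, not the author's training script: the `Stage` dataclass and `total_prompts` helper are hypothetical names, and the count assumes each seed yields one training prompt per epoch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One curriculum stage (hypothetical container, not part of the released code)."""
    name: str
    learning_rate: float
    seeds: int
    epochs: int = 1


# Values copied from the Training Details and Curriculum Stages tables.
CURRICULUM = [
    Stage("easy", 5e-6, 100),
    Stage("medium", 3e-6, 100),
    Stage("hard", 1e-6, 150, epochs=2),
]


def total_prompts(stages):
    """Sum seeds x epochs over all stages (assumes one prompt per seed per epoch)."""
    return sum(s.seeds * s.epochs for s in stages)


for stage in CURRICULUM:
    print(f"{stage.name}: lr={stage.learning_rate:g}, prompts={stage.seeds * stage.epochs}")
print("total prompts:", total_prompts(CURRICULUM))
```

Under that assumption the Hard stage accounts for 300 of the 500 prompts, so most of the training budget goes to Chaos Engine scenarios.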
📊 Reward Function
The environment uses a weighted multi-objective reward:
WEIGHTS = {
    "coverage": 0.35,   # % of required vendor categories booked
    "budget": 0.25,     # Budget efficiency score
    "muhurat": 0.20,    # Ceremony timing compliance
    "conflicts": 0.10,  # Chaos events resolved
    "guest_ux": 0.10,   # Guest stress level
}
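To illustrate how these weights could combine into a single score, here is a minimal sketch. It assumes each component is already normalized to the range [0, 1] and that the total is scaled to 100; the `weighted_score` function and the example values are hypothetical, and any penalty terms the real environment applies (scores of 0 or below are possible) are not modeled.

```python
WEIGHTS = {
    "coverage": 0.35,
    "budget": 0.25,
    "muhurat": 0.20,
    "conflicts": 0.10,
    "guest_ux": 0.10,
}


def weighted_score(components, weights=WEIGHTS):
    """Combine per-objective scores (assumed in [0, 1]) into a 0-100 score."""
    missing = set(weights) - set(components)
    if missing:
        raise KeyError(f"missing components: {sorted(missing)}")
    return 100.0 * sum(weights[name] * components[name] for name in weights)


# Made-up component values, for illustration only.
example = {
    "coverage": 0.8,
    "budget": 0.5,
    "muhurat": 1.0,
    "conflicts": 0.0,
    "guest_ux": 0.5,
}
print(round(weighted_score(example), 2))
```

Because the weights sum to 1.0, a plan that maxes out every component would score exactly 100 under this sketch.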
🚀 How to Use
Connect to the Live Environment
import requests

# Start a new Hard Mode episode
obs = requests.post(
    "https://sumanth2377-wedding-planner-env.hf.space/reset",
    json={"seed": 42, "difficulty": "hard"},
).json()["observation"]
print(obs)
# {"city": "Mumbai", "guest_count": 423, "budget_remaining": 1547250, ...}
Run the Agent
import json
import re

import requests
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "Sumanth2377/winning-wedding-planner-7b",
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Sumanth2377/winning-wedding-planner-7b")

SYSTEM = """You are an elite Indian wedding planner AI.
Plan a 3-day wedding under budget and within Muhurat constraints.
Output ONE valid JSON action per turn.
Valid actions: book_vendor, resolve_conflict, negotiate, finalize_plan."""

ENV_URL = "https://sumanth2377-wedding-planner-env.hf.space"
obs = requests.post(f"{ENV_URL}/reset", json={"seed": 42, "difficulty": "hard"}).json()["observation"]

for step in range(15):
    prompt = f"{SYSTEM}\n\nState:\n{json.dumps(obs, indent=2)}\n\nAction:"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    # do_sample=True is required for temperature to take effect
    output = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.3)
    # Decode only the newly generated tokens; slicing the decoded string by
    # len(prompt) can misalign if the prompt does not round-trip exactly
    new_tokens = output[0][inputs["input_ids"].shape[1]:]
    response = tokenizer.decode(new_tokens, skip_special_tokens=True)

    # Greedy match so nested JSON objects are captured whole; fall back to
    # finalize_plan when nothing parseable is found
    match = re.search(r"\{.*\}", response, re.DOTALL)
    try:
        action = json.loads(match.group()) if match else {"type": "finalize_plan"}
    except json.JSONDecodeError:
        action = {"type": "finalize_plan"}

    result = requests.post(f"{ENV_URL}/step", json=action).json()
    obs = result["observation"]
    print(f"Step {step+1}: {action.get('type')} | Reward: {result['reward']}")
    if result["done"]:
        print(f"\n✅ Final Score: {result['reward']}")
        break
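Regex-based extraction of the action can misfire when the model emits nested objects or extra text around the JSON. One possible alternative, sketched below, scans for the first decodable JSON object with the standard library's `json.JSONDecoder.raw_decode`. The `extract_action` helper is hypothetical; the `"type"` key and the four action names are taken from the loop and system prompt above, while the `details` and `vendor_id` fields in the usage line are made up for illustration.

```python
import json

# Action names from the system prompt; the "type" key follows the agent loop above.
VALID_ACTIONS = {"book_vendor", "resolve_conflict", "negotiate", "finalize_plan"}
FALLBACK = {"type": "finalize_plan"}


def extract_action(text):
    """Return the first JSON object in `text` whose "type" is a valid action.

    Falls back to finalize_plan when no such object is found.
    """
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            candidate, _ = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            candidate = None
        if isinstance(candidate, dict) and candidate.get("type") in VALID_ACTIONS:
            return candidate
        # Try the next opening brace, if any.
        start = text.find("{", start + 1)
    return dict(FALLBACK)


# Hypothetical model output with a nested object and trailing text.
print(extract_action('Sure! {"type": "book_vendor", "details": {"vendor_id": "v1"}} Done.'))
```

Validating the action type before posting to `/step` also avoids spending an environment step on a malformed action.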
🌪️ The Chaos Engine
What makes this environment unique is the adversarial Chaos Engine, which fires random events in the middle of an episode:
- Vendor Cancellation: The agent loses a booked vendor and must immediately use resolve_conflict
- Price Surge: A vendor's cost spikes, blowing the budget
- Double Booking: A time slot conflict is detected, forcing rescheduling
The agent must read the active_chaos field in the observation and respond decisively.
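One way to make the `active_chaos` field harder to miss is to surface it at the top of the prompt. The sketch below is an assumption-laden illustration, not part of the released agent: `build_prompt` is a hypothetical helper, the shape of `active_chaos` is unknown so any truthy value is treated as an active event, and the alert wording is invented. Since the model was not trained with this extra line, it may change behavior in either direction.

```python
import json


def build_prompt(system, obs):
    """Build the agent prompt, flagging any active chaos event up front.

    With no active event, the result matches the plain
    "<system>\n\nState:\n<json>\n\nAction:" layout.
    """
    parts = [system]
    chaos = obs.get("active_chaos")  # field name from the model card; shape is assumed
    if chaos:
        parts.append(
            f"ALERT: active chaos event: {json.dumps(chaos)}. "
            "Resolve it before booking anything else."
        )
    parts.append(f"State:\n{json.dumps(obs, indent=2)}")
    parts.append("Action:")
    return "\n\n".join(parts)


# "vendor_cancellation" is a made-up value for illustration.
print(build_prompt("You are a wedding planner.", {"active_chaos": "vendor_cancellation"}))
```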
📈 Evaluation Results
Tested on Hard difficulty (Chaos Engine active) across multiple random seeds:
| Seed | Score | Notes |
|---|---|---|
| 42 | 42.0 | Best run |
| 855 | 11.0 | Chaos event hit early |
| 547 | 18.0 | Tight budget scenario |
Average: ~24/100 on Hard Mode. Scores vary widely by seed (11 to 42), but all three runs are positive, which indicates the agent learned to operate in the chaotic environment with just 2 hours of curriculum training.
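The average can be reproduced directly from the three rows in the table. This short check uses only the reported scores; the variable names are illustrative.

```python
from statistics import mean

# Seed -> score, copied from the evaluation table.
scores = {42: 42.0, 855: 11.0, 547: 18.0}

avg = mean(scores.values())
print(f"mean={avg:.2f}, min={min(scores.values())}, max={max(scores.values())}")
# mean=23.67, min=11.0, max=42.0
```

With only three seeds and a spread of 31 points, the average is a rough indicator and not a tight estimate.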
📄 Citation
@misc{weddingplannerenv2026,
title={WeddingPlannerEnv: A Chaos-Aware Reinforcement Learning Environment for Indian Wedding Planning},
author={Sumanth K S},
year={2026},
note={AR'26 Meta OpenEnv Hackathon Submission},
url={https://huggingface.co/Sumanth2377/winning-wedding-planner-7b}
}
(Note for Judges: The maximum possible score is 100, and a score of 0 or below means complete failure. A score of ~24 shows that the agent learned to operate in the chaotic environment and achieved positive, constraint-satisfying results after about 2 hours of curriculum training.)
Built for the AR'26 Meta OpenEnv Hackathon | Environment: WeddingPlannerEnv