customer-support-grpo-v5
A hierarchical multi-agent reinforcement learning model trained with GRPO for realistic customer support scenarios.
This model was developed as part of the Meta OpenEnv Hackathon Round 2 (April 2026).
Model Description
customer-support-grpo-v5 is a Llama-3.1-8B-Instruct model fine-tuned with Unsloth and GRPO (Group Relative Policy Optimization). It powers a 3-level hierarchical multi-agent system designed to simulate and improve real-world customer support in Indian enterprise environments.
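To make the 3-level hierarchy concrete, here is a minimal sketch of one-step-at-a-time escalation between L1, L2, and L3. It is an illustration only, not the project's implementation: the `Ticket` class, the `route` helper, and the `needs_escalation` predicate are hypothetical names introduced here, and the real escalation criteria live in the training environment.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    """The three agent levels named in this model card."""
    L1_SUPPORT_AGENT = 1
    L2_SUPERVISOR = 2
    L3_MANAGER = 3


@dataclass
class Ticket:
    # Hypothetical container; the real environment's state is not shown here.
    text: str
    level: Level = Level.L1_SUPPORT_AGENT


def escalate(ticket: Ticket) -> Ticket:
    """Move the ticket one level up; L3 is the top level and stays put."""
    next_level = min(ticket.level + 1, Level.L3_MANAGER)
    return Ticket(text=ticket.text, level=Level(next_level))


def route(ticket: Ticket, needs_escalation) -> Ticket:
    """Escalate while the (assumed) predicate says the current level cannot resolve."""
    while ticket.level < Level.L3_MANAGER and needs_escalation(ticket):
        ticket = escalate(ticket)
    return ticket
```

In this sketch the predicate is passed in as a plain callable, so any rule (sentiment threshold, SLA timer, policy check) could decide when a ticket moves up.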
Key Features
- Hierarchical Agents: L1 Support Agent, L2 Supervisor, L3 Manager with escalation logic
- Progressive Curriculum Learning: 5 stages from basic to nightmare difficulty
- Hybrid Reward System:
  - Rule-based: VADER sentiment, efficiency, accuracy
  - LLM-as-Judge: empathy, policy adherence, resolution quality
- Grounded Responses: NoSQL DB integration for user/order context
- Realistic Challenges:
  - Policy drift mid-conversation
  - SLA pressure
  - Hinglish users
  - Multi-turn coordination
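The hybrid reward listed above could be combined along the following lines. This is a sketch under stated assumptions, not the project's reward engine: the equal weighting, the `rule_weight` parameter, the [0, 1] scaling of each signal, and all class and function names are hypothetical. The only external fact relied on is that a VADER compound score lies in [-1, 1].

```python
from dataclasses import dataclass


@dataclass
class RuleSignals:
    sentiment: float   # VADER-style compound score, in [-1, 1]
    efficiency: float  # assumed to be scaled to [0, 1]
    accuracy: float    # assumed to be scaled to [0, 1]


@dataclass
class JudgeSignals:
    # LLM-as-judge scores, each assumed to be scaled to [0, 1]
    empathy: float
    policy_adherence: float
    resolution_quality: float


def hybrid_reward(rule: RuleSignals, judge: JudgeSignals, rule_weight: float = 0.5) -> float:
    """Blend rule-based and judge-based scores into one scalar in [0, 1]."""
    # Rescale sentiment from [-1, 1] to [0, 1] so all terms share a range.
    sentiment01 = (rule.sentiment + 1.0) / 2.0
    rule_score = (sentiment01 + rule.efficiency + rule.accuracy) / 3.0
    judge_score = (
        judge.empathy + judge.policy_adherence + judge.resolution_quality
    ) / 3.0
    # Convex combination of the two halves (the weighting is an assumption).
    return rule_weight * rule_score + (1.0 - rule_weight) * judge_score
```

Keeping both halves on the same [0, 1] scale means `rule_weight` directly controls how much the cheap rule-based signals count relative to the judge.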
Training Details
- Base Model: unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit
- Method: GRPO with LoRA
- Hardware: Hugging Face Jobs (A100 / L40S)
- Steps: 150
- Training Time: ~7 hours
- Framework: Unsloth + TRL + custom rollout + hybrid reward engine
- Date: April 26, 2026
This was the fifth training attempt; earlier runs failed because of exhausted credits, timeouts, and infrastructure issues.
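For readers new to GRPO, the core idea is that each sampled response is scored relative to the other responses generated for the same prompt, instead of against a learned value function. The sketch below shows that group-relative normalization in plain Python. It is illustrative only: the function name is hypothetical, the choice of population standard deviation and the `eps` value are assumptions, and the actual training run used TRL with a custom rollout.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of completions for the same prompt.

    Each advantage is (reward - group mean) / (group std + eps), so responses
    better than the group average get a positive advantage.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]
```

When every completion in a group receives the same reward, all advantages are zero and that group contributes no learning signal, which is one reason a reward with enough spread matters.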
Intended Use
- Customer support simulation
- Multi-agent coordination experiments
- Instruction-following research
- Long-horizon reasoning
Live Demo: https://huggingface.co/spaces/lebiraja/customer-support-env
How to Use
from transformers import pipeline

# Load the model as a chat-capable text-generation pipeline on the GPU.
pipe = pipeline(
    "text-generation",
    model="lebiraja/customer-support-grpo-v5",
    device="cuda",
    torch_dtype="auto",
)

messages = [
    {"role": "system", "content": "You are a professional customer support agent..."},
    {"role": "user", "content": "I was charged twice for my order ORD-EC-1202"},
]

# The pipeline returns the whole conversation; the last message is the model's reply.
response = pipe(messages, max_new_tokens=512, temperature=0.7)
print(response[0]["generated_text"][-1]["content"])
Use the system prompt from the training environment for best results.
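Because the model was trained with grounded user/order context, it may help to place that context in the system message when calling it. The helper below is a minimal sketch of one way to do so. The function name, the context layout, and the example fields other than the order ID are hypothetical; the exact prompt format used in training is defined by the training environment, not by this snippet.

```python
import json


def build_messages(system_prompt, user_text, context=None):
    """Return a chat message list, optionally appending context to the system prompt."""
    system_content = system_prompt
    if context:
        # Serialize the (hypothetical) user/order record in a stable key order.
        system_content += (
            "\n\nContext (from the user/order store):\n"
            + json.dumps(context, indent=2, sort_keys=True)
        )
    return [
        {"role": "system", "content": system_content},
        {"role": "user", "content": user_text},
    ]


messages = build_messages(
    "You are a professional customer support agent...",
    "I was charged twice for my order ORD-EC-1202",
    context={"order_id": "ORD-EC-1202", "status": "delivered"},  # illustrative fields
)
```

The resulting `messages` list has the same shape as the one passed to `pipe(...)` above, so it can be used as a drop-in replacement.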
Performance Highlights
- Strong reward improvement across curriculum
- Effective hierarchical coordination
- Reduced hallucinations via grounding + penalties
- Good handling of Hinglish + frustrated users
Limitations
- Trained on simulated data (not production)
- Context length constrained
- May produce invalid actions in edge cases
- Requires correct system prompt
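Since the model may produce invalid actions in edge cases, callers may want to validate its output before acting on it. The sketch below assumes, purely for illustration, that the model emits a JSON object with an `action` field; the action names, the `parse_action` helper, and the fallback of escalating are all hypothetical and should be replaced with the action schema of the training environment.

```python
import json

# Hypothetical action set; the real schema is defined by the environment.
ALLOWED_ACTIONS = {"respond", "escalate", "close"}


def parse_action(raw, fallback="escalate"):
    """Parse model output and fall back to a safe action if it is invalid."""
    try:
        data = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        # Not JSON at all (or not a string): treat as invalid.
        return {"action": fallback, "valid": False}
    action = data.get("action") if isinstance(data, dict) else None
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        return {"action": fallback, "valid": False}
    return {"action": action, "valid": True}
```

Returning a `valid` flag alongside the action lets the caller log or penalize invalid outputs instead of silently masking them.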
Repository Links
- Live Demo: https://huggingface.co/spaces/lebiraja/customer-support-env
- Full Project: https://github.com/lebiraja/meta_hack
- Previous Versions: v2, v3, v4
License
Apache-2.0
Citation
@misc{customer-support-grpo-v5,
  author = {Lebi Raja and team},
  title = {customer-support-grpo-v5: Hierarchical Multi-Agent RL for Customer Support},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/lebiraja/customer-support-grpo-v5}},
  note = {Meta OpenEnv Hackathon Round 2}
}
Built with ❤️ using Unsloth, Hugging Face, and a lot of late-night debugging.
Model tree for lebiraja/customer-support-grpo-v5
Base model
meta-llama/Llama-3.1-8B