NegotiateAI: Teaching LLMs to Win at Enterprise Procurement
Meta PyTorch OpenEnv Hackathon | April 2026
The Problem
Every procurement manager knows the feeling. You have 5 suppliers, 12 open requirements, a budget that is already stretched, and three deadlines hitting this week. You need to negotiate hard, but not so hard that the supplier walks. You need to defer some items, but not the critical ones. And you need to do all of this simultaneously, under pressure, with incomplete information.
Current LLMs cannot do this. They can write an email about negotiation. They can explain what a purchase order is. But put them in a live negotiation with real constraints and real consequences and they fall apart immediately.
We built NegotiateAI because we wanted to fix that.
The Environment
NegotiateAI is an adversarial procurement arena built on the OpenEnv framework. The agent steps into the shoes of a procurement manager. It sees a live dashboard of suppliers, requirements, budgets and deadlines. It chooses from seven real procurement actions:
| Action | Description |
|---|---|
| negotiate | Open or counter a price with a supplier |
| award_contract | Accept terms and lock in a supplier |
| raise_pr | Submit a formal purchase requisition |
| defer | Push a decision to the next planning cycle |
| reject | Walk away from a supplier |
| hedge | Split an order across two suppliers to reduce risk |
| escalate | Bring in senior management for high-stakes decisions |
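To make the action space concrete, here is a minimal sketch of how an action payload could be modelled on the client side. The class and field names are illustrative assumptions rather than the actual OpenEnv or NegotiateAI schema (only proposed_price is referenced later in the evaluation results).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ProcurementAction(str, Enum):
    # The seven actions from the table above.
    NEGOTIATE = "negotiate"
    AWARD_CONTRACT = "award_contract"
    RAISE_PR = "raise_pr"
    DEFER = "defer"
    REJECT = "reject"
    HEDGE = "hedge"
    ESCALATE = "escalate"


@dataclass
class AgentAction:
    """Illustrative action payload; field names are assumptions."""
    action: ProcurementAction
    supplier_id: Optional[str] = None       # which supplier the action targets
    requirement_id: Optional[str] = None    # which open requirement it concerns
    proposed_price: Optional[float] = None  # used by negotiate / hedge offers
```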
Suppliers push back. Prices fluctuate. Deadlines expire. The agent lives with the consequences of every decision it makes.
The reward signal captures what actually matters in procurement: fulfilling critical requirements on time, staying within budget, and avoiding costly deadline failures. Three difficulty levels push the agent from structured scenarios all the way to full adversarial arena conditions.
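In spirit, an episode is a standard reset/step loop. The toy environment below is only there to show the loop shape; the real OpenEnv client, observation fields, and step signature are assumptions, not the project's actual API.

```python
import random


class ToyNegotiateEnv:
    """Toy stand-in for the real environment client (interface is an assumption)."""

    VALID_ACTIONS = ["negotiate", "award_contract", "raise_pr", "defer",
                     "reject", "hedge", "escalate"]

    def __init__(self, max_steps: int = 8):
        self.max_steps = max_steps
        self.t = 0

    def reset(self) -> dict:
        self.t = 0
        return {"suppliers": 5, "open_requirements": 12, "budget_left": 1.0}

    def step(self, action: str):
        self.t += 1
        reward = 1.0 if action in self.VALID_ACTIONS else 0.0  # toy reward only
        done = self.t >= self.max_steps
        obs = {"suppliers": 5,
               "open_requirements": max(0, 12 - self.t),
               "budget_left": 1.0 - 0.05 * self.t}
        return obs, reward, done, {}


env = ToyNegotiateEnv()
obs, done = env.reset(), False
while not done:
    # In the real setup the LLM policy reads the rendered dashboard and picks
    # an action; a random choice stands in for it here.
    action = random.choice(ToyNegotiateEnv.VALID_ACTIONS)
    obs, reward, done, _ = env.step(action)
```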
The Training
We trained a Llama 3.2 3B model using GRPO (Group Relative Policy Optimisation) via HuggingFace TRL on an NVIDIA A100 80GB. Training data was collected live from the running environment across two difficulty levels, not from a static dataset.
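For orientation, here is a minimal sketch of how GRPO fine-tuning with TRL and LoRA might be wired up. The hyperparameters the text does not state (batch size, number of generations per prompt, LoRA rank) and the instruct-variant model ID are assumptions, and the reward function is a placeholder standing in for the real one.

```python
from datasets import Dataset
from peft import LoraConfig
from trl import GRPOConfig, GRPOTrainer

# Prompts rendered from live environment observations; a single placeholder
# stands in for the 1,333 collected Phase 1 samples.
train_dataset = Dataset.from_list([
    {"prompt": "Dashboard: 5 suppliers, 12 open requirements, 80% of budget used. Choose an action."},
])

def procurement_reward(completions, **kwargs):
    # Placeholder: the real reward scores valid procurement actions far above
    # invalid ones (0.0513 vs 0.0001, per the text).
    return [0.0513 if "negotiate" in c else 0.0001 for c in completions]

config = GRPOConfig(
    output_dir="negotiateai-grpo-phase1",
    learning_rate=1e-5,             # assumption; Phase 2 used 2.5e-06 per the text
    max_steps=600,                  # Phase 1 ran 600 steps per the text
    per_device_train_batch_size=4,  # assumption
    num_generations=4,              # assumption
)

trainer = GRPOTrainer(
    model="meta-llama/Llama-3.2-3B-Instruct",  # instruct variant is an assumption
    reward_funcs=procurement_reward,
    args=config,
    train_dataset=train_dataset,
    peft_config=LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"),  # rank is an assumption
)
trainer.train()
```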
Phase 1: Easy Negotiation
200 episodes generated 1,333 training samples from real environment interactions. The reward function maintained a 513x separation between valid procurement actions (0.0513) and invalid ones (0.0001), giving GRPO a clear gradient signal.
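That gap matters because GRPO normalises rewards within each group of sampled completions: a completion's advantage is roughly its reward minus the group mean, divided by the group standard deviation. A quick numerical check with the values above (group size and composition are assumptions):

```python
import statistics

# A hypothetical group of 4 sampled completions: three valid actions, one invalid.
rewards = [0.0513, 0.0513, 0.0513, 0.0001]

mean = statistics.mean(rewards)
std = statistics.pstdev(rewards)  # population std, used here for simplicity

advantages = [(r - mean) / std for r in rewards]
print([round(a, 2) for a in advantages])  # [0.58, 0.58, 0.58, -1.73]
# Valid actions get a consistently positive advantage and the invalid one a large
# negative advantage, so the policy gradient pushes probability toward valid actions.
```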
The environment's curriculum engine advanced the agent through all difficulty tiers naturally as performance improved (a toy sketch of the gating follows the list):
- Episode 35: advanced to Apprentice
- Episode 59: advanced to Practitioner
- Episode 87: advanced to Expert (43% of episodes at Expert tier)
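The project does not document the exact gating logic, so the following is a purely illustrative sketch of a reward-gated curriculum; the threshold and window size are assumptions, and only the tier names come from the text.

```python
TIERS = ["Novice", "Apprentice", "Practitioner", "Expert"]

def maybe_advance(tier_index: int, recent_rewards: list[float],
                  threshold: float = 0.006, window: int = 20) -> int:
    """Advance one tier once the rolling episode reward clears a threshold.

    Threshold and window size are assumptions for illustration only.
    """
    if tier_index >= len(TIERS) - 1 or len(recent_rewards) < window:
        return tier_index
    rolling = sum(recent_rewards[-window:]) / window
    return tier_index + 1 if rolling >= threshold else tier_index
```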
GRPO training improved reward from 0.0068 to 0.0073 (+8.3%) over 600 steps.
*Figure: rolling average reward across 200 easy-negotiation episodes; the agent progressed Novice → Apprentice (episode 35) → Practitioner (episode 59) → Expert (episode 87).*

*Figure: step-level rewards and rolling average during GRPO training on the 1,333 collected samples.*
Phase 2: Medium Adversarial
Following easy negotiation training, the model was exposed to medium_adversarial scenarios: 12 suppliers including deceptive agents, a rival buyer, and mid-game supply disruptions. 100 episodes generated 1,829 training samples for continued fine-tuning over 150 steps at a learning rate of 2.5e-06.
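Continuing from the Phase 1 adapters might look roughly like this; the checkpoint path and dataset are hypothetical stand-ins, and procurement_reward refers to the placeholder defined in the Phase 1 sketch above.

```python
from datasets import Dataset
from peft import AutoPeftModelForCausalLM
from trl import GRPOConfig, GRPOTrainer

# Base model with the Phase 1 LoRA adapters attached (checkpoint path is hypothetical).
model = AutoPeftModelForCausalLM.from_pretrained("checkpoints/phase1-easy-negotiation")

# Placeholder for the 1,829 prompts collected from medium_adversarial episodes.
phase2_dataset = Dataset.from_list([
    {"prompt": "Dashboard: 12 suppliers (one suspected deceptive), rival buyer active. Choose an action."},
])

trainer = GRPOTrainer(
    model=model,
    reward_funcs=procurement_reward,  # placeholder reward from the Phase 1 sketch
    args=GRPOConfig(
        output_dir="negotiateai-grpo-phase2",
        learning_rate=2.5e-6,  # per the text
        max_steps=150,         # per the text
    ),
    train_dataset=phase2_dataset,
)
trainer.train()
```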
The Results
| Metric | Value |
|---|---|
| Training episodes (easy) | 200 |
| Training samples (easy) | 1,333 |
| Training episodes (medium) | 100 |
| Training samples (medium) | 1,829 |
| Model | Llama 3.2 3B + LoRA adapters |
| Training method | GRPO via HuggingFace TRL |
| Hardware | NVIDIA A100 80GB |
| Tier advancements | Novice → Apprentice → Practitioner → Expert |
| Expert tier episodes | 43% |
| Easy first 20 steps avg reward | 0.0068 |
| Easy last 20 steps avg reward | 0.0073 |
| Easy improvement | +8.3% |
| Valid action reward signal | 0.0513 vs 0.0001 (513x gap) |
Before vs After
| Behavior | Untrained | Trained |
|---|---|---|
| raise_pr (invalid) steps | 2/8 | 1/8 |
| Actions with proposed_price | 6/8 | 7/8 |
| Avg reward | 0.0104 | 0.0104 |
Why This Matters
Procurement is not a niche problem. It is a $50 trillion global industry where decisions are made under pressure, with incomplete information, and with real financial consequences. Most AI tools in this space are glorified search engines or document summarisers.
NegotiateAI is something different. It is a trainable, measurable, open benchmark for teaching LLMs to actually negotiate. Not to talk about negotiating. To do it.
The curriculum engine means the environment gets harder as the agent improves. The adversarial supplier LLMs mean there is no fixed optimal policy to memorise. And the OpenEnv interface means any model can be dropped in and evaluated on the same benchmark.
We think that distinction, between talking about negotiation and actually doing it, matters a lot.
Links
| Resource | URL |
|---|---|
| HuggingFace Space | https://huggingface.co/spaces/prasanthdj8/negotiateai-openenv |
| Training Notebook | https://huggingface.co/spaces/prasanthdj8/negotiateai-openenv/blob/main/NegotiateAI_Training.ipynb |
| Trained Model | https://huggingface.co/prasanthdj8/negotiateai-procurement-agent |
| GitHub | https://github.com/Prasanthdj8/negotiateai-openenv |