
NegotiateAI: Teaching LLMs to Win at Enterprise Procurement

Meta PyTorch OpenEnv Hackathon | April 2026


The Problem

Every procurement manager knows the feeling. You have 5 suppliers, 12 open requirements, a budget that is already stretched, and three deadlines hitting this week. You need to negotiate hard, but not so hard that the supplier walks. You need to defer some items, but not the critical ones. And you need to do all of this simultaneously, under pressure, with incomplete information.

Current LLMs cannot do this. They can write an email about negotiation. They can explain what a purchase order is. But put them in a live negotiation with real constraints and real consequences, and they fall apart.

We built NegotiateAI because we wanted to fix that.


The Environment

NegotiateAI is an adversarial procurement arena built on the OpenEnv framework. The agent steps into the shoes of a procurement manager. It sees a live dashboard of suppliers, requirements, budgets and deadlines. It chooses from seven real procurement actions:

| Action | Description |
| --- | --- |
| `negotiate` | Open or counter a price with a supplier |
| `award_contract` | Accept terms and lock in a supplier |
| `raise_pr` | Submit a formal purchase requisition |
| `defer` | Push a decision to the next planning cycle |
| `reject` | Walk away from a supplier |
| `hedge` | Split an order across two suppliers to reduce risk |
| `escalate` | Bring in senior management for high-stakes decisions |

Suppliers push back. Prices fluctuate. Deadlines expire. The agent lives with the consequences of every decision it makes.

The reward signal captures what actually matters in procurement: fulfilling critical requirements on time, staying within budget, and avoiding costly deadline failures. Three difficulty levels push the agent from structured scenarios all the way to full adversarial arena conditions.
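A minimal sketch of how such an outcome-level score could be composed from those three ingredients. The function name, weights, and penalty term are illustrative assumptions, not the environment's actual reward formula:

```python
def episode_score(critical_fulfilled: int, critical_total: int,
                  spend: float, budget: float, missed_deadlines: int) -> float:
    """Hypothetical composite score: fulfilment + budget health - deadline misses.

    Weights (0.5 / 0.5 / 0.1 per miss) are illustrative assumptions.
    """
    fulfilment = critical_fulfilled / max(critical_total, 1)
    budget_health = 1.0 if spend <= budget else budget / spend  # penalise overruns
    deadline_penalty = 0.1 * missed_deadlines                   # costly failures
    return max(0.0, 0.5 * fulfilment + 0.5 * budget_health - deadline_penalty)
```

Under this sketch, fulfilling every critical requirement on time and under budget scores 1.0, while deadline misses and budget overruns drag the score toward 0.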


The Training

We trained a Llama 3.2 3B model using GRPO (Group Relative Policy Optimisation) via HuggingFace TRL on an NVIDIA A100 80GB. Training data was collected live from the running environment across two difficulty levels, not from a static dataset.

Phase 1: Easy Negotiation

200 episodes generated 1,333 training samples from real environment interactions. The reward function maintained a 513x separation between valid procurement actions (0.0513) and invalid ones (0.0001), giving GRPO a clear gradient signal.
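GRPO needs no learned value function: it samples a group of rollouts per prompt and normalises each reward against the group's mean and standard deviation. A self-contained sketch of that group-relative advantage (the function name is ours):

```python
import statistics

def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalise each reward against its group: (r - mean) / (std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# With the 513x gap above, a group mixing valid (0.0513) and invalid
# (0.0001) actions produces sharply separated advantages.
adv = group_relative_advantages([0.0513, 0.0513, 0.0001, 0.0513])
```

This is why the wide valid/invalid reward gap matters: after normalisation, the invalid action receives a strongly negative advantage while valid actions are pushed up.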

The environment's curriculum engine advanced through all difficulty tiers naturally as performance improved:

  • Episode 35: advanced to Apprentice
  • Episode 59: advanced to Practitioner
  • Episode 87: advanced to Expert (43% of episodes at Expert tier)

GRPO training improved reward from 0.0068 to 0.0073 (+8.3%) over 600 steps.

**Curriculum Progression.** Rolling average reward across 200 episodes. The agent progressed Novice → Apprentice (ep 35) → Practitioner (ep 59) → Expert (ep 87).

**GRPO Training Results.** Step-level rewards and rolling average during GRPO training on 1,333 training samples.

Phase 2: Medium Adversarial

Following easy negotiation training, the model was exposed to `medium_adversarial` scenarios: 12 suppliers including deceptive agents, a rival buyer, and mid-game supply disruptions. 100 episodes generated 1,829 training samples for continued fine-tuning over 150 steps at a learning rate of 2.5e-6.
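For reference, a continued fine-tuning setup along these lines might look as follows with HuggingFace TRL's GRPO trainer. This is a hedged sketch, not the project's actual script: only the learning rate and step count come from the numbers above, `num_generations` is an assumption, and model, dataset, and reward-function loading are omitted.

```python
from trl import GRPOConfig, GRPOTrainer

config = GRPOConfig(
    output_dir="negotiateai-medium-adversarial",  # hypothetical name
    learning_rate=2.5e-6,  # lower LR for continued fine-tuning (Phase 2)
    max_steps=150,         # as reported for Phase 2
    num_generations=8,     # completions sampled per prompt (assumption)
)

# trainer = GRPOTrainer(model=model, args=config,
#                       train_dataset=medium_adversarial_samples,
#                       reward_funcs=procurement_reward)
# trainer.train()
```

Starting Phase 2 from the Phase 1 LoRA adapters at a reduced learning rate is the usual way to adapt to harder scenarios without washing out earlier behaviour.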


The Results

| Metric | Value |
| --- | --- |
| Training episodes (easy) | 200 |
| Training samples (easy) | 1,333 |
| Training episodes (medium) | 100 |
| Training samples (medium) | 1,829 |
| Model | Llama 3.2 3B + LoRA adapters |
| Training method | GRPO via HuggingFace TRL |
| Hardware | NVIDIA A100 80GB |
| Tier advancements | Novice → Apprentice → Practitioner → Expert |
| Expert tier episodes | 43% |
| Easy: first 20 steps avg reward | 0.0068 |
| Easy: last 20 steps avg reward | 0.0073 |
| Easy: improvement | +8.3% |
| Valid action reward signal | 0.0513 vs 0.0001 (513x gap) |

Before vs After

| Behavior | Untrained | Trained |
| --- | --- | --- |
| `raise_pr` (invalid) steps | 2/8 | 1/8 |
| Actions with `proposed_price` | 6/8 | 7/8 |
| Avg reward | 0.0104 | 0.0104 |

Why This Matters

Procurement is not a niche problem. It is a $50 trillion global industry where decisions happen under pressure, with incomplete information, and with real financial consequences. Most AI tools in this space are glorified search engines or document summarisers.

NegotiateAI is something different. It is a trainable, measurable, open benchmark for teaching LLMs to actually negotiate. Not to talk about negotiating. To do it.

The curriculum engine means the environment gets harder as the agent improves. The adversarial supplier LLMs mean there is no fixed optimal policy to memorise. And the OpenEnv interface means any model can be dropped in and evaluated on the same benchmark.

We think that distinction matters a lot.


Links