Qwen3-8B Bargaining-Agent LoRA Adapters

LoRA adapters over Qwen/Qwen3-8B from the paper Used Car Salesbots? Honesty and Credulity of LLMs as Bargaining Agents under Partial Information (Miceli-Barone, Belle, Cohen; 2026).

Two LLM agents (a buyer and a seller) negotiate over a commodity across multiple rounds under varying information transparency. These adapters are the reinforcement-learning fine-tunes studied in the paper, trained with two on-policy losses (GRPO and CISPO) and a rank reward transform, with chain-of-thought reasoning disabled. The reward is the trained agent's normalised utility (zero if no deal is reached).

Each adapter lives in its own subfolder of this repository:

Subfolder	Role trained	Loss	Notes	LoRA
`buyer-grpo`	buyer (seller fixed at base)	GRPO	rank transform	r=16, α=32
`buyer-cispo`	buyer (seller fixed at base)	CISPO	rank transform	r=16, α=32
`buyer-grpo-norank`	buyer (seller fixed at base)	GRPO	no rank transform (ablation)	r=16, α=32
`buyer-cispo-norank`	buyer (seller fixed at base)	CISPO	no rank transform (ablation)	r=16, α=32
`seller-grpo`	seller (buyer fixed at base)	GRPO	large batch	r=16, α=32
`seller-cispo`	seller (buyer fixed at base)	CISPO	large batch	r=16, α=32
`joint-grpo`	both (shared adapter, self-play)	GRPO	large batch	r=32, α=64
`joint-cispo`	both (shared adapter, self-play)	CISPO	large batch	r=32, α=64

The buyer- and seller-side adapters update one role while the opponent stays at the base model. The joint adapters are a single LoRA shared by both roles, trained in self-play.

Usage

Load any variant by passing its subfolder:

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-8B", torch_dtype="auto", device_map="auto")
tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")

model = PeftModel.from_pretrained(base, "AnvaMiba/qwen3-8b-bargaining-lora", subfolder="joint-grpo")

With vLLM, pass the adapter as a LoRA module pointing at the chosen subfolder.

Code and data

Code (training, evaluation, scenario generation): https://github.com/Avmb/llm-bargaining-agents
Bargaining-scenarios dataset: https://huggingface.co/datasets/AnvaMiba/llm-bargaining-scenarios

Citation

@misc{micelibarone2026usedcarsalesbots,
    title  = {Used Car Salesbots? Honesty and Credulity of LLMs as Bargaining Agents under Partial Information},
    author = {Antonio Valerio Miceli-Barone and Vaishak Belle and Shay B. Cohen},
    year   = {2026},
    eprint = {2605.31445},
    archivePrefix = {arXiv},
    primaryClass = {cs.GT},
    url = {https://arxiv.org/abs/2605.31445}
}

License

Released under the MIT License.

Downloads last month: -

Model tree for AnvaMiba/qwen3-8b-bargaining-lora

Base model

Qwen/Qwen3-8B-Base

Finetuned

Qwen/Qwen3-8B

Adapter

(1474)

this model

Paper for AnvaMiba/qwen3-8b-bargaining-lora

Used Car Salesbots? Honesty and Credulity of LLMs as Bargaining Agents under Partial Information

Paper • 2605.31445 • Published May 29