Qwen3-8B Bargaining-Agent LoRA Adapters

LoRA adapters over Qwen/Qwen3-8B from the paper Used Car Salesbots? Honesty and Credulity of LLMs as Bargaining Agents under Partial Information (Miceli-Barone, Belle, Cohen; 2026).

Two LLM agents (a buyer and a seller) negotiate over a commodity across multiple rounds under varying information transparency. These adapters are the reinforcement-learning fine-tunes studied in the paper, trained with two on-policy losses (GRPO and CISPO) and a rank reward transform, with chain-of-thought reasoning disabled. The reward is the trained agent's normalised utility (zero if no deal is reached).

Each adapter lives in its own subfolder of this repository:

Subfolder Role trained Loss Notes LoRA
buyer-grpo buyer (seller fixed at base) GRPO rank transform r=16, α=32
buyer-cispo buyer (seller fixed at base) CISPO rank transform r=16, α=32
buyer-grpo-norank buyer (seller fixed at base) GRPO no rank transform (ablation) r=16, α=32
buyer-cispo-norank buyer (seller fixed at base) CISPO no rank transform (ablation) r=16, α=32
seller-grpo seller (buyer fixed at base) GRPO large batch r=16, α=32
seller-cispo seller (buyer fixed at base) CISPO large batch r=16, α=32
joint-grpo both (shared adapter, self-play) GRPO large batch r=32, α=64
joint-cispo both (shared adapter, self-play) CISPO large batch r=32, α=64

The buyer- and seller-side adapters update one role while the opponent stays at the base model. The joint adapters are a single LoRA shared by both roles, trained in self-play.

Usage

Load any variant by passing its subfolder:

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-8B", torch_dtype="auto", device_map="auto")
tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")

model = PeftModel.from_pretrained(base, "AnvaMiba/qwen3-8b-bargaining-lora", subfolder="joint-grpo")

With vLLM, pass the adapter as a LoRA module pointing at the chosen subfolder.

Code and data

Citation

@misc{micelibarone2026usedcarsalesbots,
    title  = {Used Car Salesbots? Honesty and Credulity of LLMs as Bargaining Agents under Partial Information},
    author = {Antonio Valerio Miceli-Barone and Vaishak Belle and Shay B. Cohen},
    year   = {2026},
    eprint = {2605.31445},
    archivePrefix = {arXiv},
    primaryClass = {cs.GT},
    url = {https://arxiv.org/abs/2605.31445}
}

License

Released under the MIT License.

Downloads last month
-
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for AnvaMiba/qwen3-8b-bargaining-lora

Finetuned
Qwen/Qwen3-8B
Adapter
(1474)
this model

Paper for AnvaMiba/qwen3-8b-bargaining-lora