Instructions to use AnvaMiba/qwen3-8b-bargaining-lora with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- PEFT
How to use AnvaMiba/qwen3-8b-bargaining-lora with PEFT:
Task type is invalid.
- Notebooks
- Google Colab
- Kaggle
Qwen3-8B Bargaining-Agent LoRA Adapters
LoRA adapters over Qwen/Qwen3-8B from the paper
Used Car Salesbots? Honesty and Credulity of LLMs as Bargaining Agents under Partial
Information (Miceli-Barone, Belle, Cohen; 2026).
Two LLM agents (a buyer and a seller) negotiate over a commodity across multiple rounds under varying information transparency. These adapters are the reinforcement-learning fine-tunes studied in the paper, trained with two on-policy losses (GRPO and CISPO) and a rank reward transform, with chain-of-thought reasoning disabled. The reward is the trained agent's normalised utility (zero if no deal is reached).
Each adapter lives in its own subfolder of this repository:
| Subfolder | Role trained | Loss | Notes | LoRA |
|---|---|---|---|---|
buyer-grpo |
buyer (seller fixed at base) | GRPO | rank transform | r=16, α=32 |
buyer-cispo |
buyer (seller fixed at base) | CISPO | rank transform | r=16, α=32 |
buyer-grpo-norank |
buyer (seller fixed at base) | GRPO | no rank transform (ablation) | r=16, α=32 |
buyer-cispo-norank |
buyer (seller fixed at base) | CISPO | no rank transform (ablation) | r=16, α=32 |
seller-grpo |
seller (buyer fixed at base) | GRPO | large batch | r=16, α=32 |
seller-cispo |
seller (buyer fixed at base) | CISPO | large batch | r=16, α=32 |
joint-grpo |
both (shared adapter, self-play) | GRPO | large batch | r=32, α=64 |
joint-cispo |
both (shared adapter, self-play) | CISPO | large batch | r=32, α=64 |
The buyer- and seller-side adapters update one role while the opponent stays at the base model. The joint adapters are a single LoRA shared by both roles, trained in self-play.
Usage
Load any variant by passing its subfolder:
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-8B", torch_dtype="auto", device_map="auto")
tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")
model = PeftModel.from_pretrained(base, "AnvaMiba/qwen3-8b-bargaining-lora", subfolder="joint-grpo")
With vLLM, pass the adapter as a LoRA module pointing at the chosen subfolder.
Code and data
- Code (training, evaluation, scenario generation): https://github.com/Avmb/llm-bargaining-agents
- Bargaining-scenarios dataset: https://huggingface.co/datasets/AnvaMiba/llm-bargaining-scenarios
Citation
@misc{micelibarone2026usedcarsalesbots,
title = {Used Car Salesbots? Honesty and Credulity of LLMs as Bargaining Agents under Partial Information},
author = {Antonio Valerio Miceli-Barone and Vaishak Belle and Shay B. Cohen},
year = {2026},
eprint = {2605.31445},
archivePrefix = {arXiv},
primaryClass = {cs.GT},
url = {https://arxiv.org/abs/2605.31445}
}
License
Released under the MIT License.
- Downloads last month
- -