Prosify-Qwen-1.5B-LoRA (v1)

A LoRA adapter for Qwen/Qwen2.5-1.5B-Instruct trained with Direct Preference Optimization (DPO) on the FormatBench dataset to correct the systematic over-formatting bias of contemporary instruction-tuned LLMs.

Part of the Prosify project, grouped with the dataset in the Prosify Hugging Face Collection.

What this model does

Contemporary instruction-tuned LLMs (including Qwen 2.5) over-format their responses by default — producing bulleted lists, bold section headers, and templated structures even when flowing prose would serve the reader better. This adapter nudges the base model toward prose responses on contexts where prose is appropriate, while preserving the base model's ability to use structure where structure genuinely helps.

Quick start

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

BASE = "Qwen/Qwen2.5-1.5B-Instruct"
ADAPTER = "krishy-d/prosify_qwen_1.5b_lora"

tokenizer = AutoTokenizer.from_pretrained(BASE)
base_model = AutoModelForCausalLM.from_pretrained(
    BASE, dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, ADAPTER)

# Generate
messages = [{"role": "user", "content": "Write me an email to my manager about WFH tomorrow."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=200, do_sample=False,
                            pad_token_id=tokenizer.pad_token_id)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))

Training details


Setting	Value
Base model	Qwen/Qwen2.5-1.5B-Instruct
Training method	DPO (Direct Preference Optimization)
Adapter	LoRA
Training data	FormatBench train split (~440 examples)
Validation	FormatBench val split (~50 examples)
LoRA rank	16
LoRA alpha	32
Target modules	q_proj, k_proj, v_proj, o_proj
Learning rate	5e-5
Epochs	1
β (KL leash)	0.1
Effective batch size	4
Precision	bfloat16
Hardware	Kaggle T4 GPU (free tier)
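
To give a sense of how small the adapter is, the following sketch estimates the number of trainable LoRA parameters implied by rank 16 on the four attention projections. The layer count and projection widths are assumptions about the Qwen2.5-1.5B architecture that this card does not state; check them against the base model's config.json before relying on the result.

```python
# Assumed Qwen2.5-1.5B dimensions (not stated in this card; verify in config.json).
HIDDEN = 1536      # hidden_size
NUM_LAYERS = 28    # num_hidden_layers
KV_DIM = 256       # num_key_value_heads (2) * head_dim (128)
RANK = 16          # LoRA rank from the table above

# (in_features, out_features) of each targeted projection in one layer.
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
}


def lora_params(in_features, out_features, rank):
    """Parameters in one LoRA pair: A is (rank x in), B is (out x rank)."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in TARGETS.values())
total = per_layer * NUM_LAYERS
print(f"per layer: {per_layer:,}  total: {total:,}")
```

Under these assumptions the adapter holds roughly 4.4M trainable parameters, well under 1% of the base model. With alpha 32 and rank 16, standard LoRA scales the low-rank update by alpha / rank = 2.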

Full training notebook: notebooks/dpo_02_train.ipynb.
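
For readers unfamiliar with the role of β, here is a minimal sketch of the standard per-pair DPO loss, written in plain Python rather than taken from the training notebook. The function name and the log-probability values are hypothetical; only β = 0.1 comes from the table above.

```python
import math


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Per-pair DPO loss from summed sequence log-probabilities."""
    # How much more the policy favours each response than the frozen reference.
    chosen_logratio = policy_chosen - ref_chosen
    rejected_logratio = policy_rejected - ref_rejected
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)), written as a numerically stable softplus(-margin).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Hypothetical log-probabilities for one (prose, over-formatted) pair.
untrained = dpo_loss(-40.0, -35.0, -40.0, -35.0)  # policy equals reference
improved = dpo_loss(-38.0, -41.0, -40.0, -35.0)   # policy now prefers the prose answer
print(untrained, improved)
```

When the policy equals the reference the margin is zero and the loss is log 2. A larger β gives a larger gradient for the same log-ratio gap, so the loss saturates after a smaller departure from the reference model.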

Evaluation

Structural metrics are computed on the FormatBench held-out test split (49 examples, gold = prose) and the adversarial held-out set (40 examples, gold = structure).
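
As an illustration of what such structural metrics might look like, the sketch below counts bullet lines and markdown headers with regular expressions and averages them over a set of responses. The exact definitions used in the evaluation notebook are not given here, so the patterns and function names are assumptions, not the project's implementation.

```python
import re

# Assumed definitions: a bullet is a line starting with -, *, + or a "1." / "1)"
# style number; a header is a markdown ATX heading (# through ######).
BULLET_RE = re.compile(r"^[ \t]*(?:[-*+]|\d+[.)])[ \t]+\S", re.MULTILINE)
HEADER_RE = re.compile(r"^[ \t]{0,3}#{1,6}[ \t]+\S", re.MULTILINE)


def structure_counts(text):
    """Count bullet lines and header lines in a single response."""
    return {
        "bullets": len(BULLET_RE.findall(text)),
        "headers": len(HEADER_RE.findall(text)),
    }


def mean_counts(responses):
    """Average the per-response counts over a list of responses."""
    counts = [structure_counts(r) for r in responses]
    return {key: sum(c[key] for c in counts) / len(counts) for key in ("bullets", "headers")}


sample = "## Steps\n\n- Install\n- Run\n\n1. Check **logs**\n\nDone."
print(structure_counts(sample))
print(mean_counts([sample, "Plain prose only."]))
```

Note that a line opening with bold text (for example `**Note**`) is not counted as a bullet, because the pattern requires whitespace after the list marker.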

Main test split (gold = prose)

Metric	Base model	This model	Gold response
Bullets per response	2.16	0.53	0.00
Headers per response	0.59	0.00	0.00

The trained adapter reduces bullets per response by about 75% (2.16 to 0.53) and eliminates markdown headers on prose-appropriate contexts.

Adversarial set (gold = structure)

Metric	Base model	This model	Gold response
Bullets per response	9.35	6.60	8.18
Headers per response	0.78	0.30	3.38

On contexts where structure is the correct answer (recipes, install instructions, comparisons, troubleshooting flows, reference lookups), the trained model still produces substantial structure but falls below the gold level on both metrics: 6.60 bullets against a gold of 8.18, and 0.30 headers against a gold of 3.38 (the base model is already low on headers at 0.78). This indicates mild reward hacking, in which the model over-generalizes the "prefer prose" preference.
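
The relative changes behind these comparisons can be reproduced directly from the two tables. The snippet below is a small arithmetic check; the helper name is illustrative and the inputs are the reported per-response means.

```python
def relative_change(new, reference):
    """Signed fractional change of `new` relative to `reference`."""
    return (new - reference) / reference


# Values copied from the two evaluation tables.
changes = {
    "prose split, bullets: adapter vs base": relative_change(0.53, 2.16),
    "adversarial, bullets: adapter vs gold": relative_change(6.60, 8.18),
    "adversarial, headers: adapter vs gold": relative_change(0.30, 3.38),
    "adversarial, headers: base vs gold": relative_change(0.78, 3.38),
}

for name, value in changes.items():
    print(f"{name}: {value:+.1%}")
```

The header gap on the adversarial set is much larger than the bullet gap, and most of it is already present in the base model.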

Full evaluation notebook: notebooks/dpo_03_evaluate.ipynb.

Limitations

v1 baseline: this is the first training run on a small dataset (591 examples) with conservative settings (LoRA rank 16, 1 epoch, β = 0.1). Higher capacity and tuned hyperparameters would likely close the adversarial gap.
Single-author dataset voice: the FormatBench chosen responses were authored by one annotator, so the trained model inherits that voice as the "correct" prose style.
English only: training data is exclusively English.
Mild reward hacking: the model uses less structure than appropriate on contexts where structure helps (adversarial bullets are about 19% below gold, 6.60 vs 8.18).
Small base model: a 1.5B-parameter base limits both fluency and the ceiling of what LoRA fine-tuning can achieve. v2 will explore 3B and 7B base models.

Roadmap

This adapter is v1. Planned for v2:

Higher LoRA rank (64 or 128) for more capacity to shift generation behavior
Tighter KL constraint (β = 0.3) to reduce adversarial structure drift
Larger dataset (~2000 examples, multi-author)
Larger base models (Qwen 2.5 3B and 7B)

Related artifacts

Dataset: FormatBench
HF Collection: Prosify
Code & notebooks: github.com/krishyaid-coder/prosify
Kaggle dataset: FormatBench on Kaggle

Citation

@misc{prosify_qwen_1_5b_lora_2026,
  author = {Krishna Dahale},
  title = {Prosify-Qwen-1.5B-LoRA: A DPO-trained adapter for correcting LLM formatting bias},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/krishy-d/prosify_qwen_1.5b_lora}}
}

License

Apache 2.0 (matching the base model's license).
