Prosify-Qwen-1.5B-LoRA (v1)
A LoRA adapter for Qwen/Qwen2.5-1.5B-Instruct trained with Direct Preference Optimization (DPO) on the FormatBench dataset to correct the systematic over-formatting bias of contemporary instruction-tuned LLMs.
Part of the Prosify project, grouped with the dataset in the Prosify Hugging Face Collection.
What this model does
Contemporary instruction-tuned LLMs (including Qwen 2.5) over-format their responses by default — producing bulleted lists, bold section headers, and templated structures even when flowing prose would serve the reader better. This adapter nudges the base model toward prose responses on contexts where prose is appropriate, while preserving the base model's ability to use structure where structure genuinely helps.
Quick start
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
BASE = "Qwen/Qwen2.5-1.5B-Instruct"
ADAPTER = "krishy-d/prosify_qwen_1.5b_lora"
tokenizer = AutoTokenizer.from_pretrained(BASE)
base_model = AutoModelForCausalLM.from_pretrained(
    BASE, dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, ADAPTER)
# Generate
messages = [{"role": "user", "content": "Write me an email to my manager about WFH tomorrow."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=200, do_sample=False,
                            pad_token_id=tokenizer.pad_token_id)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
Training details
| Setting | Value |
|---|---|
| Base model | Qwen/Qwen2.5-1.5B-Instruct |
| Training method | DPO (Direct Preference Optimization) |
| Adapter | LoRA |
| Training data | FormatBench train split (~440 examples) |
| Validation | FormatBench val split (~50 examples) |
| LoRA rank | 16 |
| LoRA alpha | 32 |
| Target modules | q_proj, k_proj, v_proj, o_proj |
| Learning rate | 5e-5 |
| Epochs | 1 |
| β (KL leash) | 0.1 |
| Effective batch size | 4 |
| Precision | bfloat16 |
| Hardware | Kaggle T4 GPU (free tier) |
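For a rough sense of adapter size, the LoRA rows above can be turned into a trainable-parameter count. This is only a sketch: the layer dimensions below (hidden size 1536, 12 query heads, 2 key/value heads, head dimension 128, 28 layers) are assumed from the published Qwen2.5-1.5B configuration and are not stated in this card, so check the base model's config.json before relying on them. `lora_param_count` is a hypothetical helper, not part of PEFT.

```python
# Hypothetical helper: estimate LoRA trainable parameters for the attention projections.
# ASSUMPTION: these dimensions come from the Qwen2.5-1.5B config, not from this card.
HIDDEN = 1536
N_LAYERS = 28
HEAD_DIM = 128
N_Q_HEADS = 12
N_KV_HEADS = 2

# (in_features, out_features) of each targeted linear layer
PROJ_SHAPES = {
    "q_proj": (HIDDEN, N_Q_HEADS * HEAD_DIM),
    "k_proj": (HIDDEN, N_KV_HEADS * HEAD_DIM),
    "v_proj": (HIDDEN, N_KV_HEADS * HEAD_DIM),
    "o_proj": (N_Q_HEADS * HEAD_DIM, HIDDEN),
}


def lora_param_count(rank, shapes=PROJ_SHAPES, n_layers=N_LAYERS):
    """LoRA adds A (rank x in) and B (out x rank) to each targeted layer."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in shapes.values())
    return per_layer * n_layers


if __name__ == "__main__":
    rank, alpha = 16, 32
    print("trainable LoRA parameters:", lora_param_count(rank))
    print("LoRA scaling (alpha / rank):", alpha / rank)
```

Under these assumed dimensions, rank 16 works out to roughly 4.4 million trainable parameters, and the count grows linearly with rank.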
Full training notebook: notebooks/dpo_02_train.ipynb.
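The β setting is easier to interpret next to the DPO objective itself. Below is a minimal sketch of the standard per-example DPO loss in plain Python. The log-probabilities in the usage lines are made-up illustrative numbers, `dpo_loss` is a hypothetical helper, and the training notebook's actual implementation may differ.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log(sigmoid(beta * (chosen log-ratio - rejected log-ratio)))."""
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(m)) equals softplus(-m); this form avoids overflow for large |m|.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


if __name__ == "__main__":
    # Illustrative (invented) sequence log-probabilities.
    # The policy has moved toward the chosen response relative to the reference.
    print(dpo_loss(-40.0, -55.0, -45.0, -50.0, beta=0.1))
    # A policy identical to the reference gives a zero margin.
    print(dpo_loss(-45.0, -50.0, -45.0, -50.0, beta=0.1))
```

A larger β scales the same log-ratio gap into a larger margin, so the policy needs to move less far from the reference model to reach a given loss.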
Evaluation
Structural metrics computed on the FormatBench held-out test split (49 examples, gold = prose) and the adversarial held-out set (40 examples, gold = structure).
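One way such structural metrics could be computed is with line-level regular expressions. The sketch below is an assumption about the metric definition, not the evaluation notebook's code: `count_bullets`, `count_headers`, and `mean_per_response` are hypothetical helpers that treat lines starting with `-`, `*`, `+`, or a number followed by `.` or `)` as bullets, and lines starting with one to six `#` characters as headers.

```python
import re

# ASSUMPTION: these patterns approximate "bullets" and "headers";
# the real evaluation notebook may define them differently.
BULLET_RE = re.compile(r"^[ \t]*(?:[-*+]|\d+[.)])[ \t]+\S", re.MULTILINE)
HEADER_RE = re.compile(r"^[ \t]{0,3}#{1,6}[ \t]+\S", re.MULTILINE)


def count_bullets(text):
    """Number of lines that look like markdown list items."""
    return len(BULLET_RE.findall(text))


def count_headers(text):
    """Number of lines that look like markdown ATX headers."""
    return len(HEADER_RE.findall(text))


def mean_per_response(responses, counter):
    """Average of a per-response count over a list of responses."""
    return sum(counter(r) for r in responses) / len(responses)


if __name__ == "__main__":
    structured = "## Steps\n\n- one\n- two\n1. three\n\nPlain **bold** prose.\n"
    prose = "Hi team, I will be working from home tomorrow."
    responses = [structured, prose]
    print("bullets per response:", mean_per_response(responses, count_bullets))
    print("headers per response:", mean_per_response(responses, count_headers))
```

Inline bold text such as `**bold**` is deliberately not counted as a bullet, since the pattern requires whitespace after the list marker.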
Main test split (gold = prose)
| Metric | Base model | This model | Gold response |
|---|---|---|---|
| Bullets per response | 2.16 | 0.53 | 0.00 |
| Headers per response | 0.59 | 0.00 | 0.00 |
The trained adapter reduces bullet usage by about 75% (from 2.16 to 0.53 bullets per response) and eliminates markdown headers on prose-appropriate contexts.
Adversarial set (gold = structure)
| Metric | Base model | This model | Gold response |
|---|---|---|---|
| Bullets per response | 9.35 | 6.60 | 8.18 |
| Headers per response | 0.78 | 0.30 | 3.38 |
On contexts where structure is the correct answer (recipes, install instructions, comparisons, troubleshooting flows, reference lookups), the trained model preserves substantial structure but drifts below the gold level: bullets drop to 6.60 per response against a gold of 8.18, and headers to 0.30 against a gold of 3.38. This indicates mild reward hacking, where the model over-generalizes the "prefer prose" preference.
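The relative changes discussed in this section follow directly from the two tables. Here is a small worked check that uses only the table values; `relative_drop` is a hypothetical helper name.

```python
def relative_drop(reference, value):
    """Fraction by which `value` falls below `reference`."""
    return (reference - value) / reference


# Main test split (gold = prose): base model vs. this model.
prose_bullets = relative_drop(2.16, 0.53)

# Adversarial set (gold = structure): this model vs. gold, and vs. base.
adv_bullets_vs_gold = relative_drop(8.18, 6.60)
adv_headers_vs_gold = relative_drop(3.38, 0.30)
adv_bullets_vs_base = relative_drop(9.35, 6.60)

if __name__ == "__main__":
    results = {
        "prose split, bullets vs. base": prose_bullets,
        "adversarial, bullets vs. gold": adv_bullets_vs_gold,
        "adversarial, headers vs. gold": adv_headers_vs_gold,
        "adversarial, bullets vs. base": adv_bullets_vs_base,
    }
    for name, value in results.items():
        print(f"{name}: {value:.1%}")
```

The header gap on the adversarial set is much wider than the bullet gap, although the base model is also well below gold on headers (0.78 against 3.38).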
Full evaluation notebook: notebooks/dpo_03_evaluate.ipynb.
Limitations
- v1 baseline: this is the first training run on a small dataset (591 examples) with conservative settings (LoRA rank 16, 1 epoch, β = 0.1). Higher capacity and tuned hyperparameters would likely close the adversarial gap.
- Single-author dataset voice: the FormatBench chosen responses were authored by one annotator, so the trained model inherits that voice as the "correct" prose style.
- English only: training data is exclusively English.
- Mild reward hacking: the model uses less structure than appropriate on contexts where structure helps (bullets are roughly 20% below gold on the adversarial set).
- Small base model: 1.5B parameters limits both fluency and the ceiling of what LoRA fine-tuning can achieve. v2 will explore 3B and 7B base models.
Roadmap
This adapter is v1. Planned for v2:
- Higher LoRA rank (64 or 128) for more capacity to shift generation behavior
- Tighter KL constraint (β = 0.3) to reduce adversarial structure drift
- Larger dataset (~2000 examples, multi-author)
- Larger base models (Qwen 2.5 3B and 7B)
Related artifacts
- Dataset: FormatBench
- HF Collection: Prosify
- Code & notebooks: github.com/krishyaid-coder/prosify
- Kaggle dataset: FormatBench on Kaggle
Citation
@misc{prosify_qwen_1_5b_lora_2026,
author = {Krishna Dahale},
title = {Prosify-Qwen-1.5B-LoRA: A DPO-trained adapter for correcting LLM formatting bias},
year = {2026},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/krishy-d/prosify_qwen_1.5b_lora}}
}
License
Apache 2.0 (matching the base model's license).