Qwen3.5-9B Humanize
Collection
Chinese text humanization model series: SFT + DPO training pipeline, models and datasets included.
How to use XiangJinYu/Qwen3.5-9B-Humanize-DPO-Round1 with PEFT:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the base model (the Hub ID from the table below), then attach the LoRA adapter.
base_model = AutoModelForCausalLM.from_pretrained("unsloth/Qwen3.5-9B")
model = PeftModel.from_pretrained(base_model, "XiangJinYu/Qwen3.5-9B-Humanize-DPO-Round1")
```

A LoRA adapter fine-tuned with DPO (Direct Preference Optimization) on Qwen3.5-9B for Chinese text humanization. This is the first DPO alignment stage, balanced between academic precision and natural, everyday-language rewriting.

| Item | Value |
|---|---|
| Base model | unsloth/Qwen3.5-9B |
| Starting point | SFT |
| Fine-tuning method | DPO |
| LoRA rank | 16 |
| Training data | 4,000 preference pairs (2,000 with over-formal rejected responses + 2,000 with over-casual rejected responses) |
| Checkpoint used | step-200 (of 250) |
| Final margin | ~11.4 |
| Final accuracy | 100% |
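
The margin and accuracy rows can be read as follows, assuming they follow the usual DPO definitions: the implicit reward of a response is `beta` times the difference between the policy and reference log-probabilities, the margin is the chosen reward minus the rejected reward, and accuracy is the fraction of pairs with a positive margin. The sketch below is illustrative only; `dpo_stats`, the `beta` value, and the toy log-probabilities are hypothetical and are not taken from this model's training run.

```python
import math

def dpo_stats(pairs, beta=0.1):
    """Compute mean reward margin, reward accuracy, and DPO loss.

    pairs: list of (policy_chosen, policy_rejected, ref_chosen, ref_rejected)
           sequence log-probabilities. beta is an assumed value.
    """
    margins, losses = [], []
    for policy_c, policy_r, ref_c, ref_r in pairs:
        reward_chosen = beta * (policy_c - ref_c)      # implicit reward, chosen
        reward_rejected = beta * (policy_r - ref_r)    # implicit reward, rejected
        margin = reward_chosen - reward_rejected
        margins.append(margin)
        # DPO loss per pair: -log(sigmoid(margin)) == log(1 + exp(-margin))
        losses.append(math.log1p(math.exp(-margin)))
    n = len(pairs)
    return {
        "margin": sum(margins) / n,
        "accuracy": sum(m > 0 for m in margins) / n,
        "loss": sum(losses) / n,
    }

# Hypothetical log-probabilities, for illustration only.
toy_pairs = [
    (-40.0, -95.0, -60.0, -55.0),
    (-35.0, -120.0, -50.0, -70.0),
]
stats = dpo_stats(toy_pairs, beta=0.1)
print(stats)
```

Under these definitions, an accuracy of 100% means the policy ranks the chosen response above the rejected one for every pair it was measured on; on its own it does not say how well the model generalizes to unseen text.
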
Rewrites AI-generated or overly formal/casual Chinese text into natural human writing:
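```python
from unsloth import FastLanguageModel
from peft import PeftModel

# Load the base model; Unsloth may return a processor instead of a bare tokenizer.
base_model, proc = FastLanguageModel.from_pretrained(
    "unsloth/Qwen3.5-9B", max_seq_length=2048, load_in_4bit=False,
)
tokenizer = proc.tokenizer if hasattr(proc, "tokenizer") else proc

# Attach the DPO LoRA adapter in inference-only mode.
model = PeftModel.from_pretrained(
    base_model, "XiangJinYu/Qwen3.5-9B-Humanize-DPO-Round1", is_trainable=False,
)
# Report the model type as "qwen3" before switching Unsloth to inference mode.
if hasattr(model, "config") and getattr(model.config, "model_type", "") == "qwen3_5":
    model.config.model_type = "qwen3"
FastLanguageModel.for_inference(model)

# Instruction (Chinese): "Rewrite the text below so it reads more like natural human
# writing; keep the original meaning and facts; do not add a title or explanation."
instruction = "请将下面文本改写得更像自然人写作,保持原意与事实,不要加标题或说明。"
# Sample input (Chinese): an academic-style sentence reporting a 3.2-point BLEU gain.
text = "本研究旨在探讨深度学习模型在自然语言处理任务中的性能优化策略,实验结果表明BLEU分数提高了3.2个百分点。"
messages = [{"role": "user", "content": [{"type": "text", "text": f"{instruction}\n\n原文:{text}"}]}]

# Render the chat prompt with thinking mode disabled.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=False
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.65,
                         top_p=0.9, do_sample=True, repetition_penalty=1.1)

# Decode only the newly generated tokens, skipping the prompt.
gen = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(gen, skip_special_tokens=True))
```
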
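
To rewrite a longer document, one option is to split it into paragraphs and build one chat message list per paragraph, each using the same instruction and `原文:` ("original text") prefix as the example above. The sketch below is a minimal illustration; `build_messages` and `split_paragraphs` are hypothetical helper names, and splitting on line breaks is an assumption rather than something the model card prescribes.

```python
# Same instruction string as in the usage example above.
INSTRUCTION = "请将下面文本改写得更像自然人写作,保持原意与事实,不要加标题或说明。"

def build_messages(text, instruction=INSTRUCTION):
    """Wrap one piece of text in the chat format used by the usage example."""
    user_text = f"{instruction}\n\n原文:{text.strip()}"
    return [{"role": "user", "content": [{"type": "text", "text": user_text}]}]

def split_paragraphs(document):
    """Split a document on line breaks and drop empty lines (assumed strategy)."""
    return [line.strip() for line in document.splitlines() if line.strip()]

# Toy two-paragraph document, for illustration only.
doc = "第一段。\n\n第二段。"
batches = [build_messages(p) for p in split_paragraphs(doc)]
print(len(batches))
```

Each entry in `batches` can then be passed to `tokenizer.apply_chat_template` and `model.generate` in the same way as `messages` in the example above.
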
| Model | Type | Recommended for |
|---|---|---|
| SFT | SFT | Foundation |
| This model | DPO | General use, balanced |
| DPO Round 2 | DPO | Academic/technical, latest |