Qwen3.5-9B Humanize
Chinese text humanization model series: SFT + DPO training pipeline, models and datasets included.
How to use XiangJinYu/Qwen3.5-9B-Humanize-DPO-Round2 with PEFT:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the base model, then attach the LoRA adapter on top of it.
base_model = AutoModelForCausalLM.from_pretrained("unsloth/Qwen3.5-9B")
model = PeftModel.from_pretrained(base_model, "XiangJinYu/Qwen3.5-9B-Humanize-DPO-Round2")
```

This is a LoRA adapter fine-tuned with DPO on Qwen3.5-9B for Chinese text humanization. It is the latest and most capable version in the series and is especially strong on academic and technical Chinese text.
Training uses a pure self-play approach: the rejected samples are outputs from an intermediate checkpoint of the previous training stage, which teaches the model to consistently surpass its own prior output.
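The self-play pairing described above can be sketched as follows. This is a minimal illustration, not the author's pipeline: the record fields `prompt` and `human`, the function `build_self_play_pairs`, and the `prior_generate` callable (standing in for the intermediate checkpoint) are hypothetical names. The output uses the `prompt`/`chosen`/`rejected` layout that preference-tuning libraries such as TRL commonly accept.

```python
from typing import Callable, Dict, List


def build_self_play_pairs(
    records: List[Dict[str, str]],
    prior_generate: Callable[[str], str],
) -> List[Dict[str, str]]:
    """Pair each human-written text (chosen) with the prior checkpoint's output (rejected).

    Assumption: each record carries a "prompt" (the rewrite request) and a
    "human" reference text. Both field names are hypothetical.
    """
    pairs = []
    for rec in records:
        rejected = prior_generate(rec["prompt"])
        # Skip pairs where the checkpoint reproduced the human text exactly,
        # since identical chosen/rejected texts carry no preference signal.
        if rejected.strip() == rec["human"].strip():
            continue
        pairs.append(
            {"prompt": rec["prompt"], "chosen": rec["human"], "rejected": rejected}
        )
    return pairs


if __name__ == "__main__":
    # Stub generator standing in for the prior checkpoint.
    demo = build_self_play_pairs(
        [{"prompt": "rewrite: A", "human": "human version of A"}],
        prior_generate=lambda prompt: "model version of A",
    )
    print(demo)
```

In a real run, `prior_generate` would wrap a call to the earlier checkpoint's `generate` method with the same chat template used at inference time.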
| Item | Value |
|---|---|
| Base model | unsloth/Qwen3.5-9B |
| Starting point | Intermediate checkpoint from prior DPO stage |
| Fine-tuning method | DPO (self-play rejected) |
| LoRA rank | 16 |
| Training data | 2000 pairs (chosen = CSL human text, rejected = prior checkpoint outputs) |
| Training steps | 250 steps (2 epochs) |
| Final loss | 0.34 |
| Final margin | ~1.5-2.5 |
| Final accuracy | ~93-100% |
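For reference, the loss, margin, and accuracy rows can be related through the standard DPO objective. The sketch below assumes those rows report the usual DPO quantities: mean loss, mean reward margin, and the fraction of pairs where the chosen response receives the higher implicit reward. The card does not state `beta`, so the value 0.1 is an assumption, and the log-probabilities are made-up inputs.

```python
import math
from typing import Sequence, Tuple


def dpo_metrics(
    policy_chosen: Sequence[float],
    policy_rejected: Sequence[float],
    ref_chosen: Sequence[float],
    ref_rejected: Sequence[float],
    beta: float = 0.1,  # assumed value; not stated in the card
) -> Tuple[float, float, float]:
    """Return (mean loss, mean reward margin, reward accuracy) for a batch.

    Inputs are summed sequence log-probabilities under the policy and the
    frozen reference model.
    """
    losses, margins = [], []
    for pc, pr, rc, rr in zip(policy_chosen, policy_rejected, ref_chosen, ref_rejected):
        reward_chosen = beta * (pc - rc)
        reward_rejected = beta * (pr - rr)
        margin = reward_chosen - reward_rejected
        # -log(sigmoid(margin)) equals softplus(-margin); computed stably.
        loss = max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
        losses.append(loss)
        margins.append(margin)
    n = len(margins)
    if n == 0:
        raise ValueError("empty batch")
    accuracy = sum(m > 0 for m in margins) / n
    return sum(losses) / n, sum(margins) / n, accuracy


if __name__ == "__main__":
    # Made-up log-probabilities for two preference pairs.
    print(dpo_metrics([-10.0, -15.0], [-20.0, -14.0], [-12.0, -15.0], [-12.0, -15.0]))
```

Under this formulation a single pair with margin 1.0 contributes a loss of about 0.313, and a margin of 0 gives ln 2 ≈ 0.693, so a falling loss and a growing margin move together.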
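Inference with Unsloth:

```python
from unsloth import FastLanguageModel
from peft import PeftModel

# Load the base model without 4-bit quantization.
base_model, proc = FastLanguageModel.from_pretrained(
    "unsloth/Qwen3.5-9B", max_seq_length=2048, load_in_4bit=False,
)
# The loader may return a processor that wraps the tokenizer; unwrap it if so.
tokenizer = proc.tokenizer if hasattr(proc, "tokenizer") else proc

# Attach the LoRA adapter for inference only.
model = PeftModel.from_pretrained(
    base_model, "XiangJinYu/Qwen3.5-9B-Humanize-DPO-Round2", is_trainable=False,
)
# Relabel the model type from "qwen3_5" to "qwen3" before enabling inference mode.
if hasattr(model, "config") and getattr(model.config, "model_type", "") == "qwen3_5":
    model.config.model_type = "qwen3"
FastLanguageModel.for_inference(model)

# Instruction (in English): "Rewrite the text below so it reads more like natural
# human writing; keep the original meaning and facts; do not add a title or explanation."
instruction = "请将下面文本改写得更像自然人写作,保持原意与事实,不要加标题或说明。"
# Sample input: an abstract sentence reporting Dice 0.923, +4.7 points, ~30% faster inference.
text = "本文提出了一种基于U-Net改进的医学影像分割方法,Dice系数达到0.923,较基线方法提升了4.7个百分点,推理速度提升约30%。"
messages = [{"role": "user", "content": [{"type": "text", "text": f"{instruction}\n\n原文:{text}"}]}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=False
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Recommended: temperature 0.60-0.65 for academic texts
outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.65,
                         top_p=0.9, do_sample=True, repetition_penalty=1.1)
# Decode only the newly generated tokens, skipping the prompt.
gen = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(gen, skip_special_tokens=True))
```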
Tested on 10 academic scenarios (3 samples each); all key numbers were preserved. Five of the scenarios are listed below:
| Scenario | Numbers verified |
|---|---|
| NLP paper abstract | BLEU +3.2%, complexity -15% |
| Medical image segmentation | Dice 0.923, +4.7%, speed +30% |
| Graph neural network | O(n log n), F1 +2.8%, time -40% |
| SPWM inverter | 83.9%, 81.9%, 6~18V, IEC 61000-4-2 |
| Embedded system test | ±0.5LSB, 8ms, 28mW, 0.6mW |
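The card does not describe how the numbers were verified. One possible way to automate such a check is sketched below: extract every numeric literal from the source text and confirm each one still appears in the rewrite. The function names `extract_numbers` and `missing_numbers` are hypothetical, and the regular expression only covers plain integers and decimals, so a number rewritten in words (for example, as Chinese numerals) would be reported as missing.

```python
import re
from collections import Counter
from typing import List

# Matches integers and decimals such as 0.923, 4.7, 30, or 61000.
_NUMBER = re.compile(r"\d+(?:\.\d+)?")


def extract_numbers(text: str) -> List[str]:
    """Return all numeric literals in order of appearance."""
    return _NUMBER.findall(text)


def missing_numbers(source: str, rewrite: str) -> List[str]:
    """Return numbers from `source` that have no matching occurrence in `rewrite`.

    The comparison is a multiset match, so a number that appears twice in the
    source must also appear twice in the rewrite.
    """
    remaining = Counter(extract_numbers(rewrite))
    missing = []
    for num in extract_numbers(source):
        if remaining[num] > 0:
            remaining[num] -= 1
        else:
            missing.append(num)
    return missing


if __name__ == "__main__":
    source = "Dice 0.923, +4.7%, speed +30%"
    rewrite = "Dice reached 0.923, up 4.7 points, and inference is about 30% faster."
    print(missing_numbers(source, rewrite))
```

An empty result means every number survived the rewrite. A non-empty result is a reasonable trigger for regenerating the sample or reviewing it by hand.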
Note: Use temperature 0.60-0.65 for academic texts. At higher temperatures, technical terms are occasionally substituted.
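To see why a lower temperature tends to protect technical terms, the sketch below applies temperature scaling to a toy set of logits. The logit values are invented for illustration, with the first entry standing in for the correct term. It covers only the temperature step; the `top_p` and `repetition_penalty` settings used during generation are not modeled here.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Convert logits to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Hypothetical logits: index 0 is the correct term, the others are alternatives.
    logits = [5.0, 3.0, 1.0]
    for t in (0.65, 1.0, 1.2):
        probs = softmax_with_temperature(logits, t)
        print(t, [round(p, 3) for p in probs])
```

As the temperature rises, probability mass shifts from the top candidate to the alternatives, which is consistent with the occasional term substitutions noted above.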
Models in this series:

| Model | Type | Recommended for |
|---|---|---|
| SFT | SFT | Foundation |
| DPO Round 1 | DPO | General use, balanced |
| DPO Round 2 (this model) | DPO | Academic/technical, latest |