Qwen3.5-9B Humanize SFT

A LoRA adapter fine-tuned on Qwen3.5-9B for Chinese text humanization: it rewrites AI-generated Chinese text to read like natural human writing while preserving the original meaning and factual content.

This is the SFT foundation for the humanization model series. All DPO versions build on this adapter.

Model Details

Item	Value
Base model	`unsloth/Qwen3.5-9B`
Fine-tuning method	SFT (supervised fine-tuning)
LoRA rank	16
Training data	~18k Chinese academic text pairs (CSL corpus)
Training steps	~900 steps (~0.8 epoch)
Final loss	~0.82
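
The step and epoch figures in the table imply an approximate effective batch size, which the card does not state. The following is a rough back-of-the-envelope check only. It assumes the approximate values above (about 18k pairs, 900 steps, 0.8 epoch) and that every optimizer step consumes the same number of examples.

```python
# Rough estimate only: all three inputs are the approximate values from the table.
num_pairs = 18_000   # ~18k training pairs
steps = 900          # ~900 optimizer steps
epochs = 0.8         # ~0.8 epoch

# Examples consumed over the run, then averaged per optimizer step.
examples_seen = num_pairs * epochs
effective_batch_size = examples_seen / steps

print(f"~{round(effective_batch_size)} examples per optimizer step")
```

How that total splits between per-device batch size and gradient accumulation is not given in the card, so the estimate says nothing about either value individually.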

Usage

from unsloth import FastLanguageModel
from peft import PeftModel

base_model, proc = FastLanguageModel.from_pretrained(
    "unsloth/Qwen3.5-9B", max_seq_length=2048, load_in_4bit=False,
)
tokenizer = proc.tokenizer if hasattr(proc, "tokenizer") else proc

model = PeftModel.from_pretrained(
    base_model, "XiangJinYu/Qwen3.5-9B-Humanize-SFT", is_trainable=False,
)
if hasattr(model, "config") and getattr(model.config, "model_type", "") == "qwen3_5":
    model.config.model_type = "qwen3"
FastLanguageModel.for_inference(model)

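# Instruction (Chinese): "Rewrite the text below so it reads more like natural human
# writing; keep the original meaning and facts; do not add a title or explanation."
instruction = "请将下面文本改写得更像自然人写作，保持原意与事实，不要加标题或说明。"
# Sample input: "This study aims to explore performance optimization strategies for
# deep learning models in natural language processing tasks."
text = "本研究旨在探讨深度学习模型在自然语言处理任务中的性能优化策略。"
# "原文：" means "Original text:".
messages = [{"role": "user", "content": [{"type": "text", "text": f"{instruction}\n\n原文：{text}"}]}]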
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=False
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.65,
                         top_p=0.9, do_sample=True, repetition_penalty=1.1)
# Keep only the newly generated tokens by slicing off the prompt tokens.
gen = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(gen, skip_special_tokens=True))
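
When rewriting more than one passage, it can help to factor the prompt construction out of the usage snippet. The sketch below does only that. `build_messages` and `INSTRUCTION` are hypothetical names introduced for illustration, not part of the adapter or of any library, and the sketch assumes the same instruction and `原文：` ("Original text:") prefix shown above.

```python
# Hypothetical helper for illustration; not part of the released adapter or any library.
INSTRUCTION = "请将下面文本改写得更像自然人写作，保持原意与事实，不要加标题或说明。"


def build_messages(text: str, instruction: str = INSTRUCTION) -> list[dict]:
    """Wrap one source passage in the chat format used in the usage snippet."""
    return [{
        "role": "user",
        "content": [{"type": "text", "text": f"{instruction}\n\n原文：{text}"}],
    }]


texts = [
    "本研究旨在探讨深度学习模型在自然语言处理任务中的性能优化策略。",
    "实验结果表明，所提出的方法在多个数据集上均取得了显著的性能提升。",
]
# One message list per passage, ready for tokenizer.apply_chat_template.
batch = [build_messages(t) for t in texts]
print(len(batch), batch[0][0]["role"])
```

Each element of `batch` can be passed to `tokenizer.apply_chat_template` in place of `messages` in the snippet above.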

Training Details

Dataset: ~18k CSL academic abstract pairs; in each pair, chosen = the human-written abstract and rejected = its AI-rewritten version
Optimizer: AdamW 8-bit, lr=2e-4, cosine decay
Hardware: NVIDIA RTX 5090 (32GB), Unsloth + TRL
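
To make the "cosine decay" setting concrete, here is a minimal sketch of a cosine learning-rate schedule starting at the stated peak of 2e-4. The card does not specify warmup, a minimum learning rate, or the schedule horizon, so this sketch assumes no warmup, decay to zero, and a 900-step horizon. The function name `cosine_lr` is hypothetical, and the actual training run may have used different settings.

```python
import math


def cosine_lr(step: int, total_steps: int = 900,
              peak_lr: float = 2e-4, min_lr: float = 0.0) -> float:
    """Cosine decay from peak_lr to min_lr over total_steps.

    Assumptions (not stated in the card): no warmup, min_lr = 0,
    and a 900-step schedule horizon.
    """
    # Clamp progress to [0, 1] so steps outside the horizon stay at the endpoints.
    progress = min(max(step / total_steps, 0.0), 1.0)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


for s in (0, 225, 450, 675, 900):
    print(f"step {s:>3}: lr = {cosine_lr(s):.2e}")
```

Under these assumptions the rate starts at the peak, passes through half the peak at the midpoint, and reaches zero at the final step.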