Qwen2.5-7B-Instruct-CharacterEnhance

A QLoRA fine-tuned version of Qwen2.5-7B-Instruct for bilingual (English and Chinese) character role-play dialogue generation, trained on the PIPPA dataset.

The model is intended to generate natural replies that stay consistent with the given character persona and conversation history.

Quick Start

from transformers import pipeline

# Load the fine-tuned model from the Hugging Face Hub
generator = pipeline(
    "text-generation",
    model="CyberpunkLegend/Qwen2.5-7B-Instruct-CharacterEnhance",
    device="cuda",
)

messages = [
    {
        "role": "system",
        "content": "You are a helpful role-play assistant. Respond in character based on the given persona and conversation history."
    },
    {
        "role": "user",
        "content": "Now, you are required to role-play and continue the casual chat...\n\n<|Character information-begin|>\n[Character information of the character you play]\nName: Alex, a cheerful college student\n\n[User information]\nFriend\n<|Character information-end|>\n\n<|Dialogue context-begin|>\nuser: Want to go hiking this weekend?\nassistant: (Eyes light up) Yes! I've been wanting to get outdoors lately.\n<|Dialogue context-end|>"
    }
]

# return_full_text=False returns only the newly generated reply, not the prompt
output = generator(messages, max_new_tokens=128, return_full_text=False)[0]
print(output["generated_text"])

For Chinese conversations, the user message follows the same layout with Chinese section markers. The example below is kept in Chinese because it is model input: the character is 小明 (Xiaoming), a cheerful college student, the user is a friend, and the dialogue context has the user proposing a hike this weekend and the character eagerly agreeing. It reuses the generator created above:

messages_zh = [
    {
        "role": "system",
        "content": "You are a helpful role-play assistant. Respond in character based on the given persona and conversation history."
    },
    {
        "role": "user",
        "content": "现在需要你来扮演角色并继续角色和用户之间的闲聊...\n\n<|角色信息-开始|>\n[你扮演的角色的角色信息]\n姓名:小明,性格开朗的大学生\n\n[用户信息]\n朋友\n<|角色信息-结束|>\n\n<|对话上文-开始|>\nuser: 周末一起去爬山吗?\nassistant: (眼睛一亮)好啊好啊!我最近正想出去走走呢。\n<|对话上文-结束|>"
    }
]

output = generator(messages_zh, max_new_tokens=128, return_full_text=False)[0]
print(output["generated_text"])
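The user messages in the Quick Start examples share a fixed layout: an instruction preamble, a character-information block, a user-information line, and the prior dialogue as `role: text` lines, each wrapped in section markers. The sketch below assembles that layout so persona and history can be swapped in programmatically. The helper `build_roleplay_prompt` and the `MARKERS` table are hypothetical (not part of this repository); the marker strings are copied from the English and Chinese examples. The card elides the full instruction preamble ("..."), so the helper takes it verbatim as an argument rather than guessing it, and it cannot confirm that training used any formatting beyond what the examples show.

```python
from typing import Sequence, Tuple

# Section markers copied from the English and Chinese Quick Start prompts.
MARKERS = {
    "en": (
        "<|Character information-begin|>",
        "[Character information of the character you play]",
        "[User information]",
        "<|Character information-end|>",
        "<|Dialogue context-begin|>",
        "<|Dialogue context-end|>",
    ),
    "zh": (
        "<|角色信息-开始|>",
        "[你扮演的角色的角色信息]",
        "[用户信息]",
        "<|角色信息-结束|>",
        "<|对话上文-开始|>",
        "<|对话上文-结束|>",
    ),
}


def build_roleplay_prompt(
    preamble: str,
    character_info: str,
    user_info: str,
    history: Sequence[Tuple[str, str]],
    lang: str = "en",
) -> str:
    """Hypothetical helper: assemble the user message in the Quick Start layout."""
    char_begin, char_header, user_header, char_end, ctx_begin, ctx_end = MARKERS[lang]
    # Prior turns become "user: ..." / "assistant: ..." lines, as in the examples.
    context = "\n".join(f"{role}: {text}" for role, text in history)
    return (
        f"{preamble}\n\n"
        f"{char_begin}\n{char_header}\n{character_info}\n\n"
        f"{user_header}\n{user_info}\n{char_end}\n\n"
        f"{ctx_begin}\n{context}\n{ctx_end}"
    )


# Rebuild the English Quick Start message (preamble passed exactly as the card shows it).
prompt = build_roleplay_prompt(
    preamble="Now, you are required to role-play and continue the casual chat...",
    character_info="Name: Alex, a cheerful college student",
    user_info="Friend",
    history=[
        ("user", "Want to go hiking this weekend?"),
        ("assistant", "(Eyes light up) Yes! I've been wanting to get outdoors lately."),
    ],
)
```

The result can be used as the `"content"` of the user message; pass `lang="zh"` with Chinese persona and history text to get the Chinese-marker variant.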

Training Details

| Parameter | Value |
|---|---|
| Base Model | Qwen2.5-7B-Instruct |
| Training Method | QLoRA (4-bit NF4 quantization) |
| LoRA Rank (r) | 8 |
| LoRA Alpha | 16 |
| LoRA Dropout | 0 |
| Max Sequence Length | 2048 tokens |
| Epochs | 1 |
| Batch Size | 2 |
| Gradient Accumulation Steps | 1 |
| Learning Rate | 5e-5 |
| LR Schedule | Cosine with 3% warmup |
| Optimizer | AdamW 8-bit |
| Seed | 13 |
| Training Samples | 3,044 (1,522 EN + 1,522 ZH) |
| Total Steps | 1,446 |
| Final Eval Loss | 1.9628 |
| Hardware | NVIDIA RTX 5080 (16 GB) |
| Training Time | ~45 minutes |
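The table's figures can be cross-checked with a little arithmetic. With batch size 2 and no gradient accumulation, one epoch over all 3,044 samples would take 1,522 steps rather than the reported 1,446. The gap is consistent with roughly 5% of the samples being held out to compute the eval loss, but the card does not state a split, so `EVAL_FRACTION` below is an assumption. The sketch also derives the warmup length for the 3% ratio (rounded up here), evaluates a standard linear-warmup cosine-decay schedule, and computes the LoRA scaling factor alpha/r, assuming standard (non-rsLoRA) scaling.

```python
import math

# Values taken from the Training Details table.
TOTAL_SAMPLES = 3044
BATCH_SIZE = 2
GRAD_ACCUM = 1
EPOCHS = 1
PEAK_LR = 5e-5
WARMUP_RATIO = 0.03
LORA_R = 8
LORA_ALPHA = 16

# ASSUMPTION (not stated in the card): 5% of samples held out for evaluation,
# with the eval count rounded up.
EVAL_FRACTION = 0.05


def optimizer_steps(n_train: int, batch_size: int, grad_accum: int, epochs: int) -> int:
    """Optimizer updates per run, assuming a partial final batch still counts."""
    batches_per_epoch = math.ceil(n_train / batch_size)
    return math.ceil(batches_per_epoch / grad_accum) * epochs


def cosine_lr(step: int, total_steps: int, warmup_steps: int, peak_lr: float) -> float:
    """Linear warmup from 0 to peak_lr, then cosine decay to 0 at total_steps."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


n_eval = math.ceil(TOTAL_SAMPLES * EVAL_FRACTION)
n_train = TOTAL_SAMPLES - n_eval
steps = optimizer_steps(n_train, BATCH_SIZE, GRAD_ACCUM, EPOCHS)
warmup = math.ceil(steps * WARMUP_RATIO)
lora_scale = LORA_ALPHA / LORA_R  # multiplier applied to the low-rank update B @ A

print(f"train samples: {n_train}, steps: {steps}")  # train samples: 2891, steps: 1446
print(f"warmup steps: {warmup}")  # warmup steps: 44
print(f"LoRA scaling alpha/r: {lora_scale}")  # LoRA scaling alpha/r: 2.0
print(f"LR at end of warmup: {cosine_lr(warmup, steps, warmup, PEAK_LR):.1e}")  # 5.0e-05
```

Under this assumption the step count matches the table exactly; without a held-out split it would not.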

Training Data

PIPPA (Personal Interaction Pairs between People and AI) is a large-scale dataset of human-AI role-play dialogues. Training samples were extracted from 16,832 deduplicated dialogues using quality filtering and stratified sampling. Both the English originals and their Chinese translations were used to build the bilingual training set.
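The card does not say which attributes defined the strata, so the following is a generic illustration only and not the author's pipeline. It shows proportional stratified sampling in plain Python: items are grouped by a stratum key, each stratum gets a quota proportional to its size (rounded with the largest-remainder method so quotas sum exactly to `n`), and each quota is filled by sampling without replacement. The `stratified_sample` helper and the turn-count stratum in the usage lines are hypothetical.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Sequence, TypeVar

T = TypeVar("T")


def stratified_sample(
    items: Sequence[T],
    stratum_of: Callable[[T], Hashable],
    n: int,
    seed: int = 0,
) -> List[T]:
    """Draw n items so each stratum is represented in proportion to its size."""
    if not 0 <= n <= len(items):
        raise ValueError("n must be between 0 and len(items)")
    strata: Dict[Hashable, List[T]] = defaultdict(list)
    for item in items:
        strata[stratum_of(item)].append(item)

    # Proportional quotas, rounded with the largest-remainder method so they sum to n.
    exact = {k: n * len(group) / len(items) for k, group in strata.items()}
    quota = {k: int(q) for k, q in exact.items()}
    leftover = n - sum(quota.values())
    by_remainder = sorted(exact, key=lambda k: exact[k] - quota[k], reverse=True)
    for k in by_remainder[:leftover]:
        quota[k] += 1

    # Sample without replacement inside each stratum, reproducibly via the seed.
    rng = random.Random(seed)
    sample: List[T] = []
    for key, group in strata.items():
        sample.extend(rng.sample(group, quota[key]))
    return sample


# Hypothetical records: 1,000 dialogues, stratified by a short/long turn-count bucket.
dialogues = [{"id": i, "turns": 2 + i % 10} for i in range(1000)]
picked = stratified_sample(
    dialogues, lambda d: "short" if d["turns"] < 6 else "long", n=100, seed=13
)
```

Here 40% of the hypothetical dialogues are "short", so 40 of the 100 picked items come from that stratum.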

Bias & Limitations

  • The model inherits biases present in the PIPPA dataset and the base Qwen2.5-7B-Instruct model.
  • The training prompt limits replies to roughly 30 Chinese characters (or an equivalent length in English), so the model is unsuitable for long-form generation.
  • Character consistency may degrade with out-of-distribution personas or scenarios.

Framework Versions

  • Transformers: 5.12.1
  • PEFT: 0.19.1
  • TRL: 1.6.0
  • PyTorch: 2.11.0+cu128
  • Datasets: 5.0.0
  • Tokenizers: 0.22.2

License

This model is released under the Apache 2.0 license, inherited from Qwen2.5-7B-Instruct.

Citation

@software{vonwerra2020trl,
  title   = {{TRL: Transformers Reinforcement Learning}},
  author  = {von Werra, Leandro and Belkada, Younes and Tunstall, Lewis and Beeching, Edward and Thrush, Tristan and Lambert, Nathan and Huang, Shengyi and Rasul, Kashif and Gallouédec, Quentin},
  license = {Apache-2.0},
  url     = {https://github.com/huggingface/trl},
  year    = {2020}
}
Model files: Safetensors, 8B params, tensor type BF16

Hugging Face repository: CyberpunkLegend/Qwen2.5-7B-Instruct-CharacterEnhance
