Qwen2.5-3B-Character

Introduction:

Qwen2.5-3B-Character is a character-level variant of Qwen2.5-3B. It is built on the Qwen2.5-3B base model and is designed for character-to-character transformation and generation tasks.

Core Contributions:

  1. Modified Token Vocabulary: The original model's token vocabulary has been revised to remove tokens that represent phrases or multi-character sequences, so that the model works on individual characters.

  2. Continued Pre-training: Based on the modified vocabulary, the model has undergone further pre-training to optimize its performance and adaptability for character-level tasks.
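The vocabulary change in item 1 can be pictured with a small toy example. This card does not describe the exact filtering procedure, so the sketch below is only an assumption about its general shape: the function name `filter_to_single_chars` and the toy vocabulary are hypothetical, and the real Qwen2.5 tokenizer is a byte-level BPE tokenizer, so an actual filter would also have to decode tokens to text, handle merge rules, and trim the embedding rows accordingly.

```python
def filter_to_single_chars(vocab, special_tokens=()):
    """Keep single-character tokens and special tokens, then reassign contiguous ids.

    `vocab` maps decoded token text to its id. This is a hypothetical helper
    for illustration, not the procedure used to build the released model.
    """
    keep = set(special_tokens)
    # Walk tokens in their original id order so the relative ordering is preserved.
    kept = [
        tok
        for tok, _ in sorted(vocab.items(), key=lambda kv: kv[1])
        if len(tok) == 1 or tok in keep
    ]
    return {tok: new_id for new_id, tok in enumerate(kept)}


# Toy vocabulary (hypothetical, not the real Qwen2.5 vocabulary).
toy_vocab = {"大": 0, "型": 1, "大型": 2, "语言": 3, "语": 4, "言": 5, "<|im_end|>": 6}
filtered = filter_to_single_chars(toy_vocab, special_tokens=["<|im_end|>"])
print(filtered)
```

In this toy case the multi-character entries "大型" and "语言" are dropped, while the single characters and the special token remain. After such a change the embeddings no longer match the vocabulary the model was trained with, which is why continued pre-training (item 2) follows.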

Training Dataset:

The model was trained on TigerResearch/pretrain_zh, a Chinese pre-training dataset provided by TigerResearch. For more information, see the dataset page: TigerResearch/pretrain_zh.

Training Code:

The model was trained with LLaMA-Factory, an open-source project that provides tools and frameworks for training language models. The LLaMA-Factory codebase is available at: LLaMA-Factory.

Results

To assess Qwen2.5-3B-Character, we evaluated it on three widely used benchmarks: C-Eval, CMMLU, and MMLU. The results are as follows:

| Model | C-Eval | CMMLU | MMLU |
| --- | --- | --- | --- |
| Qwen2.5-3B | 74.37 | 74.94 | 65.87 |
| Qwen2.5-3B-filter | 70.43 | 69.69 | 65.53 |
| Qwen2.5-3B-Character | 71.97 | 71.94 | 65.18 |

For comparison, the table also reports results for the original Qwen2.5-3B (Qwen2.5-3B) and for the token-modified Qwen2.5-3B (Qwen2.5-3B-filter).
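To make the comparison easier to read, the short sketch below computes the score differences between Qwen2.5-3B-Character and the two reference rows. It uses only the numbers reported in the table; the dictionary layout and the `delta` helper are illustrative names, not part of any evaluation toolkit.

```python
# Scores copied from the results table above.
scores = {
    "Qwen2.5-3B": {"ceval": 74.37, "cmmlu": 74.94, "mmlu": 65.87},
    "Qwen2.5-3B-filter": {"ceval": 70.43, "cmmlu": 69.69, "mmlu": 65.53},
    "Qwen2.5-3B-Character": {"ceval": 71.97, "cmmlu": 71.94, "mmlu": 65.18},
}


def delta(model, baseline):
    """Per-benchmark difference (model minus baseline), rounded to 2 decimals."""
    return {
        bench: round(scores[model][bench] - scores[baseline][bench], 2)
        for bench in scores[model]
    }


vs_filter = delta("Qwen2.5-3B-Character", "Qwen2.5-3B-filter")
vs_base = delta("Qwen2.5-3B-Character", "Qwen2.5-3B")
print("vs filter:", vs_filter)
print("vs base:  ", vs_base)
```

Relative to Qwen2.5-3B-filter, the Character model is 1.54 points higher on C-Eval and 2.25 points higher on CMMLU, and 0.35 points lower on MMLU. Relative to the original Qwen2.5-3B, it remains 2.40, 3.00, and 0.69 points lower on the three benchmarks.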

Quickstart

The latest version of transformers is recommended (at least 4.37.0). The following snippet shows how to use the chat model with transformers:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = 'Henry94/Qwen2.5-3B-Character'

# Load the tokenizer and model; dtype and device placement are chosen automatically.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

# Prompt (in Chinese): "Please give a brief introduction to large language models."
prompt = "请简单介绍一下大型语言模型."
messages = [
    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
# Render the conversation with the model's chat template and append the assistant turn marker.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Strip the prompt tokens so that only the newly generated tokens are decoded.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

print(response)