---
datasets:
- TigerResearch/pretrain_zh
base_model:
- Qwen/Qwen2.5-14B
tags:
- character
- generation
license: apache-2.0
---
**Qwen2.5-14B-Character**

**Introduction:**

**Qwen2.5-14B-Character** is the character-level version of [Qwen2.5-14B](https://huggingface.co/Qwen/Qwen2.5-14B). It is built on top of the Qwen2.5-14B base model and is specifically designed for character-to-character transformation and generation tasks.

**Core Contributions:**

1. **Modified Token Vocabulary:** The original model's token vocabulary has been revised to remove tokens that represent phrases or multiple characters. This refinement sharpens the model's focus on processing individual characters (see the tokenization sketch after this list).

2. **Continued Pre-training:** Using the modified vocabulary, the model has undergone further pre-training to optimize its performance and adaptability for character-level tasks.
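
To illustrate the effect of the vocabulary change, the sketch below tokenizes the same Chinese string with both tokenizers. This is a minimal sketch; the per-character split shown for the Character model is the expected behaviour described above, not a verified output.

```python
# Minimal sketch: compare token boundaries between the base and the
# character-level tokenizers. The expected outputs in the comments are
# assumptions based on the vocabulary change described above.
from transformers import AutoTokenizer

text = "大型语言模型"  # "large language model"

base_tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-14B")
char_tok = AutoTokenizer.from_pretrained("Henry94/Qwen2.5-14B-Character")

def show_tokens(tok, s):
    # Decode each token id separately so Chinese characters stay readable.
    ids = tok(s, add_special_tokens=False)["input_ids"]
    return [tok.decode([i]) for i in ids]

print(show_tokens(base_tok, text))  # base vocabulary may merge several characters into one token
print(show_tokens(char_tok, text))  # expected: roughly one token per character
```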

**Training Dataset:**

The model has been trained on the `TigerResearch/pretrain_zh` dataset, a comprehensive Chinese pre-training corpus provided by **TigerResearch**. For more information about the dataset, please visit: [TigerResearch/pretrain_zh](https://huggingface.co/datasets/TigerResearch/pretrain_zh).
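
For reference, the corpus can be loaded directly from the Hugging Face Hub with the `datasets` library. The snippet below is a minimal sketch; streaming mode is used only to avoid downloading the full corpus up front and is not part of the original training setup.

```python
# Minimal sketch (not from the model card): stream the Chinese pre-training
# corpus and inspect one record.
from datasets import load_dataset

ds = load_dataset("TigerResearch/pretrain_zh", split="train", streaming=True)
for example in ds:
    print(example)  # one record of Chinese pre-training text
    break
```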

**Training Code:**

The training process was carried out with **LLaMA-Factory**, an open-source project that provides tools and frameworks for training language models. The LLaMA-Factory codebase is available at [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory).

**Results**

To assess the efficacy of Qwen2.5-14B-Character, we evaluated its performance on three widely used benchmarks: C-Eval, CMMLU, and MMLU. The results are tabulated below:

| Model | C-Eval | CMMLU | MMLU |
| :---: | :---: | :---: | :---: |
| Qwen2.5-14B | 85.29 | 85.84 | 79.86 |
| Qwen2.5-14B-filter | 83.43 | 83.72 | 79.75 |
| Qwen2.5-14B-Character | 84.99 | 84.60 | 79.61 |

To make the comparison clearer, the table also lists results for the original Qwen2.5-14B and for the token-modified variant (Qwen2.5-14B-filter).

**Quickstart**

The latest version of `transformers` is recommended (at least 4.37.0). The following snippet shows how to run chat-style generation with `transformers`:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = 'Henry94/Qwen2.5-14B-Character'

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

prompt = "请简单介绍一下大型语言模型."  # "Please give a brief introduction to large language models."
messages = [
    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

print(response)
```