---
license: apache-2.0
datasets:
- BAAI/COIG-PC
language:
- zh
library_name: transformers
pipeline_tag: text-generation
---

# Model Card for AntX-13B

<!-- Provide a quick summary of what the model is/does. -->

This is an experimental model that can be used to create new LLMs based on the Chinese language.

## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->

- **Developed by:** yjf9966
- **Model type:** LLaMA with an extended tokenizer (vocabulary size 49,954; see the quick check below)
- **Language(s) (NLP):** Chinese/English
- **License:** Apache-2.0
- **Finetuned from model:** [Chinese-LLaMA-Alpaca](https://github.com/ymcui/Chinese-LLaMA-Alpaca)

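As a quick sanity check, the extended vocabulary can be verified by loading the tokenizer and printing its length. This is a minimal sketch; the expected value comes from the model description above and the check assumes the checkpoint is reachable on the Hub:

```python
# Minimal sketch: confirm the extended tokenizer size.
from transformers import LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained("AntX-ai/AntX-13B")
print(len(tokenizer))  # expected: 49954
```
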
### Model Sources

<!-- Provide the basic links for the model. -->

- **Repository:** https://huggingface.co/AntX-ai/AntX-13B

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

You can use the raw model for Chinese text generation, but it is mostly intended to be fine-tuned on a downstream task.
Note that this is a causal language model, primarily aimed at instruction-following generation tasks such as question answering, rather than masked-token prediction.

## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

Even if the training data used for this model could be characterized as fairly neutral, the model can still produce biased predictions.
It also inherits some of the bias of its base model and training dataset.

### Recommendations

<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.

## How to Get Started with the Model

Use the code below to get started with the model.

```python
from transformers import LlamaForCausalLM, LlamaTokenizer
import torch

base_model_name = "AntX-ai/AntX-13B"
load_type = torch.float16
device = None

generation_config = dict(
    temperature=0.2,
    top_k=40,
    top_p=0.9,
    do_sample=True,
    num_beams=1,
    repetition_penalty=1.3,
    max_new_tokens=400
)

# Alpaca-style instruction template used to prompt the model.
prompt_input = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n\n{instruction}\n\n### Response:\n\n"
)

if torch.cuda.is_available():
    device = torch.device(0)
else:
    device = torch.device('cpu')

def generate_prompt(instruction, input=None):
    if input:
        instruction = instruction + '\n' + input
    return prompt_input.format_map({'instruction': instruction})

tokenizer = LlamaTokenizer.from_pretrained(base_model_name)
model = LlamaForCausalLM.from_pretrained(
    base_model_name,
    load_in_8bit=False,
    torch_dtype=load_type,
    low_cpu_mem_usage=True,
    device_map='auto',
)

# Resize the embeddings if the extended tokenizer vocabulary is larger
# than the model's embedding matrix.
model_vocab_size = model.get_input_embeddings().weight.size(0)
tokenizer_vocab_size = len(tokenizer)
if model_vocab_size != tokenizer_vocab_size:
    model.resize_token_embeddings(tokenizer_vocab_size)

raw_input_text = input("Input:")
input_text = generate_prompt(instruction=raw_input_text)
inputs = tokenizer(input_text, return_tensors="pt")
generation_output = model.generate(
    input_ids=inputs["input_ids"].to(device),
    attention_mask=inputs['attention_mask'].to(device),
    eos_token_id=tokenizer.eos_token_id,
    pad_token_id=tokenizer.pad_token_id,
    **generation_config
)
s = generation_output[0]
output = tokenizer.decode(s, skip_special_tokens=True)
# Everything after the "### Response:" marker is the model's answer.
response = output.split("### Response:")[1].strip()
print("Response: ", response)
print("\n")
```
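
Note that `device_map='auto'` relies on the `accelerate` library to place the model weights across the available devices, while the input tensors are moved explicitly to the device chosen above; on a CPU-only machine the `float16` weights may be slow or unsupported for some operations.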

## Training Details

### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

#### Preprocessing

The dataset was split into 80% for training and 20% for testing.
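
The exact preprocessing code is not published. A minimal sketch of an equivalent 80/20 split with the `datasets` library (the `split` name, the `seed`, and any per-subtask filtering are assumptions) could look like:

```python
# Hypothetical 80/20 split; the actual preprocessing pipeline and
# random seed used for AntX-13B are not published.
from datasets import load_dataset

dataset = load_dataset("BAAI/COIG-PC", split="train")  # assumed split name
splits = dataset.train_test_split(test_size=0.2, seed=42)  # assumed seed
train_ds, test_ds = splits["train"], splits["test"]
```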

#### Training Hyperparameters

- **Training regime:** fp16 mixed precision, lr=1e-4, lora_rank=8, lora_alpha=32 (see the configuration sketch below)
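
The `lora_rank` and `lora_alpha` values indicate LoRA fine-tuning. Below is a minimal configuration sketch with the `peft` library; the `target_modules` and `lora_dropout` values are assumptions, not published settings:

```python
# Hypothetical LoRA setup matching the reported hyperparameters;
# target_modules and lora_dropout are assumptions.
from peft import LoraConfig, get_peft_model
from transformers import LlamaForCausalLM

base = LlamaForCausalLM.from_pretrained("AntX-ai/AntX-13B")
lora_config = LoraConfig(
    r=8,                                  # lora_rank from the model card
    lora_alpha=32,                        # lora_alpha from the model card
    target_modules=["q_proj", "v_proj"],  # assumption
    lora_dropout=0.05,                    # assumption
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
```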

## Evaluation

#### Testing Data

<!-- This should link to a Data Card if possible. -->
20% of the BAAI/COIG-PC dataset.

```
Input:王国维说：“自周之衰,文王、周公势力之瓦解也,国民之智力成熟于内,政治之纷乱乘之于外,上无统一之制度,下迫于社会之要求,于是诸于九流各创其学说。” 他意在说明 A. 分封制的崩溃 B. 商鞅变法的作用 C. 兼并战争的后果 D. 百家争鸣的原因
Response: 本题考查对材料的理解。A错误;B正确;C和D与材料无关。故选BC两项即可


Input:经济基础是指一定社会发展阶段占统治地位的生产关系各个方面的总和。解决了地方经济问题, 也就解决了地方割据问题, 为此, 宋太祖采取的措施是( ) A . 地方设转运使, 财赋收归中央 B . 派文臣担任各地州县的长官 C . 派文臣管理地方政事 D . 采取分化事权办法, 削弱相权
Response: A: 本题考查对宋太祖治下地方问题的认识。 A : 依据材料可知, 在北宋时期 , 由于地主阶级的发展壮大以及商业、手工业等新兴行业出现 , 这一时期出现的地方割据现象严重威胁着国家的统一与稳定 . 因此 , 为了解决这个问题, 需要加强中央集权 ; 故选 A


Input:go\\n/**\\n * Definition for a binary tree node.\\n * type TreeNode struct {\\n * Val int\\n * Left *TreeNode\\n * Right *TreeNode\\n * }\\n */\\nfunc maxDepth(root *TreeNode) int {\\n\\tif root == nil {\\n\\t\\treturn 0\\n\\t}\\n\\tl, r := maxDepth(root.Left), maxDepth(root.Right)\\n\\treturn 1 + max(l, r)\\n}\\n\\nfunc max(a, b int) int {\\n\\tif a > b {\\n\\t\\treturn a\\n\\t}\\n\\treturn b\\n}\\n
Response: ```python
def max_depth(root): # The function takes in a single parameter `root` and returns its maximum depth value as output.
    if not root or len(root.children()) == 0:
        return 0
    l = max_depth(root.left)
    r = max_depth(root.right)
    return 1 + (max(l, r))

class TreeNode:
    def __init__(self, val=None, left=10, right=0):
        self.val = val
        self.left = None
        self.right = None

```

## Citation

<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
```bibtex
@software{AntX-13B,
  title={An Enhanced Chinese Language Model based on Chinese-LLaMA-Alpaca},
  url={https://huggingface.co/AntX-ai/AntX-13B},
  year={2023}
}
```