---
license: apache-2.0
datasets:
- BAAI/COIG-PC
language:
- zh
library_name: transformers
pipeline_tag: text-generation
---

# Model Card for AntX-13B

<!-- Provide a quick summary of what the model is/does. -->

This is an experimental model that can be used to build new Chinese-language LLMs.

## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->

- **Developed by:** yjf9966
- **Model type:** LLaMA with an extended tokenizer (vocabulary size 49,954; a quick check appears below)
- **Language(s) (NLP):** Chinese/English
- **License:** Apache-2.0
- **Finetuned from model:** [Chinese-LLaMA-Alpaca](https://github.com/ymcui/Chinese-LLaMA-Alpaca)

### Model Sources

<!-- Provide the basic links for the model. -->

- **Repository:** https://huggingface.co/AntX-ai/AntX-13B

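As a quick sanity check, the extended vocabulary size can be verified from the hosted tokenizer. This is a minimal sketch, assuming only that the repository's tokenizer loads with `LlamaTokenizer` as in the example later in this card:

```python
# Verify the extended vocabulary size of the AntX-13B tokenizer.
from transformers import LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained("AntX-ai/AntX-13B")
print(len(tokenizer))  # expected: 49954
```
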
## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

You can use the raw model for Chinese text generation, but it is mostly intended to be fine-tuned on a downstream task. Note that this is a causal language model, so it is best suited to generation-style instruction-following tasks rather than whole-sentence classification.

## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

Even if the training data used for this model could be characterized as fairly neutral, the model can still produce biased predictions. It also inherits some of the biases of its training dataset and base model.

### Recommendations

<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.

## How to Get Started with the Model

Use the code below to get started with the model.

```python
from transformers import LlamaForCausalLM, LlamaTokenizer
import torch

base_model_name = "AntX-ai/AntX-13B"
load_type = torch.float16
device = torch.device(0) if torch.cuda.is_available() else torch.device('cpu')

# Sampling parameters passed to model.generate().
generation_config = dict(
    temperature=0.2,
    top_k=40,
    top_p=0.9,
    do_sample=True,
    num_beams=1,
    repetition_penalty=1.3,
    max_new_tokens=400
)

# Alpaca-style prompt template used for instruction inputs.
prompt_input = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n\n{instruction}\n\n### Response:\n\n"
)

def generate_prompt(instruction, input=None):
    if input:
        instruction = instruction + '\n' + input
    return prompt_input.format_map({'instruction': instruction})

tokenizer = LlamaTokenizer.from_pretrained(base_model_name)
model = LlamaForCausalLM.from_pretrained(
    base_model_name,
    load_in_8bit=False,
    torch_dtype=load_type,
    low_cpu_mem_usage=True,
    device_map='auto',
)

# Resize the embedding matrix if the tokenizer vocabulary differs from the model's.
model_vocab_size = model.get_input_embeddings().weight.size(0)
tokenizer_vocab_size = len(tokenizer)
if model_vocab_size != tokenizer_vocab_size:
    model.resize_token_embeddings(tokenizer_vocab_size)

raw_input_text = input("Input:")
input_text = generate_prompt(instruction=raw_input_text)
inputs = tokenizer(input_text, return_tensors="pt")
generation_output = model.generate(
    input_ids=inputs["input_ids"].to(device),
    attention_mask=inputs['attention_mask'].to(device),
    eos_token_id=tokenizer.eos_token_id,
    pad_token_id=tokenizer.pad_token_id,
    **generation_config
)
output = tokenizer.decode(generation_output[0], skip_special_tokens=True)
# Keep only the text generated after the "### Response:" marker.
response = output.split("### Response:")[1].strip()
print("Response: ", response)
```
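
If GPU memory is limited, the same checkpoint can be loaded with 8-bit weights instead of fp16. This is a minimal sketch, assuming the `bitsandbytes` package is installed; only the loading call changes:

```python
# Hypothetical 8-bit variant of the loading step above (requires bitsandbytes).
model = LlamaForCausalLM.from_pretrained(
    base_model_name,
    load_in_8bit=True,       # quantize weights to int8 at load time
    low_cpu_mem_usage=True,
    device_map='auto',
)
```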


## Training Details

### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

#### Preprocessing

The dataset was split 80% for training and 20% for testing, as sketched below.
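
The exact preprocessing script is not published; the following is a minimal sketch of one way to produce such an 80/20 split with the `datasets` library, where the `split` name and `seed` are illustrative assumptions:

```python
# Hypothetical sketch of an 80/20 train/test split of BAAI/COIG-PC.
from datasets import load_dataset

dataset = load_dataset("BAAI/COIG-PC", split="train")      # assumption: single "train" split
splits = dataset.train_test_split(test_size=0.2, seed=42)  # seed is an assumption
train_ds, test_ds = splits["train"], splits["test"]
print(len(train_ds), len(test_ds))
```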

#### Training Hyperparameters

- **Training regime:** fp16 mixed precision, lr=1e-4, lora_rank=8, lora_alpha=32 (see the sketch below)

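The training script is not published; the following is a minimal LoRA sketch matching the stated values, where `target_modules`, `lora_dropout`, and the output path are illustrative assumptions, not confirmed AntX-13B settings:

```python
# Hypothetical LoRA setup matching the stated hyperparameters.
from peft import LoraConfig
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=8,                                  # lora_rank=8
    lora_alpha=32,                        # lora_alpha=32
    target_modules=["q_proj", "v_proj"],  # assumption: a common choice for LLaMA
    lora_dropout=0.05,                    # assumption
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="antx-13b-lora",           # hypothetical output path
    learning_rate=1e-4,                   # lr=1e-4
    fp16=True,                            # fp16 mixed precision
)
# The adapter would be attached with peft.get_peft_model(base_model, lora_config).
```
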
## Evaluation

#### Testing Data

<!-- This should link to a Data Card if possible. -->
20% of the BAAI/COIG-PC dataset.

```
Input:王国维说：“自周之衰,文王、周公势力之瓦解也,国民之智力成熟于内,政治之纷乱乘之于外,上无统一之制度,下迫于社会之要求,于是诸于九流各创其学说。” 他意在说明 A. 分封制的崩溃 B. 商鞅变法的作用 C. 兼并战争的后果 D. 百家争鸣的原因
Response: 本题考查对材料的理解。A错误;B正确;C和D与材料无关。故选BC两项即可


Input:经济基础是指一定社会发展阶段占统治地位的生产关系各个方面的总和。解决了地方经济问题, 也就解决了地方割据问题, 为此, 宋太祖采取的措施是( ) A . 地方设转运使, 财赋收归中央 B . 派文臣担任各地州县的长官 C . 派文臣管理地方政事 D . 采取分化事权办法, 削弱相权
Response: A: 本题考查对宋太祖治下地方问题的认识。 A : 依据材料可知, 在北宋时期 , 由于地主阶级的发展壮大以及商业、手工业等新兴行业出现 , 这一时期出现的地方割据现象严重威胁着国家的统一与稳定 . 因此 , 为了解决这个问题, 需要加强中央集权 ; 故选 A


Input:go\\n/**\\n * Definition for a binary tree node.\\n * type TreeNode struct {\\n * Val int\\n * Left *TreeNode\\n * Right *TreeNode\\n * }\\n */\\nfunc maxDepth(root *TreeNode) int {\\n\\tif root == nil {\\n\\t\\treturn 0\\n\\t}\\n\\tl, r := maxDepth(root.Left), maxDepth(root.Right)\\n\\treturn 1 + max(l, r)\\n}\\n\\nfunc max(a, b int) int {\\n\\tif a > b {\\n\\t\\treturn a\\n\\t}\\n\\treturn b\\n}\\n
Response: ```python
def max_depth(root): # The function takes in a single parameter `root` and returns its maximum depth value as output.
    if not root or len(root.children()) == 0:
        return 0
    l = max_depth(root.left)
    r = max_depth(root.right)
    return 1 + (max(l, r))

class TreeNode:
    def __init__(self, val=None, left=10, right=0):
        self.val = val
        self.left = None
        self.right = None


```

## Citation

<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

```
@software{AntX-13B,
  title={An Enhanced Chinese Language Model Based on Chinese-LLaMA-Alpaca},
  url={https://huggingface.co/AntX-ai/AntX-13B},
  year={2023}
}
```