加入中文词表并继续预训练中文Embedding,并在此基础上继续使用指令数据集finetuning,得到的中文Alpaca-plus模型。
详情可参考:https://github.com/ymcui/Chinese-LLaMA-Alpaca/releases/tag/v3.0
使用方法参考
- 安装模块包
pip install sentencepiece
pip install transformers>=4.28.0
- 生成文本
import torch
import transformers
from transformers import LlamaTokenizer, LlamaForCausalLM
def generate_prompt(text):
return f"""Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
{text}
### Response:"""
tokenizer = LlamaTokenizer.from_pretrained('minlik/chinese-alpaca-plus-7b-merged')
model = LlamaForCausalLM.from_pretrained('minlik/chinese-alpaca-plus-7b-merged').half().to('cuda')
model.eval()
text = '第一个登上月球的人是谁?'
prompt = generate_prompt(text)
input_ids = tokenizer.encode(prompt, return_tensors='pt').to('cuda')
with torch.no_grad():
output_ids = model.generate(
input_ids=input_ids,
max_new_tokens=128,
temperature=1,
top_k=40,
top_p=0.9,
repetition_penalty=1.15
).cuda()
output = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(output.replace(prompt, '').strip())
- Downloads last month
- 1,314
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social
visibility and check back later, or deploy to Inference Endpoints (dedicated)
instead.