---
license: cc-by-nc-4.0
datasets:
- yuyijiong/Long-Instruction-Chinese
language:
- zh
- en
pipeline_tag: text-generation
---
|
* [LongAlpaca](https://huggingface.co/Yukang/LongAlpaca-7B) fine-tunes llama2-chat on a small amount of long-text data and shows strong long-context conversational ability.

* LongAlpaca-7b-chinese follows a similar training recipe to LongAlpaca: first apply linear position interpolation, then fine-tune on a small amount of long-text data to obtain strong long-context conversational ability.

* The training data is similar to LongAlpaca's, with additional multi-document question-answering data.

* This model was obtained by LoRA fine-tuning of atom-7b-chat. Linear position interpolation extends the context length from 4k to 32k tokens (see the sketch below), so the model can handle tasks such as multi-document retrieval and paper summarization over inputs of tens of thousands of characters. This covers the vast majority of use cases, with almost no degradation in short-conversation ability.
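A minimal sketch of what linear position interpolation looks like in `transformers` (the released checkpoint presumably already ships the scaled RoPE config, so this would only be needed when extending the 4k base model yourself; the base-model path and the scaling factor 8.0 = 32k / 4k are assumptions, not from the model card):

```python
from transformers import AutoModelForCausalLM

# Linear RoPE interpolation via the standard `rope_scaling` config option
# (supported for LLaMA-family models in transformers >= 4.31).
# factor = target context / original context = 32k / 4k = 8.0 (assumed here).
base_model = AutoModelForCausalLM.from_pretrained(
    "FlagAlpha/Atom-7B-Chat",  # assumed base-model path, for illustration only
    rope_scaling={"type": "linear", "factor": 8.0},
    device_map="auto",
)
```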
|
Usage:
|
```python
import os
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

os.environ["CUDA_VISIBLE_DEVICES"] = "0"

model_path = "yuyijiong/LongAlpaca-7b-chinese"
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# device_map="auto" automatically places the model on the available devices;
# load_in_8bit=True quantizes the weights to 8 bit to reduce memory usage.
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto", load_in_8bit=True).eval()

question = "中国的首都是什么?"  # "What is the capital of China?"
input_text = "<s>Human: " + question + "\n</s><s>Assistant: "
input_ids = tokenizer(input_text, return_tensors='pt').input_ids.to(model.device)

with torch.no_grad():
    with torch.autocast('cuda'):
        output = model.generate(input_ids=input_ids,
                                max_new_tokens=512,  # adjust to taste
                                do_sample=True,
                                temperature=0.85,
                                top_k=None,
                                top_p=0.9,
                                use_cache=True)

reply = tokenizer.decode(output[0], skip_special_tokens=False)
reply_return = reply.split('Assistant:')[-1].replace('</s>', '')

print('Model answer:', reply_return)
```
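
For the long-context tasks the card advertises (paper summarization, multi-document QA), the prompt follows the same `Human:`/`Assistant:` template; a minimal sketch, reusing `model` and `tokenizer` from above (the file name and instruction wording are illustrative assumptions, not from the model card):

```python
# Long-document summarization with the same chat template (illustrative sketch).
with open("paper.txt", encoding="utf-8") as f:  # hypothetical input file
    document = f.read()  # up to ~32k tokens after position interpolation

question = "请总结以下论文:\n" + document  # "Please summarize the following paper:"
input_text = "<s>Human: " + question + "\n</s><s>Assistant: "
input_ids = tokenizer(input_text, return_tensors='pt').input_ids.to(model.device)

with torch.no_grad():
    with torch.autocast('cuda'):
        output = model.generate(input_ids=input_ids, max_new_tokens=512,
                                do_sample=True, temperature=0.85, top_p=0.9,
                                use_cache=True)

# Decode only the newly generated tokens after the prompt.
print(tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True))
```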