---
tags:
- ctranslate2
- int8
- float16
license: apache-2.0
---
# Fast Inference with CTranslate2
Speed up inference and reduce memory use by 2x-4x with int8 inference in C++ on CPU or GPU.

This is a quantized version of [openllmplayground/openalpaca_7b_700bt_preview](https://huggingface.co/openllmplayground/openalpaca_7b_700bt_preview).
```bash
pip install "hf-hub-ctranslate2>=2.0.8" "ctranslate2>=3.14.0"
```
Converted on 2023-06-02 using:
```bash
ct2-transformers-converter --model openllmplayground/openalpaca_7b_700bt_preview --output_dir /home/michael/tmp-ct2fast-openalpaca_7b_700bt_preview --force --copy_files README.md tokenizer_config.json generation_config.json special_tokens_map.json .gitattributes --quantization int8_float16 --trust_remote_code
```
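Alternatively, the same conversion can be run from Python via CTranslate2's converter API. A minimal sketch, assuming `ctranslate2.converters.TransformersConverter` (the output directory is a placeholder):
```python
import ctranslate2

# convert the Hugging Face checkpoint to CTranslate2 format with int8/float16 weights
converter = ctranslate2.converters.TransformersConverter(
    "openllmplayground/openalpaca_7b_700bt_preview",
    copy_files=["tokenizer_config.json", "special_tokens_map.json"],
)
converter.convert(
    "ct2fast-openalpaca_7b_700bt_preview",  # placeholder output directory
    quantization="int8_float16",
    force=True,
)
```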
The checkpoint is compatible with [ctranslate2>=3.14.0](https://github.com/OpenNMT/CTranslate2)
and [hf-hub-ctranslate2>=2.0.8](https://github.com/michaelfeil/hf-hub-ctranslate2):
- `compute_type=int8_float16` for `device="cuda"`
- `compute_type=int8` for `device="cpu"`
```python
from hf_hub_ctranslate2 import TranslatorCT2fromHfHub, GeneratorCT2fromHfHub
from transformers import AutoTokenizer

model_name = "michaelfeil/ct2fast-openalpaca_7b_700bt_preview"
# use either TranslatorCT2fromHfHub or GeneratorCT2fromHfHub here, depending on the model
model = GeneratorCT2fromHfHub(
    # load in int8 on CUDA
    model_name_or_path=model_name,
    device="cuda",
    compute_type="int8_float16",
    # tokenizer=AutoTokenizer.from_pretrained("openllmplayground/openalpaca_7b_700bt_preview")
)
outputs = model.generate(
    text=["def fibonacci(", "User: How are you doing? Bot:"],
    max_length=64,
    include_prompt_in_result=False,
)
print(outputs)
```
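For CPU-only inference, the same loader applies with the settings listed above. A minimal sketch, assuming `device="cpu"` with `compute_type="int8"`:
```python
from hf_hub_ctranslate2 import GeneratorCT2fromHfHub

# load the same checkpoint with int8 weights on CPU
model = GeneratorCT2fromHfHub(
    model_name_or_path="michaelfeil/ct2fast-openalpaca_7b_700bt_preview",
    device="cpu",
    compute_type="int8",
)
outputs = model.generate(text=["What is an alpaca?"], max_length=64)
print(outputs)
```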
# License and other remarks:
This is just a quantized version of the original model. The license conditions are intended to be identical to those of the original Hugging Face repo.
# Original description
# OpenAlpaca: A Fully Open-Source Instruction-Following Model Based On OpenLLaMA
In this repo, we release a permissively licensed open-source instruction-following model based on [OpenLLaMA](https://github.com/openlm-research/open_llama). This release is a public preview of the 7B OpenAlpaca model, built on [the previewed version of OpenLLaMA](https://huggingface.co/openlm-research/open_llama_7b_700bt_preview), a 7B model trained with 700 billion tokens. We provide PyTorch weights of OpenAlpaca. Stay tuned for our forthcoming updates!
**Project Page:** [https://github.com/yxuansu/OpenAlpaca](https://github.com/yxuansu/OpenAlpaca)
# Dataset and Training
We train our model on the [dolly 15k dataset](https://huggingface.co/datasets/databricks/databricks-dolly-15k) released by Databricks. The training configurations are provided in the table below. Training runs on 8 x A100 (40G) GPUs and takes around 30 minutes.
|Hyperparameter|Value|
|:-------------:|:-------------:|
|**Batch Size**|64|
|**Learning rate**|2e-5|
|**Epochs**|3|
|**Max length**|1024|
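The original training script is not part of this card; purely as an illustration, here is a minimal sketch of how these hyperparameters might map onto Hugging Face `TrainingArguments` (the output directory, per-device batch split, and bf16 choice are assumptions, not taken from the original setup):
```python
from transformers import TrainingArguments

# hypothetical mapping of the table above onto TrainingArguments
training_args = TrainingArguments(
    output_dir="openalpaca-7b-finetune",  # placeholder
    per_device_train_batch_size=8,        # assumption: 8 GPUs x 8 = global batch size 64
    learning_rate=2e-5,                   # from the table
    num_train_epochs=3,                   # from the table
    bf16=True,                            # assumption: mixed precision on A100
)
MAX_LENGTH = 1024  # max sequence length from the table, applied at tokenization time
```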
# Example Usage
The example below shows how to use OpenAlpaca.
```python
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

# the previewed version of OpenAlpaca
model_path = r'openllmplayground/openalpaca_7b_700bt_preview'
tokenizer = LlamaTokenizer.from_pretrained(model_path)
model = LlamaForCausalLM.from_pretrained(model_path).cuda()
tokenizer.bos_token_id, tokenizer.eos_token_id = 1, 2  # see https://github.com/openlm-research/open_llama#preview-weights-release-and-usage

# same prompt as provided in https://crfm.stanford.edu/2023/03/13/alpaca.html
instruction = r'What is an alpaca? How is it different from a llama?'
'''
instruction = r'Write an e-mail to congratulate new Stanford admits and mention that you are excited about meeting all of them in person.'
instruction = r'What is the capital of Tanzania?'
instruction = r'Write a well-thought out abstract for a machine learning paper that proves that 42 is the optimal seed for training neural networks.'
'''

prompt_no_input = f'Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Response:'
tokens = tokenizer.encode(prompt_no_input)
# move the inputs to the same device as the model
tokens = torch.LongTensor(tokens).unsqueeze(0).cuda()
instance = {
    'input_ids': tokens,
    'top_k': 50,
    'top_p': 0.9,
    'generate_len': 128,
}

length = len(tokens[0])
with torch.no_grad():
    rest = model.generate(
        input_ids=tokens,
        max_length=length + instance['generate_len'],
        use_cache=True,
        do_sample=True,
        top_p=instance['top_p'],
        top_k=instance['top_k'],
    )

# keep only the newly generated tokens, dropping the prompt
output = rest[0][length:]
string = tokenizer.decode(output, skip_special_tokens=True)
print(f'[!] Generation results: {string}')
```
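Since the Alpaca prompt template is reused for every instruction, it can be handy to factor it out. A small sketch with a hypothetical helper (not part of the original repo):
```python
# hypothetical helper, not part of the original OpenAlpaca code
def build_alpaca_prompt(instruction: str) -> str:
    """Format an instruction with the Alpaca no-input prompt template."""
    return (
        'Below is an instruction that describes a task. '
        'Write a response that appropriately completes the request.\n\n'
        f'### Instruction:\n{instruction}\n\n### Response:'
    )

print(build_alpaca_prompt('What is the capital of Tanzania?'))
```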
# License and Usage
OpenAlpaca is permissively licensed under the Apache 2.0 license and can be used freely for academic/commercial purposes.
# Contact
We would love to get feedback from the community. If you have any questions, please open an issue or contact us.
OpenAlpaca is developed by: [Yixuan Su](https://yxuansu.github.io/)<sup>\*</sup>, [Tian Lan](https://github.com/gmftbyGMFTBY)<sup>\*</sup>, and [Deng Cai](https://jcyk.github.io/) (The first two members<sup>\*</sup> contributed equally.)
# Reference:
If you found OpenAlpaca useful in your research or applications, please kindly cite using the following BibTeX:
```
@misc{openalpaca,
author = {Yixuan Su and Tian Lan and Deng Cai},
title = {OpenAlpaca: A Fully Open-Source Instruction-Following Model Based On OpenLLaMA},
year = {2023},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/yxuansu/OpenAlpaca}},
}
```
```
@software{openlm2023openllama,
author = {Xinyang Geng and Hao Liu},
title = {OpenLLaMA: An Open Reproduction of LLaMA},
month = {May},
year = {2023},
url = {https://github.com/openlm-research/open_llama}
}
```
```
@misc{alpaca,
author = {Rohan Taori and Ishaan Gulrajani and Tianyi Zhang and Yann Dubois and Xuechen Li and Carlos Guestrin and Percy Liang and Tatsunori B. Hashimoto},
title = {Stanford Alpaca: An Instruction-following LLaMA model},
year = {2023},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/tatsu-lab/stanford_alpaca}},
}
```
```
@article{touvron2023llama,
title={Llama: Open and efficient foundation language models},
author={Hugo Touvron and Thibaut Lavril and Gautier Izacard and Xavier Martinet and Marie{-}Anne Lachaux and Timoth{\'{e}}e Lacroix and Baptiste Rozi{\`{e}}re and Naman Goyal and Eric Hambro and Faisal Azhar and Aur{\'{e}}lien Rodriguez and Armand Joulin and Edouard Grave and Guillaume Lample},
journal={arXiv preprint arXiv:2302.13971},
year={2023}
}
```