---
tags:
- ctranslate2
- int8
- float16
license: apache-2.0
---
# Fast Inference with CTranslate2

Speed up inference while reducing memory by 2x-4x using int8 inference in C++ on CPU or GPU.

Quantized version of [openllmplayground/openalpaca_7b_700bt_preview](https://huggingface.co/openllmplayground/openalpaca_7b_700bt_preview)

```bash
pip install hf-hub-ctranslate2>=2.0.8 ctranslate2>=3.14.0
```

Converted on 2023-06-02 using

```
ct2-transformers-converter --model openllmplayground/openalpaca_7b_700bt_preview --output_dir /home/michael/tmp-ct2fast-openalpaca_7b_700bt_preview --force --copy_files README.md tokenizer_config.json generation_config.json special_tokens_map.json .gitattributes --quantization int8_float16 --trust_remote_code
```

Checkpoint compatible with [ctranslate2>=3.14.0](https://github.com/OpenNMT/CTranslate2) and [hf-hub-ctranslate2>=2.0.8](https://github.com/michaelfeil/hf-hub-ctranslate2)
- `compute_type=int8_float16` for `device="cuda"`
- `compute_type=int8` for `device="cpu"`

```python
from hf_hub_ctranslate2 import TranslatorCT2fromHfHub, GeneratorCT2fromHfHub
from transformers import AutoTokenizer

model_name = "michaelfeil/ct2fast-openalpaca_7b_700bt_preview"
# use either TranslatorCT2fromHfHub or GeneratorCT2fromHfHub here, depending on the model
model = GeneratorCT2fromHfHub(  # load in int8 on CUDA
    model_name_or_path=model_name,
    device="cuda",
    compute_type="int8_float16",
    # tokenizer=AutoTokenizer.from_pretrained("openllmplayground/openalpaca_7b_700bt_preview")
)
outputs = model.generate(
    text=["def fibonacci(", "User: How are you doing? Bot:"],
    max_length=64,
    include_prompt_in_result=False
)
print(outputs)
```
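For CPU-only inference, swap the device and compute type as listed in the bullets above. A minimal sketch, assuming the same `GeneratorCT2fromHfHub` API as the CUDA example:

```python
from hf_hub_ctranslate2 import GeneratorCT2fromHfHub

# load the int8-quantized checkpoint on CPU (no GPU required)
model = GeneratorCT2fromHfHub(
    model_name_or_path="michaelfeil/ct2fast-openalpaca_7b_700bt_preview",
    device="cpu",
    compute_type="int8",
)
outputs = model.generate(
    text=["User: How are you doing? Bot:"],
    max_length=64,
    include_prompt_in_result=False,
)
print(outputs)
```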
# License and other remarks:
This is just a quantized version. License conditions are intended to be identical to the original Hugging Face repo.

# Original description

# OpenAlpaca: A Fully Open-Source Instruction-Following Model Based On OpenLLaMA

In this repo, we release a permissively licensed open-source instruction-following model based on [OpenLLaMA](https://github.com/openlm-research/open_llama). This release provides a public preview of the 7B OpenAlpaca model, built on [the previewed version of OpenLLaMA](https://huggingface.co/openlm-research/open_llama_7b_700bt_preview), a 7B model trained with 700 billion tokens. We provide PyTorch weights of OpenAlpaca. Stay tuned for our forthcoming updates!

**[Project Page]** [https://github.com/yxuansu/OpenAlpaca](https://github.com/yxuansu/OpenAlpaca)

# Dataset and Training

We train our model on the [dolly 15k dataset](https://huggingface.co/datasets/databricks/databricks-dolly-15k) released by Databricks. The training configurations are provided in the table below. Training runs on 8 x A100 (40G) GPUs and takes around 30 minutes.

|Hyperparameter|Value|
|:-------------:|:-------------:|
|**Batch Size**|64|
|**Learning rate**|2e-5|
|**Epochs**|3|
|**Max length**|1024|

# Example Usage

The example below shows how to use OpenAlpaca.

```python
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

# the previewed version of OpenAlpaca
model_path = r'openllmplayground/openalpaca_7b_700bt_preview'
tokenizer = LlamaTokenizer.from_pretrained(model_path)
model = LlamaForCausalLM.from_pretrained(model_path).cuda()
# see https://github.com/openlm-research/open_llama#preview-weights-release-and-usage
tokenizer.bos_token_id, tokenizer.eos_token_id = 1, 2

# same prompt as provided in https://crfm.stanford.edu/2023/03/13/alpaca.html
instruction = r'What is an alpaca? How is it different from a llama?'
'''
# alternative instructions to try:
instruction = r'Write an e-mail to congratulate new Stanford admits and mention that you are excited about meeting all of them in person.'
instruction = r'What is the capital of Tanzania?'
instruction = r'Write a well-thought out abstract for a machine learning paper that proves that 42 is the optimal seed for training neural networks.'
'''

# Alpaca-style prompt template (no input field)
prompt_no_input = f'Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Response:'

tokens = tokenizer.encode(prompt_no_input)
tokens = torch.LongTensor(tokens).unsqueeze(0)
instance = {
    'input_ids': tokens,
    'top_k': 50,
    'top_p': 0.9,
    'generate_len': 128
}

length = len(tokens[0])
with torch.no_grad():
    rest = model.generate(
        input_ids=tokens,
        max_length=length + instance['generate_len'],
        use_cache=True,
        do_sample=True,
        top_p=instance['top_p'],
        top_k=instance['top_k']
    )

# strip the prompt tokens and decode only the generated continuation
output = rest[0][length:]
string = tokenizer.decode(output, skip_special_tokens=True)
print(f'[!] Generation results: {string}')
```

# License and Usage

OpenAlpaca is permissively licensed under the Apache 2.0 license and can be used freely for academic/commercial purposes.

# Contact

We would love to get feedback from the community. If you have any questions, please open an issue or contact us.

OpenAlpaca is developed by: [Yixuan Su](https://yxuansu.github.io/)\*, [Tian Lan](https://github.com/gmftbyGMFTBY)\*, and [Deng Cai](https://jcyk.github.io/) (the first two members\* contributed equally).

# Reference:

If you found OpenAlpaca useful in your research or applications, please kindly cite using the following BibTeX:

```
@misc{openalpaca,
  author = {Yixuan Su and Tian Lan and Deng Cai},
  title = {OpenAlpaca: A Fully Open-Source Instruction-Following Model Based On OpenLLaMA},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/yxuansu/OpenAlpaca}},
}
```

```
@software{openlm2023openllama,
  author = {Xinyang Geng and Hao Liu},
  title = {OpenLLaMA: An Open Reproduction of LLaMA},
  month = {May},
  year = {2023},
  url = {https://github.com/openlm-research/open_llama}
}
```

```
@misc{alpaca,
  author = {Rohan Taori and Ishaan Gulrajani and Tianyi Zhang and Yann Dubois and Xuechen Li and Carlos Guestrin and Percy Liang and Tatsunori B. Hashimoto},
  title = {Stanford Alpaca: An Instruction-following LLaMA model},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/tatsu-lab/stanford_alpaca}},
}
```

```
@article{touvron2023llama,
  title = {Llama: Open and efficient foundation language models},
  author = {Hugo Touvron and Thibaut Lavril and Gautier Izacard and Xavier Martinet and Marie{-}Anne Lachaux and Timoth{\'{e}}e Lacroix and Baptiste Rozi{\`{e}}re and Naman Goyal and Eric Hambro and Faisal Azhar and Aur{\'{e}}lien Rodriguez and Armand Joulin and Edouard Grave and Guillaume Lample},
  journal = {arXiv preprint arXiv:2302.13971},
  year = {2023}
}
```