---
license: llama2
language:
- ja
- en
---

## Model Overview

This model is a compiled version of [Watashiha-Llama-2-13B-Ogiri-sft](https://huggingface.co/watashiha/Watashiha-Llama-2-13B-Ogiri-sft) that runs on AWS inf2 instances.

It was compiled by following the instructions in this article:
https://huggingface.co/docs/optimum-neuron/tutorials/llama2-13b-chatbot

* License: LLAMA 2 COMMUNITY LICENSE

## How to Use

1. Launch an **inf2.xlarge** instance on AWS EC2.
Downloading the model requires about 50GB, so a storage size of 256GB or more is recommended.
Use the following AMI:
**Deep Learning AMI Neuron PyTorch 1.13 (Ubuntu 20.04) 20240102**

2. Run the following command to activate the Python environment provided by the AMI.
```bash
source /opt/aws_neuron_venv_pytorch/bin/activate
```

3. Install **optimum**.
```bash
pip install optimum[neuronx]
```

4. Once the steps above are complete, run the following code.
```python
from optimum.neuron import NeuronModelForCausalLM
from transformers import AutoTokenizer

model_name = "watashiha/Watashiha-Llama-2-13B-Ogiri-sft-neuron"
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)
# Load the precompiled Neuron model.
model = NeuronModelForCausalLM.from_pretrained(model_name)

# The prompt (odai) to generate a punchline for.
odai = "What happens when a clock is hungry?"
text = f"""
Below is a combination of instructions explaining the task and contextually relevant input. Write a response that appropriately fulfills the request.

Instructions:
The input sentence is a prompt for a comedy skit. Generate a funny punchline that aligns with the prompt.

Input:
{odai}

Response:
"""
# Drop the leading newline introduced by the triple-quoted string.
text = text.lstrip()

token_ids = tokenizer.encode(text, return_tensors="pt")
input_len = token_ids.shape[1]
# Sample up to 64 new tokens after the prompt.
output_ids = model.generate(
    token_ids,
    max_length=input_len + 64,
    do_sample=True,
    top_p=0.9,
    top_k=50,
    temperature=0.8,
    pad_token_id=tokenizer.pad_token_id,
    eos_token_id=tokenizer.eos_token_id,
)
output = tokenizer.decode(output_ids.tolist()[0], skip_special_tokens=True)
print(output)

# Example output:
"""
Below is a combination of instructions explaining the task and contextually relevant input. Write a response that appropriately fulfills the request.

Instructions:
The input sentence is a prompt for a comedy skit. Generate a funny punchline that aligns with the prompt.

Input:
{odai}

Response:
It takes time to get back on top!
"""
```

### Parameters for compilation

#### input_shapes
```
{
    "batch_size": 1,
    "sequence_length": 1024,
}
```

#### compiler_args
```
{
    "num_cores": 2,
    "auto_cast_type": "bf16",
}
```
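The parameters above could be passed to the export step roughly as follows. This is a minimal sketch based on the linked tutorial, not the exact script used to build this model: `COMPILE_KWARGS`, `export_model`, `clamp_max_length`, and `save_dir` are hypothetical names introduced for illustration. It also assumes that `sequence_length` caps the prompt and the generated tokens together, so the helper keeps `max_length` within the compiled 1024 tokens.

```python
# Compilation parameters listed above, merged into one dict (hypothetical name).
COMPILE_KWARGS = {
    "batch_size": 1,
    "sequence_length": 1024,
    "num_cores": 2,
    "auto_cast_type": "bf16",
}


def clamp_max_length(input_len: int, max_new_tokens: int,
                     sequence_length: int = COMPILE_KWARGS["sequence_length"]) -> int:
    """Return a max_length for generate() that stays within the compiled length.

    Assumption: the compiled sequence_length bounds prompt + generated tokens.
    """
    if input_len >= sequence_length:
        raise ValueError(
            f"prompt has {input_len} tokens; compiled limit is {sequence_length}"
        )
    return min(input_len + max_new_tokens, sequence_length)


def export_model(source_model_id: str, save_dir: str):
    """Compile a checkpoint for Neuron and save it (needs an inf2 instance)."""
    # Imported lazily so the helper above works without optimum-neuron installed.
    from optimum.neuron import NeuronModelForCausalLM

    model = NeuronModelForCausalLM.from_pretrained(
        source_model_id, export=True, **COMPILE_KWARGS
    )
    model.save_pretrained(save_dir)
    return model
```

In the generation code from step 4, `max_length=input_len + 64` could then be replaced with `max_length=clamp_max_length(input_len, 64)` so that long prompts do not request more tokens than the model was compiled for.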