---
license: llama2
language:
- ja
- en
---
## Model Overview
This model is a compiled version of [Watashiha-Llama-2-13B-Ogiri-sft](https://huggingface.co/watashiha/Watashiha-Llama-2-13B-Ogiri-sft) designed to run on AWS's inf2 instances.
The compilation was done following the instructions in this article:
https://huggingface.co/docs/optimum-neuron/tutorials/llama2-13b-chatbot
* License: LLAMA 2 COMMUNITY LICENSE
## How to Use
1. Launch an **inf2.xlarge** instance on AWS EC2.
Since downloading the model requires about 50GB of disk space, it is recommended to set the storage size to 256GB or more.
Please use the following AMI:
**Deep Learning AMI Neuron PyTorch 1.13 (Ubuntu 20.04) 20240102**
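If you prefer to launch the instance from Python instead of the AWS console, the sketch below shows one way to do it with boto3. It is only a sketch: the region, key pair name, root device name, and volume type are assumptions, so adjust them to your environment.
```python
import boto3

# Region below is a placeholder; pick one where inf2 instances are available.
ec2 = boto3.client("ec2", region_name="us-east-1")

# Look up the AMI named above (owned by Amazon).
images = ec2.describe_images(
    Owners=["amazon"],
    Filters=[{
        "Name": "name",
        "Values": ["Deep Learning AMI Neuron PyTorch 1.13 (Ubuntu 20.04) 20240102"],
    }],
)["Images"]
ami_id = images[0]["ImageId"]

# Launch an inf2.xlarge with a 256GB root volume.
ec2.run_instances(
    ImageId=ami_id,
    InstanceType="inf2.xlarge",
    MinCount=1,
    MaxCount=1,
    KeyName="your-key-pair",  # placeholder key pair name
    BlockDeviceMappings=[
        {"DeviceName": "/dev/sda1", "Ebs": {"VolumeSize": 256, "VolumeType": "gp3"}}
    ],
)
```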
2. Execute the following command to activate the Python virtual environment that comes preinstalled on the AMI.
```bash
source /opt/aws_neuron_venv_pytorch/bin/activate
```
3. Install **optimum**.
```bash
pip install optimum[neuronx]
```
4. Once the above steps are complete, run the following code to generate a punchline.
```python
from optimum.neuron import NeuronModelForCausalLM
from transformers import AutoTokenizer
model_name = "watashiha/Watashiha-Llama-2-13B-Ogiri-sft-neuron"

# Load the tokenizer and the pre-compiled Neuron model from the Hub.
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)
model = NeuronModelForCausalLM.from_pretrained(model_name)

# Build the instruction-style prompt around the odai (the comedy prompt).
odai = "What happens when a clock is hungry?"
text = f"""
Below is a combination of instructions explaining the task and contextually relevant input. Write a response that appropriately fulfills the request.
Instructions:
The input sentence is a prompt for a comedy skit. Generate a funny punchline that aligns with the prompt.
Input:
{odai}
Response:
"""
text = text.lstrip()

# Tokenize the prompt and generate up to 64 new tokens.
token_ids = tokenizer.encode(text, return_tensors="pt")
input_len = token_ids.shape[1]
output_ids = model.generate(
    token_ids,
    max_length=input_len + 64,
    do_sample=True,
    top_p=0.9,
    top_k=50,
    temperature=0.8,
    pad_token_id=tokenizer.pad_token_id,
    eos_token_id=tokenizer.eos_token_id,
)

# Decode the full sequence (prompt plus generated punchline).
output = tokenizer.decode(output_ids.tolist()[0], skip_special_tokens=True)
print(output)

# Example output:
"""
Below is a combination of instructions explaining the task and contextually relevant input. Write a response that appropriately fulfills the request.
Instructions:
The input sentence is a prompt for a comedy skit. Generate a funny punchline that aligns with the prompt.
Input:
What happens when a clock is hungry?
Response:
It takes time to get back on top!
"""
```
### Parameters for Compilation
#### input_shapes
```
{
"batch_size": 1,
"sequence_length": 1024,
}
```
#### compiler_args
```
{
"num_cores": 2,
"auto_cast_type": 'bf16',
}
```
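For reference, these parameters correspond to the export arguments of `NeuronModelForCausalLM.from_pretrained` described in the optimum-neuron tutorial linked above. The following is a minimal sketch of how the compilation could be reproduced; the output directory name is only an example.
```python
from optimum.neuron import NeuronModelForCausalLM

compiler_args = {"num_cores": 2, "auto_cast_type": "bf16"}
input_shapes = {"batch_size": 1, "sequence_length": 1024}

# Compile (export) the original model for Neuron with the parameters above.
model = NeuronModelForCausalLM.from_pretrained(
    "watashiha/Watashiha-Llama-2-13B-Ogiri-sft",
    export=True,
    **compiler_args,
    **input_shapes,
)

# Save the compiled artifacts locally (example path).
model.save_pretrained("Watashiha-Llama-2-13B-Ogiri-sft-neuron")
```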