---
license: llama2
language:
- ja
- en
---

## Model Overview
This model is a compiled version of [Watashiha-Llama-2-13B-Ogiri-sft](https://huggingface.co/watashiha/Watashiha-Llama-2-13B-Ogiri-sft), built to run on AWS Inferentia2 (inf2) instances.

The compilation was done following the instructions in this article:
https://huggingface.co/docs/optimum-neuron/tutorials/llama2-13b-chatbot

* License: LLAMA 2 COMMUNITY LICENSE

## How to Use
1. Launch an **inf2.xlarge** instance on AWS EC2.
   As downloading the model requires about 50GB, it is recommended to set the storage size to 256GB or more.
   Please use the following AMI:
   **Deep Learning AMI Neuron PyTorch 1.13 (Ubuntu 20.04) 20240102**

2. Execute the following command to activate the Neuron PyTorch virtual environment that ships with the AMI.
```bash
source /opt/aws_neuron_venv_pytorch/bin/activate
```

3. Install **optimum**.
```bash
pip install "optimum[neuronx]"
```

4. Once the above steps are completed, run the following Python code.
```python
from optimum.neuron import NeuronModelForCausalLM
from transformers import AutoTokenizer

model_name = "watashiha/Watashiha-Llama-2-13B-Ogiri-sft-neuron"
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)
model = NeuronModelForCausalLM.from_pretrained(model_name)

odai = "What happens when a clock is hungry?"
text = f"""
Below is a combination of instructions explaining the task and contextually relevant input. Write a response that appropriately fulfills the request.

Instructions:
The input sentence is a prompt for a comedy skit. Generate a funny punchline that aligns with the prompt.

Input:
{odai}

Response:
"""
text = text.lstrip()

token_ids = tokenizer.encode(text, return_tensors="pt")
input_len = token_ids.shape[1]
output_ids = model.generate(
    token_ids,
    max_length=input_len + 64,
    do_sample=True,
    top_p=0.9,
    top_k=50,
    temperature=0.8,
    pad_token_id=tokenizer.pad_token_id,
    eos_token_id=tokenizer.eos_token_id,
)
output = tokenizer.decode(output_ids.tolist()[0], skip_special_tokens=True)
print(output)
"""
Below is a combination of instructions explaining the task and contextually relevant input. Write a response that appropriately fulfills the request.

Instructions:
The input sentence is a prompt for a comedy skit. Generate a funny punchline that aligns with the prompt.

Input:
What happens when a clock is hungry?

Response:
It takes time to get back on top!
"""
```
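If you want to try several prompts, the template construction above can be factored into a small helper. This is only a sketch; `build_prompt` is a hypothetical name and is not part of the model repository:

```python
def build_prompt(odai: str) -> str:
    """Build the instruction-style prompt this card uses for a given odai.

    Unlike the inline version above, the string starts without a leading
    newline, so no lstrip() is needed.
    """
    return f"""Below is a combination of instructions explaining the task and contextually relevant input. Write a response that appropriately fulfills the request.

Instructions:
The input sentence is a prompt for a comedy skit. Generate a funny punchline that aligns with the prompt.

Input:
{odai}

Response:
"""

# Example: build the same prompt as in the snippet above.
prompt = build_prompt("What happens when a clock is hungry?")
print(prompt)
```

The resulting string can be passed directly to `tokenizer.encode` in place of `text` in the snippet above.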

### Parameters for compilation

#### input_shapes
```python
{
    "batch_size": 1,
    "sequence_length": 1024,
}
```

#### compiler_args
```python
{
    "num_cores": 2,
    "auto_cast_type": 'bf16',
}
```