Commit ccdd91c by watashihakobashi
Parent: 26a3ae0

Update README_en.md

README_en.md CHANGED
@@ -0,0 +1,99 @@
+ ---
+ license: llama2
+ language:
+ - ja
+ - en
+ ---
+
+ ## Model Overview
+ This model is a compiled version of [Watashiha-Llama-2-13B-Ogiri-sft](https://huggingface.co/watashiha/Watashiha-Llama-2-13B-Ogiri-sft), built to run on AWS inf2 instances.
+
+ The model was compiled following the instructions in this tutorial:
+ https://huggingface.co/docs/optimum-neuron/tutorials/llama2-13b-chatbot
+
+ * License: LLAMA 2 COMMUNITY LICENSE
+
+ ## How to Use
+ 1. Launch an **inf2.xlarge** instance on AWS EC2.
+ Downloading the model requires about 50GB, so a storage size of 256GB or more is recommended.
+ Use the following AMI:
+ **Deep Learning AMI Neuron PyTorch 1.13 (Ubuntu 20.04) 20240102**
+
+ 2. Run the following command to activate the preinstalled Python environment.
+ ```bash
+ source /opt/aws_neuron_venv_pytorch/bin/activate
+ ```
+
+ 3. Install **optimum** with the Neuron extras.
+ ```bash
+ pip install "optimum[neuronx]"
+ ```
+
+ 4. Once the steps above are complete, run the following code.
+ ```python
+ from optimum.neuron import NeuronModelForCausalLM
+ from transformers import AutoTokenizer
+
+ # Load the tokenizer and the precompiled Neuron model from the Hub.
+ model_name = "watashiha/Watashiha-Llama-2-13B-Ogiri-sft-neuron"
+ tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)
+ model = NeuronModelForCausalLM.from_pretrained(model_name)
+
+ # The ogiri prompt (odai) to answer.
+ odai = "What happens when a clock is hungry?"
+ text = f"""
+ Below is a combination of instructions explaining the task and contextually relevant input. Write a response that appropriately fulfills the request.
+
+ Instructions:
+ The input sentence is a prompt for a comedy skit. Generate a funny punchline that aligns with the prompt.
+
+ Input:
+ {odai}
+
+ Response:
+ """
+ text = text.lstrip()  # drop the leading newline of the template
+
+ token_ids = tokenizer.encode(text, return_tensors="pt")
+ input_len = token_ids.shape[1]
+ # Sample up to 64 new tokens after the prompt.
+ output_ids = model.generate(
+     token_ids,
+     max_length=input_len + 64,
+     do_sample=True,
+     top_p=0.9,
+     top_k=50,
+     temperature=0.8,
+     pad_token_id=tokenizer.pad_token_id,
+     eos_token_id=tokenizer.eos_token_id,
+ )
+ output = tokenizer.decode(output_ids.tolist()[0], skip_special_tokens=True)
+ print(output)
+ # Example output (the prompt is echoed, followed by the generated punchline):
+ """
+ Below is a combination of instructions explaining the task and contextually relevant input. Write a response that appropriately fulfills the request.
+
+ Instructions:
+ The input sentence is a prompt for a comedy skit. Generate a funny punchline that aligns with the prompt.
+
+ Input:
+ What happens when a clock is hungry?
+
+ Response:
+ It takes time to get back on top!
+ """
+ ```
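
The prompt template in step 4 can be wrapped in a small helper so that different prompts reuse it, and the echoed prompt can be stripped from the decoded output. The following is a minimal sketch in plain Python. `build_prompt` and `extract_response` are hypothetical helper names, not part of optimum or transformers, and the sketch assumes the decoded text echoes the prompt and that the last `Response:` marker precedes the punchline.

```python
# Same template as in step 4, written without the leading newline.
PROMPT_TEMPLATE = (
    "Below is a combination of instructions explaining the task and "
    "contextually relevant input. Write a response that appropriately "
    "fulfills the request.\n"
    "\n"
    "Instructions:\n"
    "The input sentence is a prompt for a comedy skit. Generate a funny "
    "punchline that aligns with the prompt.\n"
    "\n"
    "Input:\n"
    "{odai}\n"
    "\n"
    "Response:\n"
)


def build_prompt(odai: str) -> str:
    """Fill the template with one ogiri prompt (hypothetical helper)."""
    return PROMPT_TEMPLATE.format(odai=odai.strip())


def extract_response(decoded: str) -> str:
    """Return the text after the last 'Response:' marker.

    Assumption: the punchline itself does not contain 'Response:'.
    If the marker is missing, the whole text is returned stripped.
    """
    _, found, tail = decoded.rpartition("Response:")
    return tail.strip() if found else decoded.strip()
```

With these helpers, `text = build_prompt(odai)` would replace the inline f-string, and `extract_response(output)` would return only the punchline.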
+
+ ### Parameters for compilation
+
+ #### input_shapes
+ ```
+ {
+     "batch_size": 1,
+     "sequence_length": 1024,
+ }
+ ```
91
+ ```
92
+
93
+ #### compiler_args
94
+ ```
95
+ {
96
+ "num_cores": 2,
97
+ "auto_cast_type": 'bf16',
98
+ }
99
+ ```
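
For reference, the two parameter sets above could be passed to the export call roughly as follows. This is a sketch that follows the pattern of the linked optimum-neuron tutorial, not the exact script used for this model. The keyword names may differ in other optimum-neuron versions, and `compile_model`, `export_kwargs`, and `save_dir` are hypothetical names introduced for illustration.

```python
input_shapes = {"batch_size": 1, "sequence_length": 1024}
compiler_args = {"num_cores": 2, "auto_cast_type": "bf16"}


def export_kwargs() -> dict:
    """Merge the compilation parameters into from_pretrained keyword arguments."""
    return {"export": True, **compiler_args, **input_shapes}


def compile_model(model_id: str, save_dir: str):
    """Compile a checkpoint for Neuron and save it (requires the Neuron SDK)."""
    # Imported lazily so this module can be loaded on machines without Neuron.
    from optimum.neuron import NeuronModelForCausalLM

    model = NeuronModelForCausalLM.from_pretrained(model_id, **export_kwargs())
    model.save_pretrained(save_dir)
    return model
```

A call such as `compile_model("watashiha/Watashiha-Llama-2-13B-Ogiri-sft", "ogiri-neuron")` would then compile the original model, where the output directory name is only an example.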