watashiha/Watashiha-Llama-2-13B-Ogiri-sft-neuron


watashihakobashi committed on Feb 2, 2024

Commit ccdd91c · verified · 1 Parent(s): 26a3ae0

Update README_en.md

Files changed (1)

README_en.md +99 -0

	@@ -0,0 +1,99 @@

+---
+license: llama2
+language:
+- ja
+- en
+---
+## Model Overview
+This model is a compiled version of [Watashiha-Llama-2-13B-Ogiri-sft](https://huggingface.co/watashiha/Watashiha-Llama-2-13B-Ogiri-sft) designed to run on AWS's inf2 instances.
+The compilation was done following the instructions in this article:
+https://huggingface.co/docs/optimum-neuron/tutorials/llama2-13b-chatbot
+* License: LLAMA 2 COMMUNITY LICENSE
+## How to Use
+1. Launch an **inf2.xlarge** instance on AWS EC2.
+   Downloading the model requires about 50GB of disk space, so a storage size of 256GB or more is recommended.
+   Please use the following AMI:
+   **Deep Learning AMI Neuron PyTorch 1.13 (Ubuntu 20.04) 20240102**
+2. Execute the following command to activate the provided Python environment.
+```bash
+source /opt/aws_neuron_venv_pytorch/bin/activate
+```
+3. Install **optimum** with the Neuron extras.
+```bash
+pip install "optimum[neuronx]"
+```
+4. Once the above steps are complete, run the following code.
+```python
+from optimum.neuron import NeuronModelForCausalLM
+from transformers import AutoTokenizer
+
+# Load the tokenizer and the precompiled Neuron model from the Hub.
+model_name = "watashiha/Watashiha-Llama-2-13B-Ogiri-sft-neuron"
+tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)
+model = NeuronModelForCausalLM.from_pretrained(model_name)
+
+# "odai" is the ogiri prompt that the model answers with a punchline.
+odai = "What happens when a clock is hungry?"
+text = f"""
+Below is a combination of instructions explaining the task and contextually relevant input. Write a response that appropriately fulfills the request.
+Instructions:
+The input sentence is a prompt for a comedy skit. Generate a funny punchline that aligns with the prompt.
+Input:
+{odai}
+Response:
+"""
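+# Drop the leading newline introduced by the triple-quoted string.
+text = text.lstrip()
+# Tokenize the prompt and record its length in tokens.
+token_ids = tokenizer.encode(text, return_tensors="pt")
+input_len = token_ids.shape[1]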
+output_ids = model.generate(
+    token_ids,
+    max_length=input_len + 64,
+    do_sample=True,
+    top_p=0.9,
+    top_k=50,
+    temperature=0.8,
+    pad_token_id=tokenizer.pad_token_id,
+    eos_token_id=tokenizer.eos_token_id,
+)
+output = tokenizer.decode(output_ids.tolist()[0], skip_special_tokens=True)
+print(output)
+# Example output (the prompt is echoed, followed by the generated punchline):
+"""
+Below is a combination of instructions explaining the task and contextually relevant input. Write a response that appropriately fulfills the request.
+Instructions:
+The input sentence is a prompt for a comedy skit. Generate a funny punchline that aligns with the prompt.
+Input:
+What happens when a clock is hungry?
+Response:
+It takes time to get back on top!
+"""
+```
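The decoded output above repeats the whole instruction template before the punchline. The following is a minimal sketch of one way to build the prompt and keep only the generated part, assuming the template is used unchanged. `PROMPT_TEMPLATE`, `build_prompt`, and `extract_response` are hypothetical names introduced for illustration; they are not part of optimum or of this repository.

```python
# Hypothetical helpers (not part of optimum or this repository).
# The template mirrors the prompt used in the usage example above.
PROMPT_TEMPLATE = (
    "Below is a combination of instructions explaining the task and "
    "contextually relevant input. Write a response that appropriately "
    "fulfills the request.\n"
    "Instructions:\n"
    "The input sentence is a prompt for a comedy skit. Generate a funny "
    "punchline that aligns with the prompt.\n"
    "Input:\n"
    "{odai}\n"
    "Response:\n"
)


def build_prompt(odai: str) -> str:
    """Fill the ogiri prompt into the instruction template."""
    return PROMPT_TEMPLATE.format(odai=odai)


def extract_response(decoded: str) -> str:
    """Return the text after the first 'Response:' marker.

    Falls back to the whole (stripped) string if the marker is missing.
    """
    _, marker, tail = decoded.partition("Response:")
    return tail.strip() if marker else decoded.strip()
```

With these helpers, `text = build_prompt(odai)` would replace the f-string in the example, and `extract_response(output)` would return only the punchline.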
+### Parameters for compilation
+#### input_shapes
+```
+{
+    "batch_size": 1,
+    "sequence_length": 1024,
+}
+```
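Because the model was compiled with a fixed `sequence_length` of 1024, the prompt and the generated tokens together presumably have to fit within that limit. The following is a minimal sketch of a pre-flight check under that assumption. `generation_budget` is a hypothetical helper, and how optimum-neuron itself reacts when the limit is exceeded is not covered here.

```python
# Values taken from the input_shapes used at compilation time.
SEQUENCE_LENGTH = 1024
BATCH_SIZE = 1


def generation_budget(
    input_len: int,
    requested_new_tokens: int = 64,
    sequence_length: int = SEQUENCE_LENGTH,
) -> int:
    """Return how many new tokens can be generated for a prompt.

    Hypothetical helper: caps the requested number of new tokens so that
    prompt + generation stays within the compiled sequence length.
    """
    if input_len >= sequence_length:
        raise ValueError(
            f"Prompt has {input_len} tokens, but the model was compiled "
            f"for at most {sequence_length}."
        )
    return min(requested_new_tokens, sequence_length - input_len)
```

In the usage example, this would be applied as `max_length=input_len + generation_budget(input_len)`.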
+#### compiler_args
+```
+{
+    "num_cores": 2,
+    "auto_cast_type": "bf16",
+}
+```
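For reference, here is a sketch of how these parameters could be passed to the export step, following the pattern in the optimum-neuron tutorial linked in the overview. This is an assumption about how the compilation was run, not a record of the exact command used. `export_to_neuron` and the output directory are hypothetical, and the export itself requires a Neuron instance with enough memory.

```python
# Compilation parameters as listed above.
INPUT_SHAPES = {"batch_size": 1, "sequence_length": 1024}
COMPILER_ARGS = {"num_cores": 2, "auto_cast_type": "bf16"}


def export_to_neuron(source_model: str, output_dir: str):
    """Compile a Hub model for Neuron and save the result (hypothetical helper)."""
    # Imported lazily so this module can be loaded without optimum-neuron installed.
    from optimum.neuron import NeuronModelForCausalLM

    model = NeuronModelForCausalLM.from_pretrained(
        source_model,
        export=True,  # trigger compilation instead of loading a compiled model
        **COMPILER_ARGS,
        **INPUT_SHAPES,
    )
    model.save_pretrained(output_dir)
    return model


# Example call (the output directory name is an assumption):
# export_to_neuron("watashiha/Watashiha-Llama-2-13B-Ogiri-sft", "ogiri-sft-neuron")
```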