---
license: llama2
language:
- ja
- en
---

## Model Overview

This model is a compiled version of [Watashiha-Llama-2-13B-Ogiri-sft](https://huggingface.co/watashiha/Watashiha-Llama-2-13B-Ogiri-sft), built to run on AWS inf2 (Inferentia2) instances.

The compilation followed this tutorial:
https://huggingface.co/docs/optimum-neuron/tutorials/llama2-13b-chatbot

* License: LLAMA 2 COMMUNITY LICENSE

## How to Use

1. Launch an **inf2.xlarge** instance on AWS EC2.

The model download is about 50 GB, so set the storage size to 256 GB or more.

Use the following AMI:
**Deep Learning AMI Neuron PyTorch 1.13 (Ubuntu 20.04) 20240102**

2. Run the following command to activate the pre-installed Python virtual environment:

```bash
source /opt/aws_neuron_venv_pytorch/bin/activate
```

3. Install **optimum** with the Neuron extras (quoted so the brackets are not expanded by the shell):

```bash
pip install "optimum[neuronx]"
```

4. Once the above steps are complete, run the following Python code:

```python
from optimum.neuron import NeuronModelForCausalLM
from transformers import AutoTokenizer

model_name = "watashiha/Watashiha-Llama-2-13B-Ogiri-sft-neuron"
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)
model = NeuronModelForCausalLM.from_pretrained(model_name)

odai = "What happens when a clock is hungry?"
text = f"""
Below is a combination of instructions explaining the task and contextually relevant input. Write a response that appropriately fulfills the request.

Instructions:
The input sentence is a prompt for a comedy skit. Generate a funny punchline that aligns with the prompt.

Input:
{odai}

Response:
"""
text = text.lstrip()

token_ids = tokenizer.encode(text, return_tensors="pt")
input_len = token_ids.shape[1]
output_ids = model.generate(
    token_ids,
    max_length=input_len + 64,
    do_sample=True,
    top_p=0.9,
    top_k=50,
    temperature=0.8,
    pad_token_id=tokenizer.pad_token_id,
    eos_token_id=tokenizer.eos_token_id,
)
output = tokenizer.decode(output_ids.tolist()[0], skip_special_tokens=True)
print(output)
"""
Below is a combination of instructions explaining the task and contextually relevant input. Write a response that appropriately fulfills the request.

Instructions:
The input sentence is a prompt for a comedy skit. Generate a funny punchline that aligns with the prompt.

Input:
What happens when a clock is hungry?

Response:
It takes time to get back on top!
"""
```
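The prompt template above can be factored into a small helper for reuse with other prompts. This is a sketch; `build_prompt` is a name introduced here, not part of the model's API:

```python
def build_prompt(odai: str) -> str:
    """Build the instruction-style prompt shown in the usage example."""
    text = f"""
Below is a combination of instructions explaining the task and contextually relevant input. Write a response that appropriately fulfills the request.

Instructions:
The input sentence is a prompt for a comedy skit. Generate a funny punchline that aligns with the prompt.

Input:
{odai}

Response:
"""
    # Strip the leading newline, matching text.lstrip() in the example above.
    return text.lstrip()


prompt = build_prompt("What happens when a clock is hungry?")
print(prompt)
```

Keeping the template in one place makes it easier to try different prompts while preserving the exact format the model was fine-tuned on.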
### Parameters for compilation

#### input_shapes
```
{
    "batch_size": 1,
    "sequence_length": 1024,
}
```

#### compiler_args
```
{
    "num_cores": 2,
    "auto_cast_type": 'bf16',
}
```
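These parameters follow the keyword arguments accepted by optimum-neuron's export API. A recompile from the original checkpoint could be sketched as below; this is an untested outline that must be run on an inf2 host with the Neuron SDK installed, and the output directory name is only an example:

```python
from optimum.neuron import NeuronModelForCausalLM

# export=True triggers Neuron compilation of the original checkpoint,
# using the input_shapes and compiler_args listed above.
compiled = NeuronModelForCausalLM.from_pretrained(
    "watashiha/Watashiha-Llama-2-13B-Ogiri-sft",
    export=True,
    batch_size=1,
    sequence_length=1024,
    num_cores=2,
    auto_cast_type="bf16",
)

# Save the compiled artifacts so they can be reloaded without recompiling.
compiled.save_pretrained("Watashiha-Llama-2-13B-Ogiri-sft-neuron")
```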