philschmid HF staff

Update README.md

b633fcc 8 months ago

	---
	language:
	- en
	tags:
	- facebook
	- meta
	- pytorch
	- llama
	- llama-2
	- inferentia2
	- neuron
	extra_gated_heading: Access Llama 2 on Hugging Face
	extra_gated_description: This is a form to enable access to Llama 2 on Hugging Face
	  after you have been granted access from Meta. Please visit the [Meta website](https://ai.meta.com/resources/models-and-libraries/llama-downloads)
	  and accept our license terms and acceptable use policy before submitting this form.
	  Requests will be processed in 1-2 days.
	extra_gated_prompt: '**Your Hugging Face account email address MUST match the email
	  you provide on the Meta website, or your request will not be approved.**'
	extra_gated_button_content: Submit
	extra_gated_fields:
	  ? I agree to share my name, email address and username with Meta and confirm that
	    I have already been granted download access on the Meta website
	  : checkbox
	pipeline_tag: text-generation
	inference: false
	arxiv: 2307.09288
	---
	# Neuronx model for [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf)

	This repository contains an [AWS Inferentia2](https://aws.amazon.com/ec2/instance-types/inf2/) and [`neuronx`](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/) compatible checkpoint for [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf). You can find detailed information about the base model on its [Model Card](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf).

	## Usage on Amazon SageMaker

	_coming soon_

	## Usage with optimum-neuron

	```python

	from optimum.neuron import pipeline

	# Load pipeline from Hugging Face repository
	pipe = pipeline("text-generation", "aws-neuron/Llama-2-7b-chat-hf-seqlen-2048-bs-4")

	# We use the tokenizer's chat template to format each message - see https://huggingface.co/docs/transformers/main/en/chat_templating
	messages = [
	    {"role": "user", "content": "What is 2+2?"},
	]
	prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
	# Run generation
	outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
	print(outputs[0]["generated_text"])

	```
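
The checkpoint name encodes a sequence length of 2048 and a batch size of 4. The sketch below shows one way to keep requests inside those limits before calling the pipeline. It is a minimal illustration, not part of `optimum-neuron`: the helper names `max_new_tokens_for` and `chunk_prompts` are hypothetical, and it assumes that prompt tokens and generated tokens share a single window of `sequence_length` tokens.

```python
from typing import List

# Limits read from the checkpoint name (seqlen-2048, bs-4).
SEQUENCE_LENGTH = 2048
BATCH_SIZE = 4


def max_new_tokens_for(prompt_tokens: int, sequence_length: int = SEQUENCE_LENGTH) -> int:
    """Return how many tokens can still be generated after a prompt of this length.

    Assumption: the prompt and the generated tokens must fit together
    in one window of `sequence_length` tokens.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    # Clamp at zero when the prompt alone already fills (or exceeds) the window.
    return max(sequence_length - prompt_tokens, 0)


def chunk_prompts(prompts: List[str], batch_size: int = BATCH_SIZE) -> List[List[str]]:
    """Split prompts into groups that are no larger than the compiled batch size."""
    return [prompts[i:i + batch_size] for i in range(0, len(prompts), batch_size)]


if __name__ == "__main__":
    print(max_new_tokens_for(1800))
    print(chunk_prompts(["a", "b", "c", "d", "e"]))
```

In practice the prompt length would come from the tokenizer, for example `len(pipe.tokenizer(prompt)["input_ids"])`, and the result can be used to cap `max_new_tokens`.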

	## Compilation Arguments

	Compiler arguments used to build this checkpoint:

	```json
	{
	  "num_cores": 2,
	  "auto_cast_type": "fp16"
	}
	```

	Input shapes (`input_shapes`) the checkpoint was compiled for:

	```json
	{
	  "sequence_length": 2048,
	  "batch_size": 4
	}
	```
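
For readers who want to compile a comparable checkpoint themselves, the sketch below merges the two JSON objects above into keyword arguments for an export call. It is a hedged illustration, not a record of how this checkpoint was produced: `build_export_kwargs` and `export_model` are hypothetical helper names, and passing these keys to `NeuronModelForCausalLM.from_pretrained(..., export=True)` is an assumption about the installed `optimum-neuron` release, since argument names may differ between versions.

```python
import json

# Values copied from the two JSON objects in this section.
COMPILER_ARGS_JSON = '{"num_cores": 2, "auto_cast_type": "fp16"}'
INPUT_SHAPES_JSON = '{"sequence_length": 2048, "batch_size": 4}'


def build_export_kwargs(compiler_args_json=COMPILER_ARGS_JSON,
                        input_shapes_json=INPUT_SHAPES_JSON):
    """Merge compiler arguments and input shapes into one kwargs dict."""
    compiler_args = json.loads(compiler_args_json)
    input_shapes = json.loads(input_shapes_json)
    # Refuse to merge silently if both objects define the same key.
    overlap = set(compiler_args) & set(input_shapes)
    if overlap:
        raise ValueError(f"duplicate keys: {sorted(overlap)}")
    return {**compiler_args, **input_shapes}


def export_model(model_id="meta-llama/Llama-2-7b-chat-hf"):
    """Compile the base model for Neuron (requires an Inferentia2 instance).

    Assumption: the installed optimum-neuron version accepts these
    keyword arguments together with export=True.
    """
    # Imported lazily so build_export_kwargs works without optimum-neuron.
    from optimum.neuron import NeuronModelForCausalLM

    return NeuronModelForCausalLM.from_pretrained(
        model_id, export=True, **build_export_kwargs()
    )


if __name__ == "__main__":
    print(build_export_kwargs())
```

Only `build_export_kwargs` runs without Neuron hardware; `export_model` needs access to the gated base model and an instance with the Neuron SDK installed.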