---
language:
- en
pipeline_tag: text-generation
inference: false
tags:
- facebook
- meta
- pytorch
- llama
- llama-2
- inferentia2
- neuron
---

# Neuronx model for [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf)

This repository contains [**AWS Inferentia2**](https://aws.amazon.com/ec2/instance-types/inf2/) and [`neuronx`](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/) compatible checkpoints for [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf). You can find detailed information about the base model on its [Model Card](https://huggingface.co/meta-llama/Llama-2-7b-hf).

This model has been exported to the `neuron` format using the specific `input_shapes` and `compiler` parameters detailed in the sections below.

Please refer to the 🤗 `optimum-neuron` [documentation](https://huggingface.co/docs/optimum-neuron/main/en/guides/models#configuring-the-export-of-a-generative-model) for an explanation of these parameters.

## Usage on Amazon SageMaker

_coming soon_

## Usage with 🤗 `optimum-neuron`

```python
>>> from optimum.neuron import pipeline

>>> p = pipeline('text-generation', 'aws-neuron/Llama-2-7b-hf-neuron-latency')
>>> p("My favorite place on earth is", max_new_tokens=64, do_sample=True, top_k=50)
[{'generated_text': 'My favorite place on earth is the ocean. It is where I feel most at peace. I love to travel and see new places. I have a'}]
```

This repository contains tags specific to versions of `neuronx`. When loading the model with 🤗 `optimum-neuron`, use the repository revision that matches the version of `neuronx` you have installed, so that the correct serialized checkpoints are loaded.

## Arguments passed during export

**input_shapes**

```json
{
  "batch_size": 1,
  "sequence_length": 2048
}
```

**compiler_args**

```json
{
  "auto_cast_type": "fp16",
  "num_cores": 24
}
```
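
The export arguments above fix the input shapes at compile time. The following is a minimal sketch, not part of `optimum-neuron`, of how a caller might check a request against those shapes before sending it to the model. It assumes that `sequence_length` bounds the prompt tokens plus the generated tokens, and that at most `batch_size` prompts can be processed per call; the helper names `max_new_tokens_budget` and `fits_export_shapes` are hypothetical.

```python
# Static shapes copied from the "input_shapes" export arguments of this model card.
EXPORT_INPUT_SHAPES = {"batch_size": 1, "sequence_length": 2048}


def max_new_tokens_budget(prompt_tokens: int,
                          sequence_length: int = EXPORT_INPUT_SHAPES["sequence_length"]) -> int:
    """Return how many tokens can still be generated for a prompt of the given length.

    Assumption: the compiled sequence length covers prompt + generated tokens.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    # Never return a negative budget, even if the prompt alone is too long.
    return max(sequence_length - prompt_tokens, 0)


def fits_export_shapes(num_prompts: int,
                       prompt_tokens: int,
                       max_new_tokens: int,
                       shapes: dict = EXPORT_INPUT_SHAPES) -> bool:
    """Check a request against the static batch size and sequence length (hypothetical helper)."""
    within_batch = num_prompts <= shapes["batch_size"]
    within_length = prompt_tokens + max_new_tokens <= shapes["sequence_length"]
    return within_batch and within_length


if __name__ == "__main__":
    # A 7-token prompt with max_new_tokens=64 stays well inside the 2048-token window.
    print(fits_export_shapes(num_prompts=1, prompt_tokens=7, max_new_tokens=64))
    print(max_new_tokens_budget(prompt_tokens=7))
```

In practice the prompt token count would come from the model's tokenizer; it is passed in as a plain integer here so the sketch runs without Neuron hardware.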