aws-neuron/Mistral-7B-Instruct-v0.1-neuron-4x2048-24-cores

dacorvo (HF staff) committed on Jan 23

Commit 6916e19 • 1 Parent(s): fe9153d

Create model card

Files changed (1)

README.md +58 -0

README.md ADDED

	@@ -0,0 +1,58 @@

+---
+language:
+- en
+pipeline_tag: text-generation
+inference: false
+tags:
+- facebook
+- meta
+- pytorch
+- mistral
+- inferentia2
+- neuron
+---
+# Neuronx model for [mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1)
+This repository contains [**AWS Inferentia2**](https://aws.amazon.com/ec2/instance-types/inf2/) and [`neuronx`](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/) compatible checkpoints for [mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1).
+You can find detailed information about the base model on its [Model Card](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1).
+This model has been exported to the `neuron` format using specific `input_shapes` and `compiler` parameters detailed in the paragraphs below.
+Please refer to the 🤗 `optimum-neuron` [documentation](https://huggingface.co/docs/optimum-neuron/main/en/guides/models#configuring-the-export-of-a-generative-model) for an explanation of these parameters.
+## Usage on Amazon SageMaker
+_coming soon_
+## Usage with 🤗 `optimum-neuron`
+```python
+>>> from optimum.neuron import pipeline
+>>> p = pipeline('text-generation', 'aws-neuron/Mistral-7B-Instruct-v0.1-neuron-4x2048-24-cores')
+>>> p("My favorite place on earth is", max_new_tokens=64, do_sample=True, top_k=50)
+[{'generated_text': 'My favorite place on earth is the ocean. It is where I feel most
+at peace. I love to travel and see new places. I have a'}]
+```
+This repository contains tags specific to versions of `neuronx`. When using it with 🤗 `optimum-neuron`, load the repo revision that matches your installed version of `neuronx` so that the right serialized checkpoints are used.
+## Arguments passed during export
+**input_shapes**
+```json
+{
+  "batch_size": 4,
+  "sequence_length": 2048
+}
+```
+**compiler_args**
+```json
+{
+  "auto_cast_type": "bf16",
+  "num_cores": 24
+}
+```
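
Note on the export arguments: a minimal sketch of what these values could imply for callers, under stated assumptions. It assumes `batch_size` is the maximum number of prompts per call and `sequence_length` bounds prompt tokens plus newly generated tokens, which is how `optimum-neuron` describes static shapes for generative models. The helper `fits_static_shapes` is hypothetical and not part of any library; it only does arithmetic on token counts and needs no Neuron hardware.

```python
import json

# Export parameters copied from the model card in this commit.
INPUT_SHAPES = {"batch_size": 4, "sequence_length": 2048}
COMPILER_ARGS = {"auto_cast_type": "bf16", "num_cores": 24}


def fits_static_shapes(prompt_token_counts, max_new_tokens, shapes=INPUT_SHAPES):
    """Hypothetical helper: check a request against the compiled static shapes.

    Assumption: batch_size caps the number of prompts per call, and
    sequence_length caps prompt tokens plus generated tokens.
    """
    if len(prompt_token_counts) > shapes["batch_size"]:
        return False
    longest = max(prompt_token_counts, default=0)
    return longest + max_new_tokens <= shapes["sequence_length"]


# Both parameter groups serialize to valid JSON (no trailing commas).
export_config = json.dumps(
    {"input_shapes": INPUT_SHAPES, "compiler_args": COMPILER_ARGS}, indent=2
)

print(fits_static_shapes([7], max_new_tokens=64))       # one short prompt
print(fits_static_shapes([2000], max_new_tokens=64))    # 2000 + 64 exceeds 2048
print(fits_static_shapes([7] * 5, max_new_tokens=64))   # five prompts exceed batch_size 4
```

On an inf2 instance with `optimum-neuron` installed, the same keys would presumably be passed as keyword arguments to `NeuronModelForCausalLM.from_pretrained(..., export=True, ...)` when re-exporting the model; check the linked export guide for the exact options supported by your version.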