davidshtian committed
Commit 89fc851 • Parent: ee0bcdd

Update README.md
Files changed (1):
1. README.md +41 -5
README.md CHANGED
@@ -19,11 +19,12 @@ Please refer to the 🤗 `optimum-neuron` [documentation](https://huggingface.co
 Note: To compile mistralai/Mistral-7B-Instruct-v0.2 on Inf2, you need to update the model config's sliding_window (either in the file or on the model variable) from null to the default of 4096.
 
 ## Usage with 🤗 `TGI`
-
+Refer to the container image on the [neuronx-tgi](https://gallery.ecr.aws/shtian/neuronx-tgi) Amazon ECR Public Gallery.
 ```shell
 export HF_TOKEN="hf_xxx"
 
 docker run -d -p 8080:80 \
+--name mistral-7b-neuronx-tgi \
 -v $(pwd)/data:/data \
 --device=/dev/neuron0 \
 -e HF_TOKEN=${HF_TOKEN} \
@@ -34,18 +35,53 @@ docker run -d -p 8080:80 \
 --max-total-tokens 32
 ```
 
-## Usage with 🤗 `optimum-neuron`
+## Usage with 🤗 `optimum-neuron` pipeline
 
 ```python
->>> from optimum.neuron import pipeline
+from optimum.neuron import pipeline
+
+p = pipeline('text-generation', 'davidshtian/Mistral-7B-Instruct-v0.2-neuron-1x2048-2-cores-2.18')
+p("My favorite place on earth is", max_new_tokens=64, do_sample=True, top_k=50)
 
->>> p = pipeline('text-generation', 'davidshtian/Mistral-7B-Instruct-v0.2-neuron-1x2048-2-cores-2.18')
->>> p("My favorite place on earth is", max_new_tokens=64, do_sample=True, top_k=50)
 [{'generated_text': "My favorite place on earth is probably Paris, France, and if I were to go there
 now I would take my partner on a romantic getaway where we could lay on the grass in the park,
 eat delicious French cheeses and wine, and watch the sunset on the Seine river.'"}]
 ```
 
+## Usage with 🤗 `optimum-neuron` `NeuronModelForCausalLM`
+
+```python
+import torch
+from transformers import AutoTokenizer
+from optimum.neuron import NeuronModelForCausalLM
+
+model = NeuronModelForCausalLM.from_pretrained("davidshtian/Mistral-7B-Instruct-v0.2-neuron-1x2048-2-cores-2.18")
+
+tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
+tokenizer.pad_token_id = tokenizer.eos_token_id
+
+def model_sample(input_prompt):
+    input_prompt = "[INST] " + input_prompt + " [/INST]"
+
+    tokens = tokenizer(input_prompt, return_tensors="pt")
+
+    with torch.inference_mode():
+        sample_output = model.generate(
+            **tokens,
+            do_sample=True,
+            min_length=16,
+            max_length=32,
+            temperature=0.5,
+            pad_token_id=tokenizer.eos_token_id
+        )
+        outputs = [tokenizer.decode(tok, skip_special_tokens=True) for tok in sample_output]
+
+    res = outputs[0].split('[/INST]')[1].strip("</s>").strip()
+    return res + "\n"
+
+print(model_sample("how are you today?"))
+```
+
 This repository contains tags specific to versions of `neuronx`. When using with 🤗 `optimum-neuron`, use the repo revision specific to the version of `neuronx` you are using, to load the right serialized checkpoints.
 
 ## Arguments passed during export
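
On the sliding_window note at the top of the diff: a minimal sketch of what that config change can look like before compiling, assuming you patch the config in code rather than editing config.json by hand. The local output path is arbitrary; the 4096 value is the default named in the note.

```python
from transformers import AutoConfig

# Mistral-7B-Instruct-v0.2 ships with "sliding_window": null in config.json,
# but compiling on Inf2 needs a concrete value; use the default 4096.
config = AutoConfig.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
config.sliding_window = 4096

# Persist the patched config (or edit config.json in place instead).
config.save_pretrained("./mistral-config-patched")
```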
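Once the `docker run` command in the diff has the TGI container up, a quick smoke test is to hit TGI's standard `/generate` endpoint. A minimal sketch, assuming the server is reachable on the host's mapped port 8080:

```python
import requests

# Query the TGI server started by the docker run command above.
response = requests.post(
    "http://localhost:8080/generate",
    json={
        "inputs": "[INST] My favorite place on earth is [/INST]",
        # Keep generation short: the server was launched with --max-total-tokens 32.
        "parameters": {"max_new_tokens": 16},
    },
    timeout=60,
)
print(response.json()["generated_text"])
```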
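For the closing note about `neuronx`-specific tags, pinning a revision looks like the sketch below. The tag name here is a placeholder, not a real tag from this repo; list the repo's tags to find the one matching your installed `neuronx` release.

```python
from optimum.neuron import NeuronModelForCausalLM

# "<neuronx-version-tag>" is a placeholder; replace it with the repo tag
# matching your neuronx version so the right serialized checkpoints load.
model = NeuronModelForCausalLM.from_pretrained(
    "davidshtian/Mistral-7B-Instruct-v0.2-neuron-1x2048-2-cores-2.18",
    revision="<neuronx-version-tag>",
)
```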