davidshtian committed
Commit 89fc851 • Parent: ee0bcdd

Update README.md
Files changed (1):
1. README.md +41 -5
README.md CHANGED
@@ -19,11 +19,12 @@ Please refer to the 🤗 `optimum-neuron` [documentation](https://huggingface.co
 Note: To compile mistralai/Mistral-7B-Instruct-v0.2 on Inf2, you need to update the model config's sliding_window (either in the file or on the model variable) from null to the default of 4096.
 
 ## Usage with 🤗 `TGI`
-
+Refer to the container image on the [neuronx-tgi](https://gallery.ecr.aws/shtian/neuronx-tgi) Amazon ECR Public Gallery.
 ```shell
 export HF_TOKEN="hf_xxx"
 
 docker run -d -p 8080:80 \
+--name mistral-7b-neuronx-tgi \
 -v $(pwd)/data:/data \
 --device=/dev/neuron0 \
 -e HF_TOKEN=${HF_TOKEN} \
@@ -34,18 +35,53 @@ docker run -d -p 8080:80 \
 --max-total-tokens 32
 ```
 
-## Usage with 🤗 `optimum-neuron`
+## Usage with 🤗 `optimum-neuron` pipeline
 
 ```python
->>> from optimum.neuron import pipeline
+from optimum.neuron import pipeline
+
+p = pipeline('text-generation', 'davidshtian/Mistral-7B-Instruct-v0.2-neuron-1x2048-2-cores-2.18')
+p("My favorite place on earth is", max_new_tokens=64, do_sample=True, top_k=50)
 
->>> p = pipeline('text-generation', 'davidshtian/Mistral-7B-Instruct-v0.2-neuron-1x2048-2-cores-2.18')
->>> p("My favorite place on earth is", max_new_tokens=64, do_sample=True, top_k=50)
 [{'generated_text': "My favorite place on earth is probably Paris, France, and if I were to go there
 now I would take my partner on a romantic getaway where we could lay on the grass in the park,
 eat delicious French cheeses and wine, and watch the sunset on the Seine river.'"}]
 ```
 
+## Usage with 🤗 `optimum-neuron` `NeuronModelForCausalLM`
+
+```python
+import torch
+from transformers import AutoTokenizer
+from optimum.neuron import NeuronModelForCausalLM
+
+model = NeuronModelForCausalLM.from_pretrained("davidshtian/Mistral-7B-Instruct-v0.2-neuron-1x2048-2-cores-2.18")
+
+tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
+tokenizer.pad_token_id = tokenizer.eos_token_id
+
+def model_sample(input_prompt):
+    input_prompt = "[INST] " + input_prompt + " [/INST]"
+
+    tokens = tokenizer(input_prompt, return_tensors="pt")
+
+    with torch.inference_mode():
+        sample_output = model.generate(
+            **tokens,
+            do_sample=True,
+            min_length=16,
+            max_length=32,
+            temperature=0.5,
+            pad_token_id=tokenizer.eos_token_id
+        )
+        outputs = [tokenizer.decode(tok, skip_special_tokens=True) for tok in sample_output]
+
+    res = outputs[0].split('[/INST]')[1].strip("</s>").strip()
+    return res + "\n"
+
+print(model_sample("how are you today?"))
+```
+
 This repository contains tags specific to versions of `neuronx`. When using with 🤗 `optimum-neuron`, use the repo revision specific to the version of `neuronx` you are using, to load the right serialized checkpoints.
 
 ## Arguments passed during export
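
On the sliding_window note at the top of the diff: a minimal sketch of what that config change can look like before compiling, assuming you patch the config in code rather than editing config.json by hand. The local output path is arbitrary; the 4096 value is the default named in the note.

```python
from transformers import AutoConfig

# Mistral-7B-Instruct-v0.2 ships with "sliding_window": null in config.json,
# but compiling on Inf2 needs a concrete value; use the default 4096.
config = AutoConfig.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
config.sliding_window = 4096

# Persist the patched config (or edit config.json in place instead).
config.save_pretrained("./mistral-config-patched")
```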
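Once the `docker run` command in the diff has the TGI container up, a quick smoke test is to hit TGI's standard `/generate` endpoint. A minimal sketch, assuming the server is reachable on the host's mapped port 8080:

```python
import requests

# Query the TGI server started by the docker run command above.
response = requests.post(
    "http://localhost:8080/generate",
    json={
        "inputs": "[INST] My favorite place on earth is [/INST]",
        # Keep generation short: the server was launched with --max-total-tokens 32.
        "parameters": {"max_new_tokens": 16},
    },
    timeout=60,
)
print(response.json()["generated_text"])
```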
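For the closing note about `neuronx`-specific tags, pinning a revision looks like the sketch below. The tag name here is a placeholder, not a real tag from this repo; list the repo's tags to find the one matching your installed `neuronx` release.

```python
from optimum.neuron import NeuronModelForCausalLM

# "<neuronx-version-tag>" is a placeholder; replace it with the repo tag
# matching your neuronx version so the right serialized checkpoints load.
model = NeuronModelForCausalLM.from_pretrained(
    "davidshtian/Mistral-7B-Instruct-v0.2-neuron-1x2048-2-cores-2.18",
    revision="<neuronx-version-tag>",
)
```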