davidshtian committed on
Commit
7030a08
1 Parent(s): 8be51bf

Update README.md

Files changed (1)
  README.md +21 -4
README.md CHANGED
@@ -44,11 +44,28 @@ docker run -d -p 8080:80 \
  --max-batch-size 4 \
  --max-input-length 16 \
  --max-total-tokens 32
+ ```
+ There seems to be no support for sending a list of prompts to the server; see this [GitHub issue](https://github.com/huggingface/text-generation-inference/issues/1008).
+
+ ```python
+ from huggingface_hub import InferenceClient
+ import concurrent.futures
+
+ client = InferenceClient(model="http://127.0.0.1:8080")
+ batch_text = ["1+1=", "2+2=", "3+3=", "4+4="]
+
+ bs = 4
+
+ def format_text_list(text_list):
+     return ['[INST] ' + text + ' [/INST]' for text in text_list]
+
+ def gen_text(text):
+     return client.text_generation(text, max_new_tokens=16)
+
+ with concurrent.futures.ThreadPoolExecutor(max_workers=bs) as executor:
+     out = list(executor.map(gen_text, format_text_list(batch_text)))

- curl 127.0.0.1:8080/generate \
- -X POST \
- -d '{"inputs":"Who are you?","parameters":{"max_new_tokens":16}}' \
- -H 'Content-Type: application/json'
+ print(out)
  ```

  ## Usage with 🤗 `optimum-neuron pipeline`
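
For reference, the same four-prompt fan-out can also be written with `huggingface_hub.AsyncInferenceClient` instead of a thread pool. This is a minimal sketch, not part of the committed README, assuming the same TGI server at `http://127.0.0.1:8080` and the same `[INST]` prompt format:

```python
import asyncio

from huggingface_hub import AsyncInferenceClient

# Assumed: the TGI container from the README above is serving on this address.
client = AsyncInferenceClient(model="http://127.0.0.1:8080")
batch_text = ["1+1=", "2+2=", "3+3=", "4+4="]

async def gen_text(text):
    # One request per prompt; the server batches concurrent requests on its
    # side, bounded here by the --max-batch-size 4 launch flag.
    return await client.text_generation(f"[INST] {text} [/INST]", max_new_tokens=16)

async def main():
    # gather() fires all requests concurrently and preserves input order.
    return await asyncio.gather(*(gen_text(t) for t in batch_text))

out = asyncio.run(main())
print(out)
```

Like `executor.map` in the committed snippet, `asyncio.gather` preserves input order, so `out` lines up with `batch_text`.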