davidshtian committed · Commit 7030a08 · 1 Parent(s): 8be51bf

Update README.md

README.md CHANGED
@@ -44,11 +44,28 @@ docker run -d -p 8080:80 \
     --max-batch-size 4 \
     --max-input-length 16 \
     --max-total-tokens 32
+```
+
+There seems to be no support for sending a list of prompts to the server; refer to this [GitHub issue](https://github.com/huggingface/text-generation-inference/issues/1008). As a workaround, the requests can be issued concurrently from a thread pool:
+
+```python
+from huggingface_hub import InferenceClient
+import concurrent.futures
+
+client = InferenceClient(model="http://127.0.0.1:8080")
+batch_text = ["1+1=", "2+2=", "3+3=", "4+4="]
+
+bs = 4
+
+def format_text_list(text_list):
+    # Wrap each prompt in the Llama 2 [INST] chat template
+    return ['[INST] ' + text + ' [/INST]' for text in text_list]
+
+def gen_text(text):
+    return client.text_generation(text, max_new_tokens=16)
+
+# Keep up to bs requests in flight at once; the server batches them
+with concurrent.futures.ThreadPoolExecutor(max_workers=bs) as executor:
+    out = list(executor.map(gen_text, format_text_list(batch_text)))
 
-
-    -X POST \
-    -d '{"inputs":"Who are you?","parameters":{"max_new_tokens":16}}' \
-    -H 'Content-Type: application/json'
+print(out)
 ```
 
 ## Usage with 🤗 `optimum-neuron pipeline`
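For reference, the curl request removed above can be reproduced from Python. The sketch below is not part of the commit: the first removed line (with the request URL) is not recoverable from this view, so the standard text-generation-inference `/generate` route on the mapped host port 8080 (per `docker run -d -p 8080:80` in the hunk header) is an assumption.

```python
import requests

# Assumption: TGI's standard /generate route, reachable via the host
# port 8080 mapped by `docker run -d -p 8080:80` above.
resp = requests.post(
    "http://127.0.0.1:8080/generate",
    # requests sends the json= payload with Content-Type: application/json,
    # matching the -H 'Content-Type: application/json' flag of the old curl.
    json={"inputs": "Who are you?", "parameters": {"max_new_tokens": 16}},
)
resp.raise_for_status()
print(resp.json()["generated_text"])
```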
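An alternative to the thread-pool workaround in the committed snippet is asyncio with `huggingface_hub.AsyncInferenceClient`, the async counterpart of `InferenceClient`. A minimal sketch, assuming the same endpoint and prompts as the diff:

```python
import asyncio

from huggingface_hub import AsyncInferenceClient

# Same local TGI endpoint as in the committed snippet (assumption:
# the server started by the docker run command is still listening).
client = AsyncInferenceClient(model="http://127.0.0.1:8080")
batch_text = ["1+1=", "2+2=", "3+3=", "4+4="]

def format_text_list(text_list):
    # Same Llama 2 [INST] wrapping as the committed snippet
    return ['[INST] ' + text + ' [/INST]' for text in text_list]

async def main():
    # Issue all requests concurrently instead of via a thread pool
    tasks = [client.text_generation(text, max_new_tokens=16)
             for text in format_text_list(batch_text)]
    return await asyncio.gather(*tasks)

out = asyncio.run(main())
print(out)
```

Either way, the effect is the same: with the server started with `--max-batch-size 4`, the four concurrent requests can be served together as one batch even though the HTTP API only accepts a single prompt per request.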