davidshtian committed on
Commit
7030a08
1 Parent(s): 8be51bf

Update README.md

Files changed (1)
  README.md +21 -4
README.md CHANGED
@@ -44,11 +44,28 @@ docker run -d -p 8080:80 \
  --max-batch-size 4 \
  --max-input-length 16 \
  --max-total-tokens 32
+ ```
+ There seems to be no support for sending a list of prompts to the server; see this [GitHub issue](https://github.com/huggingface/text-generation-inference/issues/1008).
+
+ ```python
+ from huggingface_hub import InferenceClient
+ import concurrent.futures
+
+ client = InferenceClient(model="http://127.0.0.1:8080")
+ batch_text = ["1+1=", "2+2=", "3+3=", "4+4="]
+
+ bs = 4
+
+ def format_text_list(text_list):
+     return ['[INST] ' + text + ' [/INST]' for text in text_list]
+
+ def gen_text(text):
+     return client.text_generation(text, max_new_tokens=16)
+
+ with concurrent.futures.ThreadPoolExecutor(max_workers=bs) as executor:
+     out = list(executor.map(gen_text, format_text_list(batch_text)))

- curl 127.0.0.1:8080/generate \
- -X POST \
- -d '{"inputs":"Who are you?","parameters":{"max_new_tokens":16}}' \
- -H 'Content-Type: application/json'
+ print(out)
  ```

  ## Usage with 🤗 `optimum-neuron pipeline`
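
For reference, the same four-prompt fan-out can also be written with `huggingface_hub.AsyncInferenceClient` instead of a thread pool. This is a minimal sketch, not part of the committed README, assuming the same TGI server at `http://127.0.0.1:8080` and the same `[INST]` prompt format:

```python
import asyncio

from huggingface_hub import AsyncInferenceClient

# Assumed: the TGI container from the README above is serving on this address.
client = AsyncInferenceClient(model="http://127.0.0.1:8080")
batch_text = ["1+1=", "2+2=", "3+3=", "4+4="]

async def gen_text(text):
    # One request per prompt; the server batches concurrent requests on its
    # side, bounded here by the --max-batch-size 4 launch flag.
    return await client.text_generation(f"[INST] {text} [/INST]", max_new_tokens=16)

async def main():
    # gather() fires all requests concurrently and preserves input order.
    return await asyncio.gather(*(gen_text(t) for t in batch_text))

out = asyncio.run(main())
print(out)
```

Like `executor.map` in the committed snippet, `asyncio.gather` preserves input order, so `out` lines up with `batch_text`.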