JRosenkranz committed on
Commit 7a6c813
Parent: 5666056

Update README.md

Files changed (1)
  1. README.md +23 -1
README.md CHANGED
@@ -33,7 +33,7 @@ Training is light-weight and can be completed in only a few days depending on ba
 
  _Note: For all samples, your environment must have access to cuda_
 
- ### Production Server Sample
+ ### Use in IBM Production TGIS
 
  *To try this out running in a production-like environment, please use the pre-built docker image:*
 
@@ -101,6 +101,28 @@ python sample_client.py
 
  _Note: first prompt may be slower as there is a slight warmup time_
 
+ ### Use in Huggingface TGI
+
+ #### start the server
+
+ ```bash
+ model=ibm-fms/llama3-8b-accelerator
+ volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run
+
+ docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:latest --model-id $model
+ ```
+
+ _note: for tensor parallel, add --num-shard_
+
+ #### make a request
+
+ ```bash
+ curl 127.0.0.1:8080/generate_stream \
+ -X POST \
+ -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
+ -H 'Content-Type: application/json'
+ ```
+
  ### Minimal Sample
 
  #### Install
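The two TGI steps added in this commit (start the server, then POST to `/generate_stream`) can be wrapped in a small bash script. This is a sketch, not part of the README. It assumes the server is reachable at `127.0.0.1:8080` as in the curl example. The helper names `tgi_launch`, `tgi_payload` and `tgi_stream`, and the variables `NUM_SHARD` and `DRY_RUN`, are hypothetical. The default of 2 for `--num-shard` is an arbitrary example value, since the README note does not give one.

```bash
#!/usr/bin/env bash
# Sketch only: helpers wrapping the TGI commands from the README diff.
# Assumptions (not from the README): server at 127.0.0.1:8080; NUM_SHARD and
# DRY_RUN are hypothetical variables introduced for this example.

model=ibm-fms/llama3-8b-accelerator
volume=$PWD/data  # shared with the container so weights are not re-downloaded

# Start TGI with tensor parallelism across NUM_SHARD GPUs (default 2, an arbitrary example).
# If DRY_RUN is set, the command is echoed instead of executed.
tgi_launch() {
  ${DRY_RUN:+echo} docker run --gpus all --shm-size 1g -p 8080:80 -v "$volume:/data" \
    ghcr.io/huggingface/text-generation-inference:latest \
    --model-id "$model" --num-shard "${NUM_SHARD:-2}"
}

# Build the JSON body for /generate_stream from a prompt and max_new_tokens (default 20).
# Only backslashes and double quotes in the prompt are escaped; newlines and
# other control characters are not handled by this sketch.
tgi_payload() {
  local prompt max_new_tokens
  prompt=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  max_new_tokens=${2:-20}
  printf '{"inputs":"%s","parameters":{"max_new_tokens":%d}}' "$prompt" "$max_new_tokens"
}

# POST a prompt to the streaming endpoint; -N disables curl's output buffering
# so the streamed events are printed as they arrive.
tgi_stream() {
  curl -sN 127.0.0.1:8080/generate_stream \
    -X POST \
    -d "$(tgi_payload "$@")" \
    -H 'Content-Type: application/json'
}
```

For example, after sourcing the script, run `NUM_SHARD=2 tgi_launch` in one terminal and `tgi_stream 'What is Deep Learning?' 20` in another. Building the body with `tgi_payload` instead of a hand-written `-d` string means prompts that contain quotes do not break the JSON.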