JRosenkranz committed on
Commit ef36b0a
1 Parent(s): 78a73f1

Update README.md

Files changed (1): README.md +22 -1
README.md CHANGED
@@ -33,7 +33,7 @@ Training is light-weight and can be completed in only a few days depending on ba
 
 _Note: For all samples, your environment must have access to cuda_
 
-### Production Server Sample
+### Use in IBM Production TGIS
 
 *To try this out running in a production-like environment, please use the pre-built docker image:*
 
@@ -101,6 +101,27 @@ python sample_client.py
 
 _Note: first prompt may be slower as there is a slight warmup time_
 
+### Use in Huggingface TGI
+
+#### start the server
+
+```bash
+model=ibm-fms/llama-13b-accelerator
+volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run
+docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:latest --model-id $model
+```
+
+_note: for tensor parallel, add --num-shard_
+
+#### make a request
+
+```bash
+curl 127.0.0.1:8080/generate_stream \
+    -X POST \
+    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
+    -H 'Content-Type: application/json'
+```
+
 ### Minimal Sample
 
 *To try this out with the fms-native compiled model, please execute the following:*
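The tensor-parallel note in the added section names only the flag, not its argument; `--num-shard` takes the number of GPUs to shard across. A sketch of the launch command with sharding (the image, model, and other flags are from the diff; the shard count of 2 is an illustrative assumption, not part of the commit):

```shell
# Same TGI launch as in the diff, but sharded across GPUs with tensor parallelism.
# "--num-shard 2" is illustrative; set it to the number of GPUs available.
model=ibm-fms/llama-13b-accelerator
volume=$PWD/data  # cache weights on the host between runs
docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data \
    ghcr.io/huggingface/text-generation-inference:latest \
    --model-id $model --num-shard 2
```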
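The `curl` request in the added section can also be issued from Python with only the standard library. A minimal sketch mirroring that command (the endpoint, payload, and header come from the diff; `build_request` is a hypothetical helper, and actually streaming a response requires the server from the previous step to be running):

```python
import json
import urllib.request


def build_request(prompt, max_new_tokens=20,
                  url="http://127.0.0.1:8080/generate_stream"):
    """Build the POST request that mirrors the curl invocation in the diff."""
    payload = {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


req = build_request("What is Deep Learning?")
# The endpoint streams server-sent events; uncomment with a live server:
# with urllib.request.urlopen(req) as resp:
#     for raw in resp:
#         line = raw.decode("utf-8").strip()
#         if line.startswith("data:"):
#             print(json.loads(line[len("data:"):]))
```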