JRosenkranz committed on
Commit
2532bba
1 Parent(s): 924f16b

updated docker run

Files changed (1):
  1. README.md +11 -1
README.md CHANGED
@@ -7,6 +7,7 @@ To try this out running in a production-like environment, please use the pre-bui
 ```bash
 docker pull docker-eu-public.artifactory.swg-devops.com/res-zrl-snap-docker-local/tgis-os:spec.7
 docker run -d --rm --gpus all \
     --name my-tgis-server \
+    -p 8033:8033 \
     -v /path/to/all/models:/models \
     -e MODEL_NAME=/models/model_weights/llama/13B-F \
     -e SPECULATOR_PATH=/models/speculator_weights/llama/13B-F \
@@ -15,8 +16,17 @@ docker pull docker-eu-public.artifactory.swg-devops.com/res-zrl-snap-docker-loca
     -e DTYPE_STR=float16 \
     docker-eu-public.artifactory.swg-devops.com/res-zrl-snap-docker-local/tgis-os:spec.7
 
+# check logs and wait for "gRPC server started on port 8033" and "HTTP server started on port 3000"
 docker logs my-tgis-server -f
-docker exec -it my-tgis-server python /path-to-example-code/sample_client.py
+
+# get the client sample (Note: The first prompt will take longer as there is a warmup time)
+conda create -n tgis-env python=3.11
+conda activate tgis-env
+git clone --branch speculative-decoding --single-branch https://github.com/tdoublep/text-generation-inference.git
+cd text-generation-inference/integration_tests
+make gen-client
+pip install . --no-cache-dir
+python sample_client.py
 ```
 
 To try this out with the fms-native compiled model, please execute the following:
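The diff above asks users to watch the container logs for "gRPC server started on port 8033" before running `sample_client.py`. Readiness can also be checked programmatically; the sketch below is a hypothetical helper (not part of this commit, and the host/port values are assumptions based on the newly published `-p 8033:8033` mapping) that polls the TCP port until it accepts connections:

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 60.0,
                  interval_s: float = 1.0) -> bool:
    """Poll until `host:port` accepts a TCP connection, or the timeout elapses.

    Returns True once a connection succeeds, False if the deadline passes.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            # create_connection raises OSError while the server is not yet up
            with socket.create_connection((host, port), timeout=2.0):
                return True
        except OSError:
            time.sleep(interval_s)
    return False


if __name__ == "__main__":
    # Assumed values: the gRPC port published by the docker run command above.
    if wait_for_port("localhost", 8033):
        print("server is accepting connections on 8033")
```

This only confirms the port is open, not that model warmup has finished, so the note in the diff about the first prompt taking longer still applies.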