JRosenkranz committed
Commit 593ddda
1 Parent(s): a24b598

updated samples

Files changed (1)
README.md +24 -11

README.md CHANGED
@@ -10,8 +10,13 @@ This model is intended to be used as an accelerator for llama 13B (chat).
 Underlying implementation of Paged Attention KV-Cache and speculator can be found in https://github.com/foundation-model-stack/fms-extras
 Production implementation using `fms-extras` can be found in https://github.com/tdoublep/text-generation-inference/tree/speculative-decoding
 
-To try this out running in a production-like environment, please use the pre-built docker image:
+## Samples
+
+### Production Server Sample
+
+*To try this out running in a production-like environment, please use the pre-built docker image:*
+
+#### Setup
 
 ```bash
 docker pull docker-eu-public.artifactory.swg-devops.com/res-zrl-snap-docker-local/tgis-os:spec.7
 docker run -d --rm --gpus all \
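
The `docker run` above starts the server detached (`-d`), so before pointing a client at it, it may help to watch the startup logs. This is the standard Docker CLI; `<container-id>` is whatever id `docker run` printed:

```bash
# Follow the server's startup logs; <container-id> is printed by `docker run -d`
docker logs -f <container-id>
```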
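
The paged attention KV-cache referenced above allocates cache memory in fixed-size blocks that a sequence claims only as it grows, rather than reserving space for the maximum length up front. A minimal illustrative sketch of the block-table idea; the names here are hypothetical, not the fms-extras API:

```python
# Illustrative block-table for a paged KV-cache; hypothetical names,
# not the fms-extras API.
BLOCK_SIZE = 16  # tokens stored per physical cache block

class PagedKVCache:
    def __init__(self, num_blocks: int):
        self.free_blocks = list(range(num_blocks))    # unused physical blocks
        self.block_tables: dict[int, list[int]] = {}  # seq_id -> physical block ids

    def block_for_token(self, seq_id: int, position: int) -> int:
        """Map a token position to a physical block, allocating lazily."""
        table = self.block_tables.setdefault(seq_id, [])
        if position % BLOCK_SIZE == 0 and position // BLOCK_SIZE == len(table):
            table.append(self.free_blocks.pop())      # claim a block on demand
        return table[position // BLOCK_SIZE]

cache = PagedKVCache(num_blocks=8)
print(cache.block_for_token(0, 0))    # allocates the sequence's first block
print(cache.block_for_token(0, 15))   # still in the same block
print(cache.block_for_token(0, 16))   # crosses the boundary: new block
```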
@@ -35,17 +40,31 @@ git clone --branch speculative-decoding --single-branch https://github.com/tdoublep/text-generation-inference.git
 cd text-generation-inference/integration_tests
 make gen-client
 pip install . --no-cache-dir
+```
+
+#### Run Sample
+
+```bash
 python sample_client.py
 ```
 
-To try this out with the fms-native compiled model, please execute the following:
+### Minimal Sample
+
+*To try this out with the fms-native compiled model, please execute the following:*
 
-#### batch_size=1 (compile + cudagraphs)
+#### Install
 
 ```bash
 git clone https://github.com/foundation-model-stack/fms-extras
 (cd fms-extras && pip install -e .)
 pip install transformers==4.35.0 sentencepiece numpy
+```
+
+#### Run Sample
+
+##### batch_size=1 (compile + cudagraphs)
+
+```bash
 python fms-extras/scripts/paged_speculative_inference.py \
 --variant=13b \
 --model_path=/path/to/model_weights/llama/13B-F \
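
`paged_speculative_inference.py` exercises speculative decoding: the small speculator drafts several tokens ahead, the llama 13B base model checks them in a single forward pass, and every accepted draft token saves a full decoding step. A toy greedy round with stand-in draft/verify functions (not the fms-extras API):

```python
# Toy greedy speculative-decoding round; draft/verify are stand-ins,
# not the fms-extras API.
def speculative_round(tokens, draft, verify, k=4):
    """Draft k tokens with the speculator, keep the prefix the base model agrees with."""
    candidates = draft(tokens, k)      # speculator proposes k tokens cheaply
    base = verify(tokens, candidates)  # base model's own choice at each drafted position
    out = list(tokens)
    for c, b in zip(candidates, base):
        out.append(b if c != b else c)
        if c != b:                     # first disagreement: keep the base model's
            break                      # token and end the round
    return out

# Stand-in models over integer tokens: the base model agrees with the
# first two drafted tokens, then diverges.
draft = lambda toks, k: [toks[-1] + i + 1 for i in range(k)]
verify = lambda toks, cand: cand[:2] + [99] * (len(cand) - 2)

print(speculative_round([1, 2, 3], draft, verify))  # -> [1, 2, 3, 4, 5, 99]
```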
@@ -57,12 +76,9 @@ python fms-extras/scripts/paged_speculative_inference.py \
 --compile_mode=reduce-overhead
 ```
 
-#### batch_size=1 (compile)
+##### batch_size=1 (compile)
 
 ```bash
-git clone https://github.com/foundation-model-stack/fms-extras
-(cd fms-extras && pip install -e .)
-pip install transformers==4.35.0 sentencepiece numpy
 python fms-extras/scripts/paged_speculative_inference.py \
 --variant=13b \
 --model_path=/path/to/model_weights/llama/13B-F \
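
The `compile + cudagraphs` variant above corresponds to PyTorch's `torch.compile` with `mode="reduce-overhead"`, which captures decode steps as CUDA graphs to cut kernel-launch overhead at small batch sizes, while the plain `compile` variants use the default mode. A minimal reference snippet (assumes a CUDA device; the toy module is a stand-in):

```python
import torch

model = torch.nn.Linear(16, 16).cuda()

compiled = torch.compile(model)                             # default: TorchInductor fusion
compiled_cg = torch.compile(model, mode="reduce-overhead")  # adds CUDA-graph capture

x = torch.randn(1, 16, device="cuda")
print(compiled_cg(x).shape)  # torch.Size([1, 16])
```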
@@ -73,12 +89,9 @@ python fms-extras/scripts/paged_speculative_inference.py \
 --compile \
 ```
 
-#### batch_size=4 (compile)
+##### batch_size=4 (compile)
 
 ```bash
-git clone https://github.com/foundation-model-stack/fms-extras
-(cd fms-extras && pip install -e .)
-pip install transformers==4.35.0 sentencepiece numpy
 python fms-extras/scripts/paged_speculative_inference.py \
 --variant=13b \
 --model_path=/path/to/model_weights/llama/13B-F \
 